Extracting all links from a web page with Python's HTMLParser

 
# Python 2 code: HTMLParser and urllib.urlopen are the Python 2 names
import HTMLParser, urllib

class linkParser(HTMLParser.HTMLParser):
    def __init__(self):
        HTMLParser.HTMLParser.__init__(self)
        self.links = []
    def handle_starttag(self, tag, attrs):
        # Collect the href of every <a> tag; skip anchors that have none
        if tag == 'a':
            href = dict(attrs).get('href')
            if href:
                self.links.append(href)

# Read at most 200000 bytes of the page
htmlSource = urllib.urlopen("http://www.codeSnippet.cn").read(200000)
p = linkParser()
p.feed(htmlSource)
for link in p.links:
    print link
# This snippet is from http://www.codesnippet.cn/detail/100120131474.html
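
The snippet above targets Python 2. In Python 3 the HTMLParser module is named html.parser and urlopen lives in urllib.request, so the same idea might be sketched roughly as follows. The names LinkParser and extract_links are illustrative and not part of the original snippet. Resolving relative links with urljoin and decoding the page as UTF-8 are added assumptions, since the page may use a different encoding.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip anchors without an href, e.g. <a name="top">
                self.links.append(href)


def extract_links(html, base_url=""):
    """Hypothetical helper: return all links in html, resolved against base_url."""
    parser = LinkParser()
    parser.feed(html)
    parser.close()
    return [urljoin(base_url, link) for link in parser.links]


if __name__ == "__main__":
    url = "http://www.codeSnippet.cn"
    with urlopen(url) as response:
        # Assumption: the page is UTF-8; undecodable bytes are replaced.
        html = response.read(200000).decode("utf-8", errors="replace")
    for link in extract_links(html, url):
        print(link)
```

Keeping the parsing in extract_links, separate from the download, makes it possible to try the parser on a literal HTML string without any network access.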
