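Python crawling with the urllib and requests modules: the biggest difference between the two is how they handle the connection when fetching data. urllib closes the connection as soon as the data has been fetched, whereas requests can keep the connection open and reuse the socket for later requests. Note that this reuse applies to requests sent through the same requests.Session; a standalone requests.get call creates a temporary session of its own.
The source below shows how the two are used differently: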
requests:
    # coding: utf-8
    import requests


    def eazy_url_demo(url):
        # Plain GET: print the response headers and the body.
        res = requests.get(url)
        print('>>>>>>>Res info>>')
        print(res.headers)
        print('read>>>>>>')
        print(res.text)


    def url_get(url):
        # GET with query parameters; requests URL-encodes them itself.
        data = {
            'param1': 'hello',
            'param2': 'world'
        }
        res = requests.get(url, params=data)
        print('>>>>>>>code')
        print(res.status_code)
        print(res.reason)
        print('>>>>>>>Res info>>')
        print(res.headers)
        print('read>>>>>>')
        print(res.text)


    if __name__ == '__main__':
        # url_exp = 'http://httpbin.org/ip'
        # eazy_url_demo(url_exp)
        url_get1 = 'http://httpbin.org/get'
        url_get(url_get1)
urllib and urllib2:
    # coding: utf-8
    # Python 2 only: in Python 3, urllib2 became urllib.request and urllib.error.
    import urllib
    import urllib2


    def eazy_url_demo(url):
        # Plain GET: print the response headers and the body.
        res = urllib2.urlopen(url)
        print('>>>>>>>Res info>>')
        print(res.info())
        print('read>>>>>>')
        print(res.read())


    def url_get(url):
        # urlopen has no params argument, so the query string is encoded by hand.
        data = urllib.urlencode({
            'param1': 'hello',
            'param2': 'world'
        })
        print(type(url))
        print(type(data))
        # Join directly instead of %-formatting, which breaks if url contains '%'.
        new_url = '?'.join([url, data])
        res = urllib2.urlopen(new_url)
        print('>>>>>>>Res info>>')
        print(res.info())
        print('read>>>>>>')
        print(res.read())


    if __name__ == '__main__':
        # url_exp = 'http://httpbin.org/ip'
        # eazy_url_demo(url_exp)
        url_get1 = 'http://httpbin.org/get'
        url_get(url_get1)
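The urllib/urllib2 listing targets Python 2. For readers on Python 3, where urllib2 no longer exists, the same GET with parameters could be sketched roughly as below. This is only an illustration: the helper name build_get_url and the timeout value are assumptions made for the example, not part of the original code.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_get_url(url, params):
    """Append URL-encoded query parameters to a base URL (hypothetical helper)."""
    return url + "?" + urlencode(params)


def url_get(url, params, timeout=10):
    """Fetch url with the given query parameters; return (headers, body text)."""
    # timeout=10 is an arbitrary value chosen for this sketch.
    with urlopen(build_get_url(url, params), timeout=timeout) as res:
        # Leaving the with-block closes the response and its connection.
        return res.info(), res.read().decode("utf-8")
```

Calling url_get('http://httpbin.org/get', {'param1': 'hello', 'param2': 'world'}) would then play the role of url_get(url_get1) in the Python 2 listing.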
Summary: requests is simpler and clearer to use than urllib. I have not yet found a concrete use case for the resource savings; still looking into it.
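One way to see what connection reuse actually saves is to count TCP connections on the server side. The sketch below is not from the original post and does not use requests; it uses only the standard library (Python 3) so it runs against a throwaway local server. CountingHandler and count_connections are hypothetical names made up for this example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    """Local test handler that counts accepted TCP connections."""

    protocol_version = "HTTP/1.1"  # HTTP/1.1 keeps connections alive by default
    connections = 0

    def setup(self):
        # setup() runs once per accepted connection, not once per request.
        type(self).connections += 1
        super().setup()

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def count_connections(reuse, n_requests=3):
    """Send n_requests GETs to a local server; return how many connections it saw."""
    CountingHandler.connections = 0
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    host, port = server.server_address
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        if reuse:
            # One connection, several requests over the same socket.
            conn = http.client.HTTPConnection(host, port)
            for _ in range(n_requests):
                conn.request("GET", "/")
                conn.getresponse().read()  # drain the body before the next request
            conn.close()
        else:
            # A fresh connection for every request, closed right afterwards.
            for _ in range(n_requests):
                conn = http.client.HTTPConnection(host, port)
                conn.request("GET", "/")
                conn.getresponse().read()
                conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return CountingHandler.connections
```

Comparing count_connections(reuse=True) with count_connections(reuse=False) shows how many TCP handshakes the reused connection avoids. With requests, the analogous pattern is to create one requests.Session() and call its get method repeatedly against the same host, since a session keeps a connection pool.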
Source: http://www.92to.com/bangong/2017/03-17/18928301.html