Scrapy 1.2.2 released: a web crawling framework

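Scrapy 1.2.2 has been released. Scrapy is an asynchronous crawling framework built on Twisted and implemented in pure Python. Users only need to customize a few modules to build a crawler that fetches web page content and images.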

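Scrapy is a fast, high-level screen scraping and web crawling framework written in Python, used to crawl websites and extract structured data from their pages. It serves a wide range of purposes, including data mining, monitoring, and automated testing.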

Changes:

Bug fixes

Fix a cryptic traceback when a pipeline fails on open_spider() ({aa22aa})
Fix embedded IPython shell variables (fixing {aa21aa} that re-appeared in 1.2.0, fixed in {aa20aa})
A couple of patches when dealing with robots.txt:
- handle (non-standard) relative sitemap URLs ({aa19aa})
- handle non-ASCII URLs and User-Agents in Python 2 ({aa18aa})
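
The relative sitemap fix above can be illustrated with a small standard-library sketch. This is not Scrapy's implementation: the function name sitemap_urls_from_robots and the example.com URLs are hypothetical. The idea is simply to resolve each Sitemap entry against the URL of the robots.txt file, so relative entries become absolute.

```python
from urllib.parse import urljoin


def sitemap_urls_from_robots(robots_text, base_url):
    """Return absolute sitemap URLs listed in a robots.txt body.

    Hypothetical helper for illustration; not Scrapy's actual code.
    """
    urls = []
    for line in robots_text.splitlines():
        # Field names in robots.txt are matched case-insensitively here.
        if line.lstrip().lower().startswith("sitemap:"):
            value = line.split(":", 1)[1].strip()
            if value:
                # urljoin keeps absolute URLs as they are and resolves
                # relative ones against base_url.
                urls.append(urljoin(base_url, value))
    return urls


robots = """User-agent: *
Disallow: /private
Sitemap: /sitemap.xml
Sitemap: http://example.com/news/sitemap.xml
"""
print(sitemap_urls_from_robots(robots, "http://example.com/robots.txt"))
```

Resolving against the robots.txt URL is one reasonable choice for a non-standard relative entry; other crawlers may treat such entries differently.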

Documentation

Document "download_latency" key in Request's meta dict ({aa17aa})
Remove page on (deprecated & unsupported) Ubuntu packages from ToC ({aa16aa})
A few fixed typos ({aa15aa}, {aa13aa}, {aa14aa}, {aa12aa}) and clarifications ({aa11aa}, {aa10aa}, {aa9aa})
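
As a sketch of how the "download_latency" key might be read: Scrapy stores the time spent fetching a response under that key in the request's meta dict, which is also reachable as response.meta. The helper describe_latency below is hypothetical, and it uses a stand-in object rather than a real Scrapy response so that it runs without Scrapy installed.

```python
from types import SimpleNamespace


def describe_latency(response):
    """Format the download latency of a response-like object.

    Hypothetical helper; `response` only needs `url` and `meta` attributes.
    """
    # The key is only available once the response has been downloaded,
    # so fall back to None instead of raising KeyError.
    latency = response.meta.get("download_latency")
    if latency is None:
        return "%s: latency unknown" % response.url
    return "%s: fetched in %.3f s" % (response.url, latency)


# Stand-in for a Scrapy response (an assumption made for illustration only).
fake_response = SimpleNamespace(
    url="http://example.com/", meta={"download_latency": 0.25}
)
print(describe_latency(fake_response))
```

In a real spider, the same lookup would typically happen inside a callback such as parse(self, response).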

Other changes

Advertise {aa8aa} as Scrapy's official conda channel ({aa7aa})
More helpful error messages when trying to use .css() or .xpath() on non-Text Responses ({aa6aa})
startproject command now generates a sample middlewares.py file ({aa5aa})
Add more dependencies' version info in scrapy version --verbose output ({aa4aa})
Remove all *.pyc files from source distribution ({aa3aa})
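
For context on the middlewares.py item above, a downloader middleware of the kind that could live in such a file might look like the sketch below. It is not the sample that startproject generates: the class name DefaultLanguageMiddleware and the header value are hypothetical, and the stand-in request object exists only so the example runs without Scrapy installed.

```python
from types import SimpleNamespace


class DefaultLanguageMiddleware(object):
    """Hypothetical downloader middleware that sets a default header."""

    def process_request(self, request, spider):
        # Only set the header if the request does not define it already.
        request.headers.setdefault("Accept-Language", "en")
        # Returning None tells Scrapy to continue processing the request.
        return None


# Stand-in request with a plain dict for headers (assumption for illustration).
fake_request = SimpleNamespace(headers={})
DefaultLanguageMiddleware().process_request(fake_request, spider=None)
print(fake_request.headers)
```

In a real project, a class like this would be enabled through the DOWNLOADER_MIDDLEWARES setting.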

Download

Source: http://www.phperz.com/article/16/1207/311471.html