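Scrapy 1.2.2 has been released. Scrapy is an asynchronous crawling framework built on Twisted and implemented in pure Python. Users only need to customize a few modules to build a crawler that fetches web page content and images.
Scrapy is a fast, high-level screen scraping and web crawling framework written in Python, used to crawl websites and extract structured data from their pages. It has a wide range of uses, including data mining, monitoring, and automated testing.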
Changes:
Bug fixes
Fix a cryptic traceback when a pipeline fails on open_spider() ({aa22aa})
Fix embedded IPython shell variables (fixing {aa21aa} that re-appeared in 1.2.0, fixed in {aa20aa})
A couple of patches when dealing with robots.txt:
handle (non-standard) relative sitemap URLs ({aa19aa})
handle non-ASCII URLs and User-Agents in Python 2 ({aa18aa})
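To make the relative sitemap case concrete, here is a minimal sketch of how a relative "Sitemap:" entry in robots.txt can be resolved against the URL of the robots.txt file itself. It uses only the standard library; the helper name sitemap_urls and the example.com URLs are made up for illustration, and this is not Scrapy's actual implementation.

```python
from urllib.parse import urljoin


def sitemap_urls(robots_url, robots_body):
    """Yield absolute sitemap URLs listed in a robots.txt body.

    Hypothetical helper for illustration only: relative "Sitemap:"
    values are resolved against the URL of the robots.txt file.
    """
    for line in robots_body.splitlines():
        # Split "Field: value" at the first colon only, so that the
        # colon inside "http://..." stays part of the value.
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap":
            yield urljoin(robots_url, value.strip())


robots_txt = """\
User-agent: *
Disallow: /private/
Sitemap: /sitemap.xml
Sitemap: http://example.com/news/sitemap.xml
"""

urls = list(sitemap_urls("http://example.com/robots.txt", robots_txt))
print(urls)
```

The relative entry /sitemap.xml becomes http://example.com/sitemap.xml, while the entry that is already absolute passes through unchanged.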
Documentation
Document "download_latency" key in Request's meta dict ({aa17aa})
Remove page on (deprecated & unsupported) Ubuntu packages from ToC ({aa16aa})
A few fixed typos ({aa15aa}, {aa13aa}, {aa14aa}, {aa12aa}) and clarifications ({aa11aa}, {aa10aa}, {aa9aa})
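As a rough illustration of the "download_latency" meta key mentioned above: Scrapy sets this key once a response has been downloaded, and it is meant to be read, not written. The sketch below reads it the way a spider callback might. The function name describe_latency and the stand-in response object are assumptions made so the snippet runs without Scrapy installed; the value is assumed to be a duration in seconds.

```python
from types import SimpleNamespace


def describe_latency(response):
    """Return a short report of how long a response took to download.

    `response` only needs a `.url` and a `.meta` dict, which is what a
    Scrapy Response exposes inside a spider callback.
    """
    # Use .get() so a missing key yields None instead of raising KeyError.
    latency = response.meta.get("download_latency")
    if latency is None:
        return "%s: latency not available" % response.url
    return "%s: downloaded in %.3f s" % (response.url, latency)


# Stand-in object for demonstration only; in a real spider, Scrapy
# passes the actual Response object to the callback.
fake_response = SimpleNamespace(
    url="http://example.com/",
    meta={"download_latency": 0.25},
)
print(describe_latency(fake_response))
```

In a real project the same lookup would typically sit inside a spider's parse() method or in a middleware, logging or aggregating the value per request.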
Other changes
Advertise {aa8aa} as Scrapy’s official conda channel ({aa7aa})
More helpful error messages when trying to use .css() or .xpath() on non-Text Responses ({aa6aa})
startproject command now generates a sample middlewares.py file ({aa5aa})
Add more dependencies’ version info in scrapy version verbose output ({aa4aa})
Remove all *.pyc files from source distribution ({aa3aa})
Download
{aa1aa}
{aa0aa}
Source: http://www.phperz.com/article/16/1207/311471.html