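Requests is a human-friendly HTTP library for Python, released under the Apache2 license.
The urllib2 module in the Python standard library provides most of the HTTP functionality you need, but its API is unsystematic. It is also dated, and even the simplest task takes a lot of work.
The example below accesses GitHub.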
- >>> import requests
- >>> r = requests.get('https://api.github.com/user', auth=('ouyangchongwu@test.com', 'password'))
- >>> r.status_code
- 200
- >>> r.headers['content-type']
- 'application/json; charset=utf-8'
- >>> r.encoding
- 'utf-8'
- >>> r.text
- u'{"login":"oychw",...}'
- >>> r.json()
- {
- u'disk_usage': 176, u'private_gists': 0, ...
- }
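Requests handles all HTTP/1.1 work for Python and integrates seamlessly with web services. There is no need to add query strings to URLs by hand or to form-encode POST data. It is built on urllib3 (https://github.com/shazow/urllib3), so Keep-Alive and HTTP connection pooling are handled automatically.
Features:
International domains and URLs
Keep-Alive & connection pooling
Sessions with cookie persistence
Browser-style SSL verification
Basic / Digest authentication
Elegant key/value cookies
Automatic decompression
Unicode response bodies
Multipart file uploads
Connection timeouts
.netrc support
Works with Python 2.6-3.4
Thread-safe
User Guide
Introduction
Requests follows these parts of PEP 20:
- Beautiful is better than ugly.
- Explicit is better than implicit.
- Simple is better than complex.
- Complex is better than complicated.
- Readability counts.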
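Installation
Recommended: pip install requests. Alternative: easy_install requests
Latest code:
git clone git://github.com/kennethreitz/requests.git
Download the tar.gz package: curl -OL
Download the zip package: curl -OL
Install from source: python setup.py install
Quickstart
Making requests:
The following fetches GitHub's public timeline and then uses httpbin to demonstrate the other HTTP methods: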
- >>> import requests
- >>> r = requests.get('https://github.com/timeline.json')
- >>> r = requests.post("http://httpbin.org/post")
- >>> r = requests.put("http://httpbin.org/put")
- >>> r = requests.delete("http://httpbin.org/delete")
- >>> r = requests.head("http://httpbin.org/get")
- >>> r = requests.options("http://httpbin.org/get")
Passing parameters in URLs
A URL query string, for example httpbin.org/get?key=val, can be built from a dictionary in Requests. For example, to pass key1=value1 and key2=value2 to httpbin.org/get:
- >>> import requests
- >>> payload = {
- 'key1': 'value1', 'key2': 'value2'
- }
- >>> r = requests.get("http://httpbin.org/get", params=payload)
- >>> print(r.url)
- http://httpbin.org/get?key2=value2&key1=value1
- >>> payload = {
- 'key1': 'value1', 'key2[]': ['value2', 'value3']
- }
- >>> r = requests.get("http://httpbin.org/get", params=payload)
- >>> print(r.url)
- http://httpbin.org/get?key1=value1&key2[]=value2&key2[]=value3
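Note that any dictionary key whose value is None is left out of the query string. The second example requests http://httpbin.org/get?key1=value1&key2[]=value2&key2[]=value3 . Note the pair of square brackets appended to the key there, a naming convention that some server-side frameworks expect for list values.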
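The query-string rules above can be checked without sending anything, by preparing a request and reading back its URL. This is a minimal sketch, assuming Python 3 and a recent Requests release; `build_url` is a hypothetical helper name, not part of Requests.

```python
from urllib.parse import parse_qs, urlsplit

import requests


def build_url(base, params):
    """Return the URL Requests would request, without touching the network."""
    return requests.Request('GET', base, params=params).prepare().url


# A list value is repeated once per item; a None value is dropped entirely.
url = build_url('http://httpbin.org/get',
                {'key1': 'value1', 'key2': ['value2', 'value3'], 'unused': None})
query = parse_qs(urlsplit(url).query)
print(url)
print(query)
```

Parsing the URL back with `parse_qs` avoids depending on parameter order or on how a particular Requests version percent-encodes special characters.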
Response content
- >>> import requests
- >>> r = requests.get('https://api.github.com/events')
- >>> r.text
- u'[{"id":"2636319727","type":"PullRequestReviewCommentEvent","actor":{"id":1148601,"login":"i ...}]
- >>> r.encoding
- 'utf-8'
- >>> r.encoding = 'ISO-8859-1'
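Requests automatically decodes content from the server, and most Unicode charsets are decoded seamlessly. When you make a request, Requests guesses the encoding of the response from the HTTP headers. You can read the guess from r.encoding and change it by assigning to r.encoding; after a change, every access to r.text uses the new value.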
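The effect of changing r.encoding can be reproduced offline. The sketch below assumes Python 3 and builds a Response by hand to stand in for a server reply whose body is UTF-8 but whose Content-Type header names no charset; in real code the object would come from requests.get().

```python
import io

import requests

# Hand-built Response standing in for a server reply (illustration only):
# the body is UTF-8, but the header names no charset.
r = requests.Response()
r.status_code = 200
r.headers['Content-Type'] = 'text/plain'
r.raw = io.BytesIO('café'.encode('utf-8'))

# The header-based guess that Requests makes for a real response.
r.encoding = requests.utils.get_encoding_from_headers(r.headers)
guessed = r.encoding
wrong_text = r.text      # decoded with the guessed encoding

r.encoding = 'utf-8'     # override the guess
right_text = r.text      # r.text is decoded again on every access
print(guessed, repr(wrong_text), repr(right_text))
```

The raw bytes in r.content never change; only the decoding applied by r.text does.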
Binary response content
r.content gives access to the response body as bytes.
- >>> r.content
- b'[{
- "repository":{"open_issues":0,"url":"https://github.com/...
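The gzip and deflate transfer encodings are decoded automatically. For example, to create an image from the binary data of a response:
- >>> from PIL import Image
- >>> from io import BytesIO
- >>> i = Image.open(BytesIO(r.content))
JSON response content
Requests has a built-in JSON decoder: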
- >>> import requests
- >>> r = requests.get('https://github.com/timeline.json')
- >>> r.json()
- {
- u'documentation_url': u'https://developer.github.com/v3/activity/events/#list-public-events', u'message': u"Hello there, wayfaring stranger. If you're reading this then you probably didn't see our blog post a couple of years back announcing that this API would go away: http://git.io/17AROg Fear not, you should be able to get what you need from the shiny new Events API instead."
- }
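If JSON decoding fails, r.json() raises an exception. For example, if the response is a 401 (Unauthorized) whose body is not JSON, calling r.json() raises ValueError: No JSON object could be decoded.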
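A defensive wrapper around r.json() might look like the sketch below. It assumes Python 3; `fake_response` and `json_or_none` are hypothetical helper names, and the hand-built Response objects only stand in for real server replies.

```python
import io

import requests


def fake_response(status, body):
    """Build a Response by hand; a real one would come from requests.get()."""
    r = requests.Response()
    r.status_code = status
    r.raw = io.BytesIO(body)
    return r


def json_or_none(response):
    """Return the decoded JSON body, or None when the body is not JSON."""
    try:
        return response.json()
    except ValueError:  # Requests' JSON decode errors subclass ValueError
        return None


ok = json_or_none(fake_response(200, b'{"login": "oychw"}'))
denied = json_or_none(fake_response(401, b'<html>Unauthorized</html>'))
print(ok, denied)
```

A successful r.json() call says nothing about the status of the request, since an error response can carry valid JSON; checking r.status_code is still needed.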
Raw response content
In the rare case that you need the raw socket response from the server, set stream=True on the request:
- >>> import requests
- >>> r = requests.get('https://api.github.com/events', stream=True)
- >>> r.raw
- <urllib3.response.HTTPResponse object at 0x7f807dd6f4d0>
- >>> r.raw.read(10)
- '\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03'
Usually, though, you will want to save the stream to a file:
- with open(filename, 'wb') as fd:
-     for chunk in r.iter_content(chunk_size):
-         fd.write(chunk)
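Response.iter_content takes care of much of the work you would otherwise do yourself when using Response.raw directly, and it is the recommended way to download a stream.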
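The download loop can be wrapped in a small function. This is a sketch assuming Python 3; `save_stream` is a hypothetical helper, the 8192-byte chunk size is an arbitrary choice, and the hand-built Response stands in for the result of requests.get(url, stream=True).

```python
import io
import os
import tempfile

import requests


def save_stream(response, path, chunk_size=8192):
    """Write the response body to path chunk by chunk; return bytes written."""
    written = 0
    with open(path, 'wb') as fd:
        for chunk in response.iter_content(chunk_size):
            fd.write(chunk)
            written += len(chunk)
    return written


# Hand-built Response for illustration; real code would pass the result of
# requests.get(url, stream=True).
body = b'x' * 20000
r = requests.Response()
r.status_code = 200
r.raw = io.BytesIO(body)

path = os.path.join(tempfile.mkdtemp(), 'download.bin')
size = save_stream(r, path)
print(size)
```

Because only one chunk is held in memory at a time, the same loop works for bodies far larger than available RAM.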
Custom headers
- >>> import requests
- >>> import json
- >>> url = 'https://api.github.com/some/endpoint'
- >>> payload = {
- 'some': 'data'
- }
- >>> headers = {
- 'content-type': 'application/json'
- }
- >>> r = requests.post(url, data=json.dumps(payload), headers=headers)
More complicated POST requests
To send form data, pass a dictionary to the data argument:
- >>> import requests
- >>> payload = {'key1': 'value1', 'key2': 'value2'}
- >>> r = requests.post("http://httpbin.org/post", data=payload)
- >>> print(r.text)
- {
- "args": {},
- "data": "",
- "files": {},
- "form": {
- "key1": "value1",
- "key2": "value2"
- },
- "headers": {
- "Accept": "*/*",
- "Accept-Encoding": "gzip, deflate, compress",
- "Content-Length": "23",
- "Content-Type": "application/x-www-form-urlencoded",
- "Host": "httpbin.org",
- "User-Agent": "python-requests/2.2.1 CPython/2.7.6 Linux/3.13.0-53-generic"
- },
- "json": null,
- "origin": "119.122.150.177",
- "url": "http://httpbin.org/post"
- }
If you pass a string instead of a dictionary, it is posted as-is. For example, GitHub API v3 accepts JSON-encoded POST/PATCH data:
- >>> import requests
- >>> import json
- >>> url = 'https://api.github.com/some/endpoint'
- >>> payload = {
- 'some': 'data'
- }
- >>> r = requests.post(url, data=json.dumps(payload))
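The difference between passing a dictionary and passing a string can be seen by preparing both requests and inspecting them, without sending anything. A minimal sketch, assuming Python 3 and a recent Requests release:

```python
import json

import requests

url = 'http://httpbin.org/post'
payload = {'key1': 'value1', 'key2': 'value2'}

# A dictionary is form-encoded and the Content-Type is set automatically.
form = requests.Request('POST', url, data=payload).prepare()

# A string is sent unchanged, so the Content-Type is declared by hand.
raw = requests.Request('POST', url, data=json.dumps(payload),
                       headers={'content-type': 'application/json'}).prepare()

print(form.headers['Content-Type'], form.body)
print(raw.headers['Content-Type'], raw.body)
```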
POSTing a multipart-encoded file
- >>> import requests
- >>> url = 'http://httpbin.org/post'
- >>> files = {
- 'file': open('/home/andrew/test.xls', 'rb')
- }
- >>> r = requests.post(url, files=files)
- >>> r.text
- u'{\n"args": {}, \n"data":"", \n "files": {\n ... "url": "http://httpbin.org/post"\n}\n'
You can set the filename, content type and headers explicitly:
- >>> import requests
- >>> files = {
- 'file': ('report.xls', open('/home/andrew/test.xls', 'rb'), 'application/vnd.ms-excel', {
- 'Expires': '0'
- })
- }
- >>> url = 'http://httpbin.org/post'
- >>> r = requests.post(url, files=files)
- >>> r.text
- u'{\n"args": {}, \n"data":""..."url": "http://httpbin.org/post"\n}\n'
You can also send a string in place of a file:
- >>> import requests
- >>> url = 'http://httpbin.org/post'
- >>> files = {
- 'file': ('report.csv', 'some,data,to,send\nanother,row,to,send\n')
- }
- >>> r = requests.post(url, files=files)
- >>> r.text
- u'{\n"args": {}, \n"data":"", \n "files": {\n ... "json": null, \n "origin": "14.153.22.104", \n "url": "http://httpbin.org/post"\n}\n'
By default Requests does not stream multipart/form-data uploads, so very large files are a poor fit. For those, requests-toolbelt is recommended; see https://toolbelt.readthedocs.org/en/latest/
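To see what a files upload turns into, the request can be prepared and inspected instead of sent. This sketch assumes Python 3 and reuses the string-as-file example from above; the boundary value in the header is generated per request, so only its prefix is stable.

```python
import requests

files = {'file': ('report.csv', 'some,data,to,send\nanother,row,to,send\n')}
prepped = requests.Request('POST', 'http://httpbin.org/post',
                           files=files).prepare()

content_type = prepped.headers['Content-Type']
body = prepped.body  # bytes: the fully encoded multipart payload
print(content_type)
print(body.decode('utf-8'))
```

The whole payload is built in memory before sending, which is why very large uploads call for a streaming encoder instead.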
Response status codes
- >>> import requests
- >>> r = requests.get('http://httpbin.org/get')
- >>> r.status_code
- 200
- >>> r.status_code == requests.codes.ok
- True
- >>> bad_r = requests.get('http://httpbin.org/status/404')
- >>> bad_r.status_code
- 404
- >>> bad_r.raise_for_status()
- >>> r.raise_for_status()
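requests.codes.ok above comes from the built-in status code lookup object. For a failed request (a 4XX client error or a 5XX server error), Response.raise_for_status() raises an exception, which is what the bad_r call above does. Because r has status 200, r.raise_for_status() returns None and raises nothing.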
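The behaviour of raise_for_status() can be tried without a server. In this sketch, `check` is a hypothetical helper and the Response is built by hand purely for illustration; real responses come from requests.get() and friends.

```python
import requests


def check(status):
    """Return None for a successful status, or the HTTPError that was raised."""
    r = requests.Response()   # hand-built for illustration only
    r.status_code = status
    try:
        r.raise_for_status()  # returns None unless the status is 4XX or 5XX
    except requests.exceptions.HTTPError as exc:
        return exc
    return None


print(check(200))
print(check(404))
print(requests.codes.ok, requests.codes.not_found)
```

The raised HTTPError keeps a reference to the failing response in its `response` attribute, so the status code is still available in the except block.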
Response headers
- >>> r.headers
- {'content-length': '275', 'server': 'nginx', 'connection': 'keep-alive', 'access-control-allow-credentials': 'true', 'date': 'Tue, 10 Mar 2015 08:21:36 GMT', 'access-control-allow-origin': '*', 'content-type': 'application/json'}
- >>> r.headers['Content-Type']
- 'application/json'
- >>> r.headers.get('content-type')
- 'application/json'
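According to RFC 2616, HTTP header names are case-insensitive, so any capitalization can be used to look them up. According to RFC 7230, when a server sends the same header field more than once with different values, the recipient may combine them into a single comma-separated value.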
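The case-insensitive lookup comes from the mapping type that Requests uses for headers. A short sketch, with the mapping filled by hand rather than taken from a live response:

```python
from requests.structures import CaseInsensitiveDict

# Same mapping type as r.headers, filled by hand for illustration.
headers = CaseInsensitiveDict({'Content-Type': 'application/json'})

lower = headers['content-type']
upper = headers.get('CONTENT-TYPE')
present = 'Content-type' in headers
stored_names = list(headers)  # the original capitalization is preserved
print(lower, upper, present, stored_names)
```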
Cookies
If a response contains cookies, you can access them:
- >>> import requests
- >>> url = 'http://automationtesting.sinaapp.com/login'
- >>> r = requests.get(url)
- >>> r.cookies.keys()
- ['saeut', 'trac_form_token', 'trac_session']
- >>> r.cookies['saeut']
- 'CkMPGlT+tfQiXS9uGYviAg=='
To send your own cookies to the server, use the cookies parameter:
- >>> import requests
- >>> url = 'http://httpbin.org/cookies'
- >>> cookies = dict(cookies_are='working')
- >>> r = requests.get(url, cookies=cookies)
- >>> r.text
- u'{\n"cookies": {\n"cookies_are":"working"\n }\n}\n'
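Both directions can be sketched offline: what the cookies parameter puts on the wire, and how a cookie jar is read. The cookie value set on the jar is a placeholder for illustration, not a real cookie.

```python
import requests

# Cookies passed per request end up in the Cookie header.
prepped = requests.Request('GET', 'http://httpbin.org/cookies',
                           cookies={'cookies_are': 'working'}).prepare()
cookie_header = prepped.headers['Cookie']

# r.cookies on a response is a RequestsCookieJar, which also acts like a dict.
jar = requests.cookies.RequestsCookieJar()
jar.set('saeut', 'example-value')  # placeholder value, not a real cookie
names = jar.keys()
value = jar['saeut']
print(cookie_header, names, value)
```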
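Redirection and history
By default Requests follows redirects for every method except HEAD. Response.history holds the responses that were followed to complete the request.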
- >>> import requests
- >>> r = requests.get('http://github.com')
- >>> r.url
- u'https://github.com/'
- >>> r.status_code
- 200
- >>> r.history
- [<Response [301]>]
With GET, OPTIONS, POST, PUT, PATCH or DELETE, you can disable redirect handling with the allow_redirects parameter. With HEAD, the same parameter can be used to enable it:
- >>> import requests
- >>> r = requests.get('http://github.com', allow_redirects=False)
- >>> r.status_code
- 301
- >>> r.history
- []
- >>> r = requests.head('http://github.com', allow_redirects=True)
- >>> r.url
- u'https://github.com/'
- >>> r.history
- [<Response [301]>]
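The same behaviour can be reproduced against a throwaway local server instead of github.com. This is a sketch assuming Python 3; the `/old` and `/new` paths and the `Handler` class are made up for the demo, and `trust_env = False` is set only so that proxy environment variables do not interfere with the loopback requests.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class Handler(BaseHTTPRequestHandler):
    """Answers /old with a 301 to /new, and everything else with 200."""

    def do_GET(self):
        if self.path == '/old':
            self.send_response(301)
            self.send_header('Location', '/new')
        else:
            self.send_response(200)
        self.send_header('Content-Length', '0')
        self.end_headers()

    do_HEAD = do_GET

    def log_message(self, *args):  # keep the demo output quiet
        pass


server = HTTPServer(('127.0.0.1', 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = 'http://127.0.0.1:%d' % server.server_port

s = requests.Session()
s.trust_env = False  # ignore proxy environment variables for this local demo

followed = s.get(base + '/old')                           # redirects on
not_followed = s.get(base + '/old', allow_redirects=False)
head_default = s.head(base + '/old')                      # HEAD: off by default
head_followed = s.head(base + '/old', allow_redirects=True)
server.shutdown()
server.server_close()

print(followed.status_code, followed.url, followed.history)
print(not_followed.status_code, not_followed.history)
print(head_default.status_code, head_followed.status_code)
```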
Timeouts
The timeout parameter tells Requests to stop waiting for a response after the given number of seconds:
- >>> import requests
- >>> requests.get('http://github.com', timeout=0.1)
- Traceback (most recent call last):
- File "<stdin>", line 1, in <module>
- File "/usr/lib/python2.7/dist-packages/requests/api.py", line 55, in get
- return request('get', url, **kwargs)
- File "/usr/lib/python2.7/dist-packages/requests/api.py", line 44, in request
- return session.request(method=method, url=url, **kwargs)
- File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 455, in request
- resp = self.send(prep, **send_kwargs)
- File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 558, in send
- r = adapter.send(request, **kwargs)
- File "/usr/lib/python2.7/dist-packages/requests/adapters.py", line 387, in send
- raise Timeout(e)
- requests.exceptions.Timeout: (<urllib3.connectionpool.HTTPConnectionPool object at 0x7f807dd6f050>, 'Connection to github.com timed out. (connect timeout=0.1)')
- >>>
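Note: timeout is not a limit on the total time needed to download the response. Rather, an exception is raised if the server has not sent anything within timeout seconds.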
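In application code the timeout is usually caught rather than left to produce a traceback. The sketch below assumes Python 3 and provokes a timeout locally with a listening socket that accepts connections but never answers; the 0.5 second value is arbitrary, and `trust_env = False` only keeps proxy environment variables out of the loopback request.

```python
import socket

import requests

# A listening socket that never answers: the operating system accepts the
# connection, but no response byte is ever sent.
listener = socket.socket()
listener.bind(('127.0.0.1', 0))
listener.listen(1)
url = 'http://127.0.0.1:%d/' % listener.getsockname()[1]

s = requests.Session()
s.trust_env = False  # ignore proxy environment variables for this local demo

try:
    s.get(url, timeout=0.5)
    outcome = 'response received'
except requests.exceptions.Timeout as exc:
    outcome = type(exc).__name__  # name of the concrete Timeout subclass
finally:
    listener.close()

print(outcome)
```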
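Errors and exceptions
ConnectionError: raised on a network problem (DNS failure, refused connection, etc.).
HTTPError: raised in the rare event of an invalid HTTP response.
Timeout: raised when a request times out.
TooManyRedirects: raised when a request exceeds the configured maximum number of redirects.
All of these exceptions inherit from requests.exceptions.RequestException.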
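One way to act on this hierarchy is to test for the specific classes first and fall back to the base class. The sketch below is illustrative only: `classify` and its labels are hypothetical, and it assumes a Requests release that provides ConnectTimeout and ReadTimeout.

```python
from requests import exceptions


def classify(exc):
    """Map a Requests exception to a short label (labels are illustrative)."""
    # Timeout is tested first: ConnectTimeout is both a ConnectionError
    # and a Timeout.
    if isinstance(exc, exceptions.Timeout):
        return 'timeout'
    if isinstance(exc, exceptions.TooManyRedirects):
        return 'too many redirects'
    if isinstance(exc, exceptions.HTTPError):
        return 'bad http response'
    if isinstance(exc, exceptions.ConnectionError):
        return 'network problem'
    if isinstance(exc, exceptions.RequestException):
        return 'other requests error'
    raise TypeError('not a Requests exception')


for exc in (exceptions.ConnectTimeout(), exceptions.ReadTimeout(),
            exceptions.ConnectionError(), exceptions.TooManyRedirects()):
    print(type(exc).__name__, '->', classify(exc))
```

Catching RequestException alone is enough when the caller does not care which kind of failure occurred.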
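Advanced usage
Session objects
A Session object persists parameters across requests, and all requests made from a Session instance share cookies.
A Session object has all the methods of the main Requests API.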
- >>> s = requests.Session()
- >>> s.get('http://httpbin.org/cookies/set/sessioncookie/123456789')
- <Response [200]>
- >>> r = s.get("http://httpbin.org/cookies")
- >>> print(r.text)
- {
- "cookies": {
- "sessioncookie": "123456789"
- }
- }
A Session can also provide default data to the request methods; just set properties on the Session object:
- >>> import requests
- >>> s = requests.Session()
- >>> s.auth = ('user', 'pass')
- >>> s.headers.update({
- 'x-test': 'true'
- })
- >>> s.get('http://httpbin.org/headers', headers={
- 'x-test2': 'true'
- })
- <Response [200]>
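Any dictionary passed to a request method is merged with the session-level values, and method-level parameters override session parameters. To leave a session-level key out of a single request, set that key's value to None in the method-level dictionary. See also: session API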
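The merge rules can be observed by preparing requests through the session and reading the resulting headers, without sending anything. A minimal sketch; `merged_headers` is a hypothetical helper name.

```python
import requests

s = requests.Session()
s.headers.update({'x-test': 'true'})


def merged_headers(extra):
    """Return the headers a request would carry after the session merge."""
    req = requests.Request('GET', 'http://httpbin.org/headers', headers=extra)
    return s.prepare_request(req).headers


both = merged_headers({'x-test2': 'true'})        # session + method level
overridden = merged_headers({'x-test': 'false'})  # method level wins
dropped = merged_headers({'x-test': None})        # None removes the key

print(both['x-test'], both['x-test2'])
print(overridden['x-test'])
print('x-test' in dropped)
```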
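Request and Response objects
A call such as requests.get() does two main things. First, it constructs a Request object to send to the server. Second, once the server responds, it produces a Response object. The Response object contains the data returned by the server as well as the original Request object.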
- >>> import requests
- >>> r = requests.get('http://en.wikipedia.org/wiki/Monty_Python')
- >>> r.headers
- {
- 'content-length': '67559', ...
- }
- >>> r.request.headers
- {
- 'Connection': 'keep-alive', 'Accept-Encoding': ...
- }
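Prepared Requests
Whenever you receive a Response object from an API call or a Session call, its request attribute is actually the PreparedRequest that was used. If you need to modify the body or the headers before sending a request, you can proceed as follows: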
- from requests import Request, Session
- s = Session()
- req = Request('GET', url,
- data=data,
- headers=headers
- )
- prepped = req.prepare()
- # do something with prepped.body
- # do something with prepped.headers
- resp = s.send(prepped,
- stream=stream,
- verify=verify,
- proxies=proxies,
- cert=cert,
- timeout=timeout
- )
- print(resp.status_code)
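Since nothing special is done with the Request object here, it is prepared immediately and the PreparedRequest object is modified instead. It is then sent together with the other parameters you would otherwise have passed to requests.* or Session.*.
The code above, however, loses the session-level state of the Session object; cookies, for example, are not applied to the request. To get a PreparedRequest with that state applied, replace the call to Request.prepare() with a call to Session.prepare_request():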
- from requests import Request, Session
- s = Session()
- req = Request('GET', url,
- data=data,
- headers=headers
- )
- prepped = s.prepare_request(req)
- # do something with prepped.body
- # do something with prepped.headers
- resp = s.send(prepped,
- stream=stream,
- verify=verify,
- proxies=proxies,
- cert=cert,
- timeout=timeout
- )
- print(resp.status_code)
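The difference between the two preparation paths shows up when the session already holds a cookie. The sketch below prepares the same Request both ways and compares the headers without sending anything; the `X-Debug` header name is made up for the example.

```python
import requests

s = requests.Session()
s.cookies.set('sessioncookie', '123456789')

req = requests.Request('GET', 'http://httpbin.org/cookies')

stateless = req.prepare()          # knows nothing about the session
stateful = s.prepare_request(req)  # session cookies and headers applied

# Last chance to adjust the request before calling s.send(stateful).
stateful.headers['X-Debug'] = 'demo'  # hypothetical header name

print('Cookie' in stateless.headers)
print(stateful.headers['Cookie'])
```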
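SSL certificate verification
Requests can verify SSL certificates for HTTPS requests, just like a web browser; this is controlled with the verify parameter.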
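A sketch of how the verify parameter might be wrapped is given below. `fetch_verified` and its `ca_bundle` argument are hypothetical names; the function is only defined here, not called, since calling it requires network access.

```python
import requests


def fetch_verified(url, ca_bundle=None, timeout=5):
    """GET url with certificate verification switched on.

    ca_bundle is an optional path to a CA bundle file. When it is omitted,
    verify=True makes Requests use its default trusted certificates.
    """
    verify = ca_bundle if ca_bundle else True
    # An untrusted or mismatched certificate raises
    # requests.exceptions.SSLError here.
    return requests.get(url, verify=verify, timeout=timeout)


# Verification is already the default for every new Session.
default_verify = requests.Session().verify
print(default_verify)
```

Passing verify=False disables the check entirely and is best kept to local testing.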
References
Requests English documentation: http://docs.python-requests.org/en/latest/
- requests-docs-cn.readthedocs.org
Source: http://www.jianshu.com/p/82dc2ed8e1ba