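My cloud database is too slow to access,
so this project uses no database and answers requests from the web directly.
Some endpoints are not finished yet.
I originally only meant to scrape Weibo, then changed my mind,
and it still isn't finished.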
Preview page:
https://douban.qing.workers.dev/
Screenshots:
- Firefox_Screenshot_2019-08-13T06-20-56.288Z.PNG
- Firefox_Screenshot_2019-08-13T06-21-33.926Z.PNG
- Firefox_Screenshot_2019-08-13T06-24-47.361Z.PNG
- Firefox_Screenshot_2019-08-13T06-25-48.794Z.PNG
This is a complete Python crawler. Its core pieces are:

- routing
- HTTP requests
- a set of parsers (built on bs4)
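Because the project ships its own router, the API Gateway trigger has to be switched to integration response mode:
Enable integration response: enabled.
In this mode the function returns the complete HTTP response (status code, headers, body) itself.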
The HTTP request that API Gateway hands to the function looks roughly like this:
```python
# Sample `event` passed by API Gateway to main_handler(event, context)
event = {
    "body": "{\"x\":1,\"y\":2}",
    "headerParameters": {},
    "headers": {
        "accept": "*/*",
        "content-length": "7",
        "content-type": "application/x-www-form-urlencoded",
        "endpoint-timeout": "15",
        "host": "service-75ph8ybo-1252957949.ap-hongkong.apigateway.myqcloud.com",
        "user-agent": "curl/7.61.1",
        "x-anonymous-consumer": "true",
        "x-qualifier": "$LATEST"
    },
    "httpMethod": "POST",
    "path": "/weibo/ccc",
    "pathParameters": {},
    "queryString": {},
    "queryStringParameters": {},
    "requestContext": {
        "httpMethod": "ANY",
        "identity": {},
        "path": "/weibo",
        "serviceId": "service-75ph8ybo",
        "sourceIp": "58.60.1.25",
        "stage": "release"
    }
}
```
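Extract these values:

Field | Example value | Notes |
---|---|---|
body | `"{\"x\":1,\"y\":2}"` | Always a string; if it holds JSON, decode it with `json.loads` |
headers | `{...}` | HTTP headers |
pathParameters | `{}` | |
queryString | `{}` | `ccc?x=1&y=2` → `{"x": "1", "y": "2"}` |
httpMethod | `"POST"` | |
path | `"/weibo/ccc"` | Full request path |
requestContext["path"] | `"/weibo"` | Path of the API configured in API Gateway |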
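One way to gather the fields from the table in one place, as a sketch: `parse_event` and the key names it returns are hypothetical, and it assumes a body that fails to parse as JSON should be passed through as the raw string.

```python
import json


def parse_event(event):
    """Collect the fields listed in the table from an API Gateway event."""
    request_context = event.get("requestContext", {})
    raw_body = event.get("body") or ""
    try:
        # The body always arrives as a string; decode it when it holds JSON
        data = json.loads(raw_body) if raw_body else None
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        data = raw_body  # not JSON: keep the raw string
    return {
        "method": event.get("httpMethod"),
        "path": event.get("path", ""),
        "base_path": request_context.get("path", ""),
        "query": event.get("queryString", {}),
        "headers": event.get("headers", {}),
        "source_ip": request_context.get("sourceIp"),
        "data": data,
    }


sample = {
    "body": "{\"x\":1,\"y\":2}",
    "httpMethod": "POST",
    "path": "/weibo/ccc",
    "queryString": {"x": "1", "y": "2"},
    "requestContext": {"path": "/weibo", "sourceIp": "58.60.1.25"},
}
req = parse_event(sample)
print(req["method"], req["path"], req["data"])  # POST /weibo/ccc {'x': 1, 'y': 2}
```

Note that `content-type` in the sample says form-urlencoded even though the body is JSON (curl's default for `-d`), which is why the sketch tries JSON first instead of trusting the header.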
Build a router function table.
When a request comes in, look up the matching function in the table and let it produce the response.
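A minimal sketch of such a function table, assuming handlers are keyed by (HTTP method, path below the API base path) and that unknown routes get a plain 404. The `Router` class and its methods are hypothetical and are not the project's `App`.

```python
import json


class Router:
    """Hypothetical function-table router (not the project's App class)."""

    def __init__(self):
        self.table = {}  # (HTTP method, sub-path) -> handler

    def route(self, path, methods=("GET",)):
        def register(func):
            for method in methods:
                self.table[(method.upper(), path)] = func
            return func
        return register

    def dispatch(self, event, context=None):
        base = event.get("requestContext", {}).get("path", "").rstrip("/")
        full = event.get("path", "")
        # Strip the API base path: "/weibo/ccc" under "/weibo" -> "/ccc"
        if base and (full == base or full.startswith(base + "/")):
            full = full[len(base):]
        sub_path = full or "/"
        handler = self.table.get((event.get("httpMethod", "GET").upper(), sub_path))
        if handler is None:
            return {"isBase64Encoded": False, "statusCode": 404,
                    "headers": {"Content-Type": "text/plain; charset=utf-8"},
                    "body": "not found"}
        return handler(event, context)


router = Router()


@router.route("/ccc", methods=["GET", "POST"])
def ccc(event, context):
    return {"isBase64Encoded": False, "statusCode": 200,
            "headers": {"Content-Type": "application/json; charset=utf-8"},
            "body": json.dumps({"echo": event.get("body")})}


def main_handler(event, context):
    return router.dispatch(event, context)


resp = main_handler({"httpMethod": "POST", "path": "/weibo/ccc", "body": "hi",
                     "requestContext": {"path": "/weibo"}}, None)
print(resp["statusCode"], resp["body"])  # 200 {"echo": "hi"}
```

The top-level `httpMethod` is used for matching because `requestContext.httpMethod` is `"ANY"` in the sample event; a plain dict lookup keeps dispatch O(1) regardless of how many endpoints are registered.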
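API Gateway integration response: with integration response enabled, the function must return a dict of this shape: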
```python
def main_handler(event, context):
    r = {
        "isBase64Encoded": False,  # True when body is base64-encoded binary, e.g. an image
        "statusCode": 200,
        "headers": {
            "Content-Type": "application/json; charset=utf-8",
            # CORS headers so browsers on other origins can call the API
            "access-control-allow-origin": "*",
            "access-control-allow-methods": "GET,POST,PUT,PATCH,TRACE,DELETE,HEAD,OPTIONS",
            "access-control-allow-headers": "accept,accept-encoding,cf-connecting-ip,cf-ipcountry,cf-ray,cf-visitor,connection,content-length,content-type,host,user-agent,x-forwarded-proto,x-real-ip,accept-charset,accept-language,accept-datetime,authorization,cache-control,date,if-match,if-modified-since,if-none-match,if-range,if-unmodified-since,max-forwards,pragma,range,te,upgrade,upgrade-insecure-requests,x-requested-with,chrome-proxy,purpose,accept,accept-language,content-language,content-type,dpr,downlink,save-data,viewport-width,width",
            "access-control-max-age": "1728000"
        },
        "body": "123"
    }
    return r
```
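Building that dict by hand in every handler gets repetitive. A sketch of a helper that wraps any payload in the integration-response shape; `make_response` is a hypothetical name, and it assumes bytes mean binary content (sent base64-encoded), `str` means plain text, and anything else should be serialized as JSON.

```python
import base64
import json

CORS_HEADERS = {
    "access-control-allow-origin": "*",
    "access-control-allow-methods": "GET,POST,PUT,PATCH,TRACE,DELETE,HEAD,OPTIONS",
    "access-control-max-age": "1728000",
}


def make_response(payload, status=200, content_type=None):
    """Wrap payload in an API Gateway integration-response dict."""
    if isinstance(payload, (bytes, bytearray)):
        # Binary payloads (e.g. images) are sent base64-encoded
        body = base64.b64encode(bytes(payload)).decode("ascii")
        encoded = True
        content_type = content_type or "application/octet-stream"
    elif isinstance(payload, str):
        body, encoded = payload, False
        content_type = content_type or "text/plain; charset=utf-8"
    else:
        # ensure_ascii=False keeps Chinese text readable in the JSON body
        body = json.dumps(payload, ensure_ascii=False)
        encoded = False
        content_type = content_type or "application/json; charset=utf-8"
    headers = {"Content-Type": content_type, **CORS_HEADERS}
    return {"isBase64Encoded": encoded, "statusCode": status,
            "headers": headers, "body": body}


print(make_response({"title": "微博热搜"})["body"])  # {"title": "微博热搜"}
```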
Routing: a local test that registers a handler and feeds it a sample event (`App` and `json_dec` come from the project's own code).

```python
def test_main():
    app = App()  # lowercase instance name keeps the App class from being shadowed

    @app.route(path="/ccc", methods=["GET", "POST"])
    @json_dec
    def main_handler(event, context=None):
        body = event["body"]
        headerParameters = event["headerParameters"]
        queryString = event["queryString"]
        queryStringParameters = event["queryStringParameters"]
        sourceIp = event["requestContext"]["sourceIp"]
        print(body)
        return event

    # Sample event with the same shape as the one API Gateway sends
    d = {'body': '{"x":1,"y":2}', 'headerParameters': {}, 'headers': {'accept': '*/*', 'content-length': '7', 'content-type': 'application/x-www-form-urlencoded', 'endpoint-timeout': '15', 'host': 'service-75ph8ybo-1252957949.ap-hongkong.apigateway.myqcloud.com', 'user-agent': 'curl/7.61.1', 'x-anonymous-consumer': 'true', 'x-qualifier': '$LATEST'}, 'httpMethod': 'POST', 'path': '/weibo/ccc', 'pathParameters': {}, 'queryString': {}, 'queryStringParameters': {}, 'requestContext': {'httpMethod': 'ANY', 'identity': {}, 'path': '/weibo', 'serviceId': 'service-75ph8ybo', 'sourceIp': '1.1.1.1', 'stage': 'release'}}
    z = app.run(d, {})
    print(z)
    return z
```
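`json_dec` itself is not shown in this excerpt. One plausible version, assuming its job is to turn whatever the handler returns into a JSON integration response and to report exceptions as a 500 instead of crashing the function; everything below is hypothetical, not the project's code.

```python
import functools
import json


def json_dec(handler):
    """Hypothetical json_dec: serialize the handler's return value as JSON."""
    @functools.wraps(handler)
    def wrapper(event, context=None):
        try:
            result, status = handler(event, context), 200
        except Exception as exc:  # report errors as JSON instead of crashing
            result, status = {"error": str(exc)}, 500
        return {
            "isBase64Encoded": False,
            "statusCode": status,
            "headers": {"Content-Type": "application/json; charset=utf-8",
                        "access-control-allow-origin": "*"},
            # default=str lets values like datetimes through as strings
            "body": json.dumps(result, ensure_ascii=False, default=str),
        }
    return wrapper


@json_dec
def echo(event, context):
    return {"path": event["path"]}


print(echo({"path": "/weibo/ccc"})["body"])  # {"path": "/weibo/ccc"}
```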
Install third-party packages into a `vendor` folder:

```shell
mkdir vendor
pip3 install bs4 -t ./vendor
```
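Then add `/var/user/vendor` to `sys.path` (the module search path), and the vendored packages can be imported as usual:

```python
import sys

u = '/var/user/vendor'  # the vendor folder as deployed on SCF
sys.path.append(u)
```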
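Hard-coding `/var/user/vendor` ties the code to the deployed location, so local runs cannot find the packages. A sketch that derives the path from the entry file's own location instead, assuming `vendor/` sits next to that file; the helper name `add_vendor_dir` is hypothetical.

```python
import os
import sys


def add_vendor_dir(base_dir=None, name="vendor"):
    """Put <base_dir>/<name> at the front of sys.path if it exists; return the path."""
    if base_dir is None:
        # Default: the directory that contains this file
        base_dir = os.path.dirname(os.path.abspath(__file__))
    vendor = os.path.join(base_dir, name)
    if os.path.isdir(vendor) and vendor not in sys.path:
        # Insert at the front so bundled packages win over preinstalled copies
        sys.path.insert(0, vendor)
    return vendor
```

Call `add_vendor_dir()` at the top of the entry script, before `import bs4`. Using `insert(0, ...)` rather than `append` is a design choice: it makes the bundled versions take precedence if the runtime happens to ship its own copy.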
Manual deployment:

```shell
pip3 install scf
name=weibo
scf deploy -n $name                    # first deployment
scf deploy -n $name --skip-event -f    # later deployments
```
Automatic deployment:

```shell
# install
./setup i
# deploy to Tencent Cloud
./setup
```
Sites:

- [Hacker News][1]
- [sputniknews][2]
- [reuters][3]
- [Huanqiu (环球网)][4]
- [Weibo hot search (微博热搜)][5]

[1]: http://hackernews.betacat.io/
[2]: http://sputniknews.cn/
[3]: http://cn.reuters.com/
[4]: http://www.huanqiu.com/
[5]: https://www.enlightent.cn/research/rank/weiboSearchRank
Files:

- [Dependency list](./depend.txt)
- [Vendored packages](./vendor)
- [Entry point](./index.py)
- [Spider](./spider.py)
- [Parsers](./parser)
Endpoints:

- [base](https://service-75ph8ybo-1252957949.ap-hongkong.apigateway.myqcloud.com/release/weibo/)
  - [/hack_news](https://service-75ph8ybo-1252957949.ap-hongkong.apigateway.myqcloud.com/release/weibo/hack_news)
  - [/sputni](https://service-75ph8ybo-1252957949.ap-hongkong.apigateway.myqcloud.com/release/weibo/sputni)
  - [/vbc?n=1](https://service-75ph8ybo-1252957949.ap-hongkong.apigateway.myqcloud.com/release/weibo/vbc?n=2)
  - [/huanqiu?n=1](https://service-75ph8ybo-1252957949.ap-hongkong.apigateway.myqcloud.com/release/weibo/huanqiu?n=1)
  - [/news](https://service-75ph8ybo-1252957949.ap-hongkong.apigateway.myqcloud.com/release/weibo/news)
Source code: https://github.com/birdsofsummer/news_spider
Source: https://www.qcloud.com/developer/article/1486101