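Writing a small English-to-Chinese translation program with a Python crawler

1. Choose a translation page. I chose Youdao Dictionary (http://dict.youdao.com)

2. Enter any English word and translate it, then view the page source, find where the translated content sits, and note which tags enclose it

3. Start writing the program

(1) First import the requests library and the BeautifulSoup library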

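(2) Change the request headers so the page is less likely to detect the crawler; a real browser's header values can be copied from the Inspect Element (developer tools) panel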
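As a minimal sketch of this step, the headers can be attached to a requests.Session once so that every later request sends them. The names HEADERS and make_session are my own for illustration, and the User-Agent string is only an example value; any string copied from a current browser should do.

```python
import requests

# Example User-Agent copied from a browser's developer tools.
# The exact value is an assumption; substitute your own browser's string.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/58.0.3029.110 Mobile Safari/537.36"
    )
}


def make_session(headers=HEADERS):
    """Return a Session that sends the given headers with every request."""
    session = requests.Session()
    # update() merges our values into the session's default headers
    session.headers.update(headers)
    return session
```

Calling make_session().get(some_url) then behaves like requests.get(some_url, headers=HEADERS), without repeating the headers argument each time.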

(3) Determine the URL. On Youdao it is http://dict.youdao.com/w/%s/#keyfrom=dict2.top, where %s is replaced by the word to look up
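Filling in the %s placeholder can be sketched as below. Percent-encoding the word with urllib.parse.quote is my addition (the original program inserts the raw input), and build_url and URL_TEMPLATE are hypothetical names. Encoding keeps spaces and other special characters from breaking the URL.

```python
from urllib.parse import quote

# URL pattern taken from the article; %s is the word being looked up
URL_TEMPLATE = "http://dict.youdao.com/w/%s/#keyfrom=dict2.top"


def build_url(word):
    """Return the lookup URL for a word, percent-encoding unsafe characters."""
    # safe="" also encodes "/" so the word cannot alter the URL path
    return URL_TEMPLATE % quote(word.strip(), safe="")


print(build_url("hello"))
print(build_url("ice cream"))
```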

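(4) Start with a simple program. The core is just three lines

Step 1: r = requests.get(url=url, headers=header)

Use requests to send a request to the page, with the URL and request headers prepared beforehand (here in the variables url and header)

Step 2: soup = BeautifulSoup(r.text,"lxml")

Use BeautifulSoup to parse the returned HTML text (r.text) into a tree that can be searched; "lxml" names the parser, so the lxml package must be installed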

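Step 3: s = soup.find(class_='trans-container')('ul')[0]('li')

The .find() method returns the first element that matches, here the element whose class is trans-container. Calling a tag like a function is shorthand for find_all(), so ('ul') collects the ul elements inside it, [0] picks the first of them (the one that holds the translation text), and ('li') collects its li items. This step depends on the page, so adjust the class and tag names to match what you see in the page source.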
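To try the extraction step without touching the network, here is a small sketch that runs the same lookup on an inline HTML sample. SAMPLE_HTML is hypothetical markup that only mimics the structure described above; the real page may differ. I use the built-in "html.parser" instead of "lxml" so the sketch has no extra dependency, and extract_translations is my own helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the structure described in the article;
# the live page may look different and can change at any time.
SAMPLE_HTML = """
<div class="trans-container">
  <ul>
    <li>int. 喂；哈罗</li>
    <li>n. 表示问候</li>
  </ul>
</div>
"""


def extract_translations(html):
    """Return the text of each <li> in the first <ul> under .trans-container."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find(class_="trans-container")
    if container is None:
        # No translation block on the page (for example, an unknown word)
        return []
    first_list = container.find("ul")
    if first_list is None:
        return []
    texts = (li.get_text(strip=True) for li in first_list.find_all("li"))
    # Skip empty items, as the original loop does with `if item.text`
    return [text for text in texts if text]


print(extract_translations(SAMPLE_HTML))
```

Checking for None explicitly shows where the one-line version fails: when .find() matches nothing it returns None, and the chained call raises an exception.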

(5) Improve it by adding try...except...finally, so that a failed lookup prints a message instead of ending the program
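The control flow of try...except...finally can be sketched on its own as follows. Here translate is a hypothetical callable standing in for the request-and-parse steps, and lookup is my own name. The optional else branch, which is not used in the full program, runs only when no exception was raised.

```python
def lookup(word, translate):
    """Run translate(word) and report failures instead of crashing.

    `translate` is a hypothetical stand-in for the request-and-parse
    steps; it should return a list of strings.
    """
    try:
        results = translate(word)
    except Exception as exc:
        # Any failure (network error, missing markup) ends up here
        print("Sorry, there is an error: %s" % exc)
        return []
    else:
        # Runs only when translate() raised nothing
        for line in results:
            print(line)
        return results
    finally:
        # Runs in both cases, even though the branches above return
        print("=" * 40)
```

For example, lookup("hi", lambda w: ["你好"]) prints the result and the separator, while a translate function that raises prints the error message followed by the same separator.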

4. Complete program

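import requests
from bs4 import BeautifulSoup

# Browser-like User-Agent so the request is less likely to be rejected as a crawler
header = {'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Mobile Safari/537.36'}

word = input("Enter a word (enter 'q' to exit): ")
while word != 'q':
    try:
        # Request the dictionary page for this word
        r = requests.get(url='http://dict.youdao.com/w/%s/#keyfrom=dict2.top' % word, headers=header)
        # Parse the returned HTML (requires the lxml package)
        soup = BeautifulSoup(r.text, "lxml")
        # All <li> items of the first <ul> inside the trans-container element
        s = soup.find(class_='trans-container')('ul')[0]('li')
        for item in s:
            if item.text:
                print(item.text)
        print('=' * 40 + '\n')
    except Exception:
        # Covers network failures and words whose page lacks the expected markup
        print('Sorry, there is an error!\n')
    finally:
        # Always prompt for the next word, whether or not the lookup succeeded
        word = input("Enter a word (enter 'q' to exit): ")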

Source: http://www.bubuko.com/infodetail-2991404.html
