Learning the great Python: day 8

html_doc = """<html><head><title>The Dormouse's story</title></head><body><p class="sister"><b>$37</b></p><p class="story" id="p">Once upon a time there were three little sisters; and their names were<b>tank</b><a href="http://example.com/elsie" class="sister">Elsie</a>,<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;and they lived at the bottom of a well.<hr></hr></p><p class="story">...</p>"""

'''Searching the document tree:
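    find()  finds a single match
    find_all()  finds every match
Searching by tag and by attribute:
    Tags:
            name: match on the tag name
            attrs: match on attributes
            text: match on text content
        - String filter
            matches the whole string exactly
        - Regular expression filter
            matches with a pattern from the re module
        - List filter
            matches any item in the list
        - bool filter
            True matches any value
        - Function filter
            for lookups that depend on which attributes a tag must have and which it must not have.
    Attributes:
        - class_
        - id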
'''
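The outline lists class_ and id as attribute shortcuts, but the script never exercises them. Here is a minimal sketch of both, using a shortened copy of the sample HTML and the built-in 'html.parser' instead of 'lxml' (both choices are assumptions made to keep the example self-contained).

```python
from bs4 import BeautifulSoup

# Shortened copy of the article's sample document (an assumption for brevity).
html_doc = (
    '<html><body>'
    '<p class="sister"><b>$37</b></p>'
    '<p class="story" id="p">Once upon a time'
    '<a href="http://example.com/elsie" class="sister">Elsie</a>'
    '<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>'
    '<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>'
    '</p></body></html>'
)

# 'html.parser' ships with Python; the article itself uses 'lxml'.
soup = BeautifulSoup(html_doc, 'html.parser')

# class is a reserved word in Python, so the keyword argument is spelled class_.
sisters = soup.find_all(class_='sister')
print([tag.name for tag in sisters])

# id can be passed directly as a keyword argument.
link2 = soup.find(id='link2')
print(link2.get_text())
```

Both calls are shorthand for the attrs={...} form used later in the script.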
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc, 'lxml')
# String filter
# name
p_tag = soup.find(name='p')
print(p_tag)  # the first tag whose name is p
# Find all nodes whose tag name is p
tag_s1 = soup.find_all(name='p')
print(tag_s1)
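The difference between find() and find_all() also shows up when nothing matches. A small sketch, assuming a cut-down document and the built-in 'html.parser' (the article uses 'lxml'); the table tag is simply a name that does not occur in the sample:

```python
from bs4 import BeautifulSoup

# Cut-down sample document (an assumption for brevity).
html_doc = '<p class="sister"><b>$37</b></p><p class="story">...</p>'
soup = BeautifulSoup(html_doc, 'html.parser')

first_p = soup.find(name='p')            # a single Tag
all_p = soup.find_all(name='p')          # a list-like ResultSet of Tags

# 'table' does not occur in the document.
missing_one = soup.find(name='table')       # None
missing_all = soup.find_all(name='table')   # empty ResultSet

print(first_p['class'], len(all_p), missing_one, len(missing_all))
```

Because find() can return None, it is worth checking the result before accessing attributes on it.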
# attrs
# Find the first node whose class is sister
p = soup.find(attrs={"class": "sister"})
print(p)
# Find all nodes whose class is sister
tag_s2 = soup.find_all(attrs={"class": "sister"})
print(tag_s2)
# text
text = soup.find(text="$37")
print(text)
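A text-only search returns the matching string node, not the tag that contains it. The sketch below illustrates this with string=, which is the newer spelling of the text= argument (available since Beautiful Soup 4.4.0); the shortened document and the 'html.parser' choice are assumptions.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# Cut-down sample document (an assumption for brevity).
html_doc = '<p class="sister"><b>$37</b></p><p class="story">...</p>'
soup = BeautifulSoup(html_doc, 'html.parser')

# Searching by text alone yields a NavigableString.
price_text = soup.find(string='$37')
print(type(price_text).__name__)

# .parent walks up to the enclosing tag.
print(price_text.parent.name)

# Adding a tag name makes the search return the Tag whose .string matches.
price_tag = soup.find('b', string='$37')
print(price_tag.name)
```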
# Combined use:
# Find an a tag whose id is link2 and whose text is Lacie
a_tag = soup.find(name="a", attrs={"id": "link2"}, text="Lacie")
print(a_tag)
# Regular expression filter
# import re
# name
# p_tag = soup.find(name=re.compile('p'))
# print(p_tag)
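A runnable version of the regular expression filter might look like the following. Beautiful Soup applies the pattern with search semantics, so an unanchored pattern matches anywhere in the tag name. The shortened document and 'html.parser' are assumptions made for this sketch.

```python
import re
from bs4 import BeautifulSoup

# Shortened copy of the article's sample document (an assumption for brevity).
html_doc = (
    "<html><head><title>The Dormouse's story</title></head><body>"
    '<p class="story"><b>tank</b>'
    '<a href="http://example.com/elsie" class="sister">Elsie</a>'
    '<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>'
    '</p></body></html>'
)
soup = BeautifulSoup(html_doc, 'html.parser')

# Unanchored pattern: any tag whose name contains the letter t.
t_tags = soup.find_all(name=re.compile('t'))
print([tag.name for tag in t_tags])

# Anchored pattern: only tags whose name starts with b.
b_tags = soup.find_all(name=re.compile('^b'))
print([tag.name for tag in b_tags])

# Patterns work on attribute values too.
lacie = soup.find_all(href=re.compile('lacie$'))
print([tag.get_text() for tag in lacie])
```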
# List filter
# import re
# name
# tags = soup.find_all(name=['p', 'a', re.compile('html')])
# print(tags)
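The list filter can be sketched in runnable form as below. A tag matches if it matches any item in the list, and the items may mix plain strings with compiled patterns. The shortened document, the particular names in the list, and 'html.parser' are assumptions for illustration.

```python
import re
from bs4 import BeautifulSoup

# Shortened copy of the article's sample document (an assumption for brevity).
html_doc = (
    "<html><head><title>The Dormouse's story</title></head><body>"
    '<p class="story"><b>tank</b>'
    '<a href="http://example.com/elsie" class="sister">Elsie</a>'
    '</p></body></html>'
)
soup = BeautifulSoup(html_doc, 'html.parser')

# Two exact names plus one pattern for names starting with h.
tags = soup.find_all(name=['title', 'b', re.compile('^h')])

# Results come back in document order, not in list order.
print([tag.name for tag in tags])
```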
# bool filter
# True matches any value
# Find a p tag that has an id
# p = soup.find(name='p', attrs={"id": True})
# print(p)
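As a runnable sketch of the bool filter: passing True for an attribute matches every tag that has that attribute, whatever its value. The shortened document and 'html.parser' are assumptions.

```python
from bs4 import BeautifulSoup

# Shortened copy of the article's sample document (an assumption for brevity).
html_doc = (
    '<p class="sister"><b>$37</b></p>'
    '<p class="story" id="p">'
    '<a href="http://example.com/elsie" class="sister">Elsie</a>'
    '<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>'
    '<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>'
    '</p>'
)
soup = BeautifulSoup(html_doc, 'html.parser')

# The first p tag that carries an id attribute.
p_with_id = soup.find(name='p', attrs={'id': True})
print(p_with_id['id'])

# Every tag with an id, regardless of tag name.
with_id = soup.find_all(id=True)
print([tag['id'] for tag in with_id])
```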
# Function filter
# Match a tags that have an id attribute but no class attribute
# def have_id_class(tag):
#     if tag.name == 'a' and tag.has_attr('id') and not tag.has_attr('class'):
#         return tag
#
# tag = soup.find(name=have_id_class)
# print(tag)
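In the article's sample document every a tag that has an id also has a class, so a filter for "id but no class" finds nothing there. The sketch below adds one hypothetical link (link4, not in the original HTML) so the function filter has something to match; the function name and 'html.parser' are likewise assumptions.

```python
from bs4 import BeautifulSoup

# The last link (id="link4") is hypothetical and added only for this example.
html_doc = (
    '<p class="story">'
    '<a href="http://example.com/elsie" class="sister">Elsie</a>'
    '<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>'
    '<a href="http://example.com/extra" id="link4">Extra</a>'
    '</p>'
)
soup = BeautifulSoup(html_doc, 'html.parser')


def has_id_but_no_class(tag):
    """Return True for a tags that define id and do not define class."""
    return tag.name == 'a' and tag.has_attr('id') and not tag.has_attr('class')


# The function is called once per tag; a truthy return value means a match.
match = soup.find(has_id_but_no_class)
print(match.get_text())

matches = soup.find_all(has_id_but_no_class)
print(len(matches))
```

Returning a plain boolean keeps the filter easy to read, although any truthy value (such as the tag itself) works the same way.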
Source: http://www.bubuko.com/infodetail-3098837.html