Given some HTML, say:

<div class="class1">
    <span class="class2">some text</span>
    <span class="class3">some text</span>
    <span class="class4">some text</span>
</div>

How can I retrieve all the class names? That is: ['class1', 'class2', 'class3', 'class4']
I tried:
soup.find_all(class_=True)

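But that returns the whole tags, so I would then have to run regular expressions over their string form to pull out the class names.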

Best answer

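You can treat each Tag instance that find_all() returns as a dictionary when reading its attributes. Note that the value of the class attribute is a list, not a string, because class is a special "multi-valued" attribute: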

classes = []
for element in soup.find_all(class_=True):
    classes.extend(element["class"])

Or, as a list comprehension:
classes = [value
           for element in soup.find_all(class_=True)
           for value in element["class"]]
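To see why extend() (or the nested loop) is needed rather than append(), here is a minimal sketch of what element["class"] looks like for a tag that carries more than one class. The sample markup and variable names are made up for illustration. It also shows Tag.get() with a default, which may be useful if you iterate over tags that are not guaranteed to have a class attribute.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: one tag with two classes, one tag with none.
html = '<p class="note warning">text</p><p id="plain">no class</p>'
soup = BeautifulSoup(html, "html.parser")

# The class attribute is split on whitespace into a list of names.
multi = soup.find("p")["class"]

# Indexing a tag without the attribute raises KeyError;
# .get() with a default avoids that.
missing = soup.find(id="plain").get("class", [])

print(multi)
print(missing)
```

With find_all(class_=True) the second case does not come up, since only tags that have a class attribute are returned.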

Demo:
In [1]: from bs4 import BeautifulSoup

In [2]: data = """
   ...: <div class="class1">
   ...:     <span class="class2">some text</span>
   ...:     <span class="class3">some text</span>
   ...:     <span class="class4">some text</span>
   ...: </div>"""

In [3]: soup = BeautifulSoup(data, "html.parser")

In [4]: classes = [value
   ...:            for element in soup.find_all(class_=True)
   ...:            for value in element["class"]]

In [5]: print(classes)
['class1', 'class2', 'class3', 'class4']
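In the demo every class name happens to occur once. On real pages the same class usually appears on many elements, so the list will contain repeats. If you only want each name once, one possible approach (not part of the original answer) is to deduplicate with dict.fromkeys(), which keeps first-seen order. The markup below is an invented example.

```python
from bs4 import BeautifulSoup

# Hypothetical markup in which class names repeat across elements.
html = """
<div class="box highlight">
    <span class="label">a</span>
    <span class="label highlight">b</span>
</div>"""
soup = BeautifulSoup(html, "html.parser")

# Same comprehension as in the answer: every class of every tag, in document order.
all_classes = [value
               for element in soup.find_all(class_=True)
               for value in element["class"]]

# dict keys are unique and preserve insertion order (Python 3.7+).
unique_classes = list(dict.fromkeys(all_classes))

print(all_classes)
print(unique_classes)
```

If order does not matter, set(all_classes) would do the same job.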
