Problem description
I want to select all <div> elements whose class is either "post has-profile bg2" or "post has-profile bg1", but not the last one, i.e. the "panel" div.
<div id="6" class="post has-profile bg2"> some text 1 </div>
<div id="7" class="post has-profile bg1"> some text 2 </div>
<div id="8" class="post has-profile bg2"> some text 3 </div>
<div id="9" class="post has-profile bg1"> some text 4 </div>
<div class="panel bg1" id="abc"> ... </div>
select() matches only a single occurrence. I'm trying it with find_all(), but bs4 is not able to find the elements.
import re

if soup.find(class_=re.compile(r"post has-profile [bg1|bg2]")):
    posts = soup.find_all(class_=re.compile(r"post has-profile [bg1|bg2]"))
How to solve it with regex and without regex? Thanks.
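The regex in the attempt above probably does not mean what was intended: inside square brackets, [bg1|bg2] is a character class that matches a single character from b, g, 1, | and 2, not the alternatives bg1 or bg2. A minimal sketch with Python's re module, run against plain class strings rather than BeautifulSoup (the sample strings, including the made-up bg3, are for illustration only), shows the difference:

```python
import re

# Pattern from the question: [bg1|bg2] is a character class, so it matches
# exactly ONE character from the set {b, g, 1, |, 2}.
loose = re.compile(r"post has-profile [bg1|bg2]")

# Intended meaning: an alternation group, checked against the whole string.
strict = re.compile(r"post has-profile (?:bg1|bg2)")

# Illustrative class strings (bg3 is made up to expose the problem)
samples = [
    "post has-profile bg1",
    "post has-profile bg2",
    "post has-profile bg3",
    "panel bg1",
]

for s in samples:
    print(f"{s!r:24} loose={loose.search(s) is not None} "
          f"strict={strict.fullmatch(s) is not None}")
```

The loose pattern also accepts "post has-profile bg3" because it only ever inspects the "b"; the grouped pattern used with fullmatch() accepts only the two intended values.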
Recommended answer
You can use BeautifulSoup's built-in CSS selector support, select(), with a comma-separated selector list:
from bs4 import BeautifulSoup

data = """<div id="6" class="post has-profile bg2"> some text 1 </div>
<div id="7" class="post has-profile bg1"> some text 2 </div>
<div id="8" class="post has-profile bg2"> some text 3 </div>
<div id="9" class="post has-profile bg1"> some text 4 </div>
<div class="panel bg1" id="abc"> ... </div>"""

soup = BeautifulSoup(data, 'lxml')

# The comma means "OR": match either class combination
divs = soup.select('div.post.has-profile.bg2, div.post.has-profile.bg1')

for div in divs:
    print(div)
    print('-' * 80)
Prints:
<div class="post has-profile bg2" id="6"> some text 1 </div>
--------------------------------------------------------------------------------
<div class="post has-profile bg2" id="8"> some text 3 </div>
--------------------------------------------------------------------------------
<div class="post has-profile bg1" id="7"> some text 2 </div>
--------------------------------------------------------------------------------
<div class="post has-profile bg1" id="9"> some text 4 </div>
--------------------------------------------------------------------------------
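The 'div.post.has-profile.bg2, div.post.has-profile.bg1' selector selects every <div> whose class attribute contains post, has-profile and bg2, plus every <div> whose class attribute contains post, has-profile and bg1. The panel div has neither post nor has-profile, so it is not matched.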
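Since the question also asked about find_all() and about solving it with and without regex, here is a minimal sketch that passes a function to find_all(), which BeautifulSoup accepts as a filter and calls with each tag. It assumes bs4 4.7 or later (where select() is backed by soupsieve, which supports :is()), uses Python's built-in html.parser instead of lxml, and the helper names is_post_regex and is_post_sets are hypothetical.

```python
import re

from bs4 import BeautifulSoup

data = """<div id="6" class="post has-profile bg2"> some text 1 </div>
<div id="7" class="post has-profile bg1"> some text 2 </div>
<div id="8" class="post has-profile bg2"> some text 3 </div>
<div id="9" class="post has-profile bg1"> some text 4 </div>
<div class="panel bg1" id="abc"> ... </div>"""

# html.parser ships with Python, so lxml is not required for this sketch
soup = BeautifulSoup(data, "html.parser")

# With regex: (?:bg1|bg2) is an alternation group, unlike [bg1|bg2]
POST_CLASS = re.compile(r"post has-profile (?:bg1|bg2)")


def is_post_regex(tag):
    """Hypothetical helper: True for a <div> whose joined class string matches exactly."""
    if tag.name != "div":
        return False
    # class is multi-valued, so bs4 stores it as a list; rebuild the string
    classes = " ".join(tag.get("class", []))
    return POST_CLASS.fullmatch(classes) is not None


def is_post_sets(tag):
    """Hypothetical helper: True for a <div> with post, has-profile and bg1 or bg2."""
    if tag.name != "div":
        return False
    # Set comparison: class order and extra classes do not matter
    classes = set(tag.get("class", []))
    return {"post", "has-profile"} <= classes and bool(classes & {"bg1", "bg2"})


by_regex = soup.find_all(is_post_regex)
by_sets = soup.find_all(is_post_sets)

# Compact CSS alternative using the :is() pseudo-class (soupsieve)
by_css = soup.select("div.post.has-profile:is(.bg1, .bg2)")

for div in by_sets:
    print(div["id"], div.get_text(strip=True))
```

The regex version is sensitive to the order of the classes in the attribute (it expects exactly "post has-profile bg1" or "post has-profile bg2"), while the set version only checks that the required classes are present.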