There may be a better title for my question, but here it is:
I'm using BeautifulSoup with findAll to return HTML elements in a list. Here is a sample of what I get:

[<div class="tightLt col span-1-3">
    <div class="middle">
        <div class="cell"><i class="sqLed middle sm yellow margRtXs "></i></div>
        <div class="cell"><span class="middle">Neutral Outlook</span></div>
    </div>
</div>,
<div class="tightLt col span-1-3">
    <div class="middle">
        <div class="cell"><i class="sqLed middle sm yellow margRtXs "></i></div>
        <div class="cell"><span class="middle"><span class="showDesk">No opinion of</span> CEO</span>
        </div>
    </div>
</div>]
[<div class="tightLt col span-1-3">
    <div class="middle">
        <div class="cell"><i class="sqLed middle sm red margRtXs "></i></div>
        <div class="cell"><span class="middle">Doesn't Recommend</span></div>
    </div>
</div>,
<div class="tightLt col span-1-3">
    <div class="middle">
        <div class="cell"><i class="sqLed middle sm red margRtXs "></i></div>
        <div class="cell"><span class="middle">Negative Outlook</span></div>
    </div>
</div>,
<div class="tightLt col span-1-3">
    <div class="middle">
        <div class="cell"><i class="sqLed middle sm yellow margRtXs "></i></div>
        <div class="cell"><span class="middle"><span class="showDesk">No opinion of</span> CEO</span>
        </div>
    </div>
</div>]

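The problem is that in the first HTML list, the CEO approval entry is the second element of the list, inside a "span" tag, but in the second HTML list it is the third element. (In both cases shown the value is "No opinion of CEO", but it can also be "Disapproves of CEO" or "Approves of CEO".) So I can't use a list index to pick the element out of the list. How can I solve this?
Here is the part of the code that returns the lists above: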
from bs4 import BeautifulSoup
import requests

url = "https://www.glassdoor.com/Reviews/Walmart-Reviews-E715.htm"
html_content = requests.get(url).content  # pass the response body, not the Response object
soup = BeautifulSoup(html_content, "lxml")
reviews = soup.find("div", id="EmployerReviews").find_all("li", class_="empReview")
for review in reviews:
    # search inside the current review rather than the whole page
    for z in review.find_all("div", class_="cell reviewBodyCell"):
        z.find_all("div", class_="tightLt col span-1-3")  # returns the list that contains the needed information

Best answer

An extended, optimized solution using BeautifulSoup CSS selectors:

from bs4 import BeautifulSoup
import requests

url = "https://www.glassdoor.com/Reviews/Walmart-Reviews-E715.htm"
html_content = requests.get(url, headers={'user-agent': 'Mozilla/5.0'}).content
soup = BeautifulSoup(html_content, "lxml")

selector = "div#EmployerReviews li.empReview div.cell.reviewBodyCell span[class='showDesk']"
for x in soup.select(selector):
    print(x.parent.text)

Output:
No opinion of CEO
No opinion of CEO
No opinion of CEO
No opinion of CEO
Approves of CEO
No opinion of CEO
Approves of CEO
Approves of CEO
Approves of CEO
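
If you want the value per review rather than one flat list, the same idea (find the entry by its "showDesk" marker instead of by position) can be tried offline on the markup from the question. This is only a minimal sketch: the helper name ceo_opinion is hypothetical, the outer "cell reviewBodyCell" wrapper is assumed from the question's code, and the built-in "html.parser" is swapped in for "lxml" so nothing extra needs installing.

```python
from bs4 import BeautifulSoup

# Markup copied from the question; the outer wrapper div is an assumption
# based on the class name used in the question's code.
SAMPLE_HTML = """
<div class="cell reviewBodyCell">
<div class="tightLt col span-1-3">
    <div class="middle">
        <div class="cell"><i class="sqLed middle sm red margRtXs "></i></div>
        <div class="cell"><span class="middle">Negative Outlook</span></div>
    </div>
</div>
<div class="tightLt col span-1-3">
    <div class="middle">
        <div class="cell"><i class="sqLed middle sm yellow margRtXs "></i></div>
        <div class="cell"><span class="middle"><span class="showDesk">No opinion of</span> CEO</span>
        </div>
    </div>
</div>
</div>
"""


def ceo_opinion(review):
    """Return the CEO opinion text found inside `review`, or None if absent.

    Hypothetical helper: it locates the inner span by its class, so the
    position of the block within the list does not matter.
    """
    marker = review.select_one("span.showDesk")
    if marker is None:
        return None
    # The parent span holds both the marker text and the trailing " CEO";
    # split/join collapses any stray whitespace between the pieces.
    return " ".join(marker.parent.get_text().split())


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(ceo_opinion(soup))
```

On the live page you would call the helper once for each element returned by soup.select("li.empReview"), which keeps each opinion tied to its review and gives None where a review has no CEO entry.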

10-06 00:33