I am scraping some data from foodily.com using Beautiful Soup.

On that page there is a div with the class 'ings', and I want to get the data inside its p tags. To do that I wrote the code below:

ingredients = soup.find('div', {"class": "ings"}).findChildren('p')


It gives me the list of ingredients, but with the p tags still attached. How can I get just the text?

Best answer

Call get_text() on every p element found inside the div element with class="ings".
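As a minimal sketch of the idea, the snippet below runs against a small inline HTML string rather than the live page. The markup is hypothetical; only the div.ings / p structure is taken from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page; only the
# div.ings / p structure comes from the question.
html = """
<div class="ings">
  <p>1 cup caster sugar</p>
  <p>2 eggs</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
paragraphs = soup.find("div", {"class": "ings"}).find_all("p")

# Converting a Tag to a string keeps the surrounding markup.
with_tags = [str(p) for p in paragraphs]

# get_text() returns only the text inside each tag.
ingredients = [p.get_text() for p in paragraphs]
print(ingredients)
```

The first list is what the original code effectively produced (elements with their p tags); the second holds plain strings.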

Complete working code:

from bs4 import BeautifulSoup
import requests

with requests.Session() as session:
    session.headers.update({"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.82 Safari/537.36"})
    response = session.get("http://www.foodily.com/r/0y1ygzt3zf-perfect-vanilla-cupcakes-by-annie-s")

    soup = BeautifulSoup(response.content, "html.parser")

    ingredients = [ingredient.get_text() for ingredient in soup.select('div.ings p')]
    print(ingredients)


Prints:

[
    u'For the cupcakes:',
    u'1 stick (113g) butter/marg*',
    u'1 cup caster sugar',
    u'2 eggs',
    ...
    u'1 tbsp vanilla extract',
    u'2-3tbsp milk',
    u'Sprinkles to decorate, optional'
]


Note that I also improved the locator a bit and switched to the div.ings p CSS selector.
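One practical difference between the two locators, sketched here on hypothetical inline markup: select() returns an empty list when nothing matches, whereas find() returns None, so chaining another call onto it would raise an AttributeError. The strip=True argument is an optional extra, not part of the answer's code, that trims surrounding whitespace from each string.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the class names other than "ings" are made up.
html = """
<div class="ings"><p> For the cupcakes: </p><p>2 eggs</p></div>
<div class="other"><p>not an ingredient</p></div>
"""

soup = BeautifulSoup(html, "html.parser")

# The CSS selector matches only p elements nested inside div.ings.
selected = [p.get_text(strip=True) for p in soup.select("div.ings p")]
print(selected)

# With no matching div, select() yields an empty list ...
missing = soup.select("div.nope p")

# ... while find() yields None, so a chained .find_all("p") would fail.
found = soup.find("div", {"class": "nope"})
print(missing, found)
```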

10-06 10:32