Problem description
I'll start by saying I'm fairly new to Python. I've been working on a Slack bot recently, and here's where I am so far.
source = requests.get(url).content
soup = BeautifulSoup(source, 'html.parser')
price = soup.findAll("a", {"class":"pricing"})["quantity"]
Here is the HTML code I am trying to scrape.
<a class="pricing" saleprice="240.00" quantity="1" added="2017-01-01"> S </a>
<a class="pricing" saleprice="21.00" quantity="5" added="2017-03-14"> M </a>
<a class="pricing" saleprice="139.00" quantity="19" added="2017-06-21"> L </a>
When I use only soup.find(), I'm able to find the first quantity value, but I need all of them in a list. I looked into using a different library, such as lxml instead of bs4, but didn't have any luck with that either. Any help is really appreciated, as I've already spent a long time on this.
Recommended answer
The findAll method returns a list of bs4 Tag elements, so you can't select an attribute from the result directly. However, you can select the attribute from each item in that iterable with a simple list comprehension.
price = [a.get("quantity") for a in soup.findAll("a", {"class":"pricing"})]
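To try this without a network request, here is a minimal sketch that parses the three anchors from the question as an inline string. The variable names html and quantities are only illustrative, and find_all is used as the newer spelling of findAll.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question, parsed from a string instead of a URL.
html = """
<a class="pricing" saleprice="240.00" quantity="1" added="2017-01-01"> S </a>
<a class="pricing" saleprice="21.00" quantity="5" added="2017-03-14"> M </a>
<a class="pricing" saleprice="139.00" quantity="19" added="2017-06-21"> L </a>
"""

soup = BeautifulSoup(html, "html.parser")

# Each item in the result is a Tag, so the attribute is read per item.
quantities = [a.get("quantity") for a in soup.find_all("a", {"class": "pricing"})]
print(quantities)  # ['1', '5', '19']
```

The values come back as strings, so convert them with int() if you need numbers.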
Note that it's best to use get when accessing attributes, because it returns None (or a default value you supply) if the key does not exist in the attrs dictionary.
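The difference between get and square-bracket access can be seen in a small sketch. The second anchor below, which has no quantity attribute, is a made-up example and not part of the question's HTML.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second anchor deliberately lacks a quantity attribute.
html = '<a class="pricing" quantity="5"> M </a><a class="pricing"> XL </a>'
soup = BeautifulSoup(html, "html.parser")
with_qty, without_qty = soup.find_all("a", {"class": "pricing"})

present = with_qty.get("quantity")           # attribute exists
missing = without_qty.get("quantity")        # attribute absent, no default given
fallback = without_qty.get("quantity", "0")  # attribute absent, default supplied

# Square-bracket access raises KeyError when the attribute is absent.
try:
    without_qty["quantity"]
    raised = False
except KeyError:
    raised = True

print(present, missing, fallback, raised)  # 5 None 0 True
```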
As pointed out by Jon Clements, you could filter by both 'class' and 'quantity' if you don't want your list to contain None items in case some elements have no 'quantity' attribute.
price = [a["quantity"] for a in soup.find_all("a", {"class":"pricing", "quantity":True})]
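As a rough illustration of that filter, the sketch below adds one made-up anchor without a quantity attribute to the question's markup and shows that it is skipped. Converting the values with int() is an extra step assumed here for convenience; it is not part of the original answer.

```python
from bs4 import BeautifulSoup

# The last anchor is hypothetical and has no quantity attribute.
html = """
<a class="pricing" saleprice="240.00" quantity="1" added="2017-01-01"> S </a>
<a class="pricing" saleprice="21.00" quantity="5" added="2017-03-14"> M </a>
<a class="pricing" saleprice="139.00" quantity="19" added="2017-06-21"> L </a>
<a class="pricing" saleprice="99.00"> XL </a>
"""

soup = BeautifulSoup(html, "html.parser")

# "quantity": True matches only tags that actually carry the attribute,
# so square-bracket access is safe inside the comprehension.
matches = soup.find_all("a", {"class": "pricing", "quantity": True})
quantities = [int(a["quantity"]) for a in matches]
print(quantities)  # [1, 5, 19]
```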