How to get the first string from a div with an embedded div in beautifulsoup4


Problem description

I'm trying to extract prices from a website.

The code I've written can do that, but when the website shows a price together with the old price, it returns None instead of the price as a string.

Here is an example of the HTML without the old price (my code returns this one as a string):

<div class="xl-price rangePrice">
                            535.000 €
                        </div>

And here is an example WITH the old price (my code returns None for this one):

<div class="xl-price rangePrice">
    487.000 €
    <span class="old-price">497.000 €<br></span>
</div>

The page I'm trying to extract the prices from: pagelink

My code:

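# soup is a BeautifulSoup object built from the listing page
prices = []
for items in soup.find_all("div", {"class": "xl-price rangePrice"}):
    prices.append(items.string)  # None when the div also holds the old-price <span>

print(prices)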
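Why None appears: in BeautifulSoup, Tag.string is only defined when the tag has a single child (a lone text node, or a single child tag that itself has a .string). The div that also shows the old price has several children (the price text, the <span>, and trailing whitespace), so .string is None. A minimal offline check, assuming the two snippets above as hard-coded input (the variable names are illustrative):

```python
from bs4 import BeautifulSoup

# The two snippets from the question, hard-coded so no network access is needed.
html = """
<div class="xl-price rangePrice">
    535.000 €
</div>
<div class="xl-price rangePrice">
    487.000 €
    <span class="old-price">497.000 €<br></span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
divs = soup.find_all("div", {"class": "xl-price rangePrice"})

for div in divs:
    # .string becomes None as soon as the div has more than one child node.
    print(repr(div.string), len(div.contents))
```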

Another issue I'm having is that it returns the values like this:

'\r\n\t\t\t\t\t\t\t\t298.000 € \r\n\t\t\t\t\t\t\t', '\r\n\t\t\t\t\t\t\t\t145.000 € \r\n\t\t\t\t\t\t\t'

I only want the numbers.
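The \r, \n and \t characters are just the whitespace surrounding the text node inside the div, and str.strip() removes them. To go one step further and get an actual number, here is a sketch that assumes whole-euro prices with "." as the thousands separator (so "249.500 €" means 249 500 euros); parse_price is a hypothetical helper, not part of BeautifulSoup:

```python
def parse_price(raw: str) -> int:
    """Turn a scraped price like '298.000 €' (with any surrounding whitespace) into 298000.

    Assumes whole euros and '.' as the thousands separator.
    """
    # Drop the surrounding \r, \n, \t and spaces.
    text = raw.strip()
    # Remove the euro sign and the space left in front of it.
    text = text.replace("€", "").strip()
    # Remove the thousands separators and convert to an integer.
    return int(text.replace(".", ""))


scraped = ['\r\n\t\t\t\t\t\t\t\t298.000 € \r\n\t\t\t\t\t\t\t',
           '\r\n\t\t\t\t\t\t\t\t145.000 € \r\n\t\t\t\t\t\t\t']
print([parse_price(s) for s in scraped])  # [298000, 145000]
```

If the input could contain cents (for example "1.234,50 €"), the int() conversion would fail, so the separator assumption would need revisiting.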

Any help would be appreciated!

Recommended answer

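import requests
from bs4 import BeautifulSoup

r = requests.get(
    'https://www.immoweb.be/en/search/apartment/for-sale/leuven/3000')
soup = BeautifulSoup(r.text, 'html.parser')

for item in soup.find_all('div', attrs={'class': 'xl-price rangePrice'}):
    # The first child is the text node with the current price; the old
    # price, if present, sits in the <span> that follows it.
    price = item.contents[0]
    # Strip the surrounding whitespace, then drop the trailing euro sign.
    print(price.strip().rstrip('€').strip())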

Output:

298.000
145.000
275.000
535.000
487.000
159.000
325.000
189.000
139.000
499.000
520.000
249.500
448.000
215.000
225.000
210.000
215.000
218.000
232.000
689.000
228.000
299.500
169.000
135.000
549.000
125.000
160.000
395.000
430.000
210.000
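An alternative sketch (not part of the original answer) uses Tag.stripped_strings, which yields every text node inside the tag with surrounding whitespace removed and whitespace-only nodes skipped. Taking the first one gives the current price whether or not an old-price <span> follows it. It is run offline against the question's two snippets, and current_prices is a hypothetical name:

```python
from bs4 import BeautifulSoup

# The two snippets from the question, hard-coded for an offline test.
html = """
<div class="xl-price rangePrice">
    535.000 €
</div>
<div class="xl-price rangePrice">
    487.000 €
    <span class="old-price">497.000 €<br></span>
</div>
"""


def current_prices(markup: str) -> list[str]:
    """Return the first non-blank text of each price div, without the euro sign."""
    soup = BeautifulSoup(markup, "html.parser")
    prices = []
    for div in soup.find_all("div", attrs={"class": "xl-price rangePrice"}):
        # stripped_strings skips whitespace-only nodes, so the first item is
        # the current price; the old price inside <span> comes after it.
        first = next(div.stripped_strings, "")
        prices.append(first.rstrip("€").strip())
    return prices


print(current_prices(html))  # ['535.000', '487.000']
```

Compared with item.contents[0], this does not rely on the price being the very first child node, so stray leading whitespace or comments in the markup would not break it.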

09-05 13:07