How to prevent errors when web scraping with Python and a value is missing?

Problem description

Right now I'm trying to go through a real estate website and scrape data on properties. I have code that goes through the list of properties and gets data, then goes to the page for each property and gets more detailed data. It works, but if any field is missing I get an error that raises an exception and makes it skip to the next property. Instead, I'd like it to just put a null for any missing data. I'm new to Python and web scraping, so there may be other ways to clean up my code; feel free to comment on that as well, but mostly I'm just trying to get it to put nulls where it finds missing data. Here's the code, where prop_list is the html code for

for item in prop_list:
    try:
        d ={}
        d["address"] = item.find("span", {"itemprop":"streetAddress"}).text
        d["city"] = item.find("span", {"itemprop":"addressLocality"}).text
        d["state"] = item.find("span", {"itemprop":"addressRegion"}).text
        d["zip_code"] = item.find("span", {"itemprop":"postalCode"}).text
        d["price"] = item.find("span", {"class":"data-price"}).text
        d["lot_sqft"] = item.find("li", {"data-label":"property-meta-lotsize"}).find("span", {"class":"data-value"}).text
        link = item.find("a").get("href")
        url = "https://www.realtor.com" + link
        d["url"] = url
        d["longitude"] = item.find("meta",{"itemprop":"longitude"}).get("content")
        d["latitude"] = item.find("meta",{"itemprop":"latitude"}).get("content")
        desc_link = requests.get(url,headers=headers)
        b = desc_link.content
        temp = BeautifulSoup(b,"html.parser")
        d["description"] = temp.find("p", {"class": "word-wrap-break"})
        d["year_built"] = temp.find("li", {"data-label": "property-year"}).find("div", {"class":"key-fact-data ellipsis"}).text

        l.append(d)

    except:
        print("exception occurred")

Thanks!

Recommended answer

Since you're a beginner, I'll spell the fix out step by step. Just use an if-else statement like this:

if item.find("span", {"itemprop" : "streetAddress"}):
    d["address"] = item.find("span", {"itemprop":"streetAddress"}).text
else:
    d["address"] = "" # or None

Doing this for each element would be tedious, so the more Pythonic way is a one-line conditional expression:

d["address"] = item.find("span", {"itemprop":"streetAddress"}).text if item.find("span", {"itemprop":"streetAddress"}) else ""

This would get you exactly what you need.
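
The conditional above calls find twice per field and still has to be repeated for every key, including the chained lookups such as lot_sqft. One possible way to tidy this up is a few small helpers that return None whenever a lookup comes back empty. This is only a sketch: the names find_chain, text_or_none, attr_or_none and extract_listing are hypothetical (they are not part of BeautifulSoup), and it assumes item is a BeautifulSoup Tag as in the question, relying only on its find, text and get members.

```python
def find_chain(node, *steps):
    """Apply node.find(name, attrs) for each (name, attrs) step in turn.

    Returns None as soon as any step finds nothing, so a missing parent
    element no longer raises AttributeError on the next .find call.
    """
    for name, attrs in steps:
        if node is None:
            return None
        node = node.find(name, attrs)
    return node


def text_or_none(node):
    """Return node.text, or None when the element was not found."""
    return node.text if node is not None else None


def attr_or_none(node, attr):
    """Return the given attribute value, or None when the element was not found."""
    return node.get(attr) if node is not None else None


def extract_listing(item):
    """Build one record from a listing element; missing fields become None.

    Hypothetical helper: the selectors are copied from the question's code.
    """
    d = {}
    d["address"] = text_or_none(find_chain(item, ("span", {"itemprop": "streetAddress"})))
    d["city"] = text_or_none(find_chain(item, ("span", {"itemprop": "addressLocality"})))
    d["state"] = text_or_none(find_chain(item, ("span", {"itemprop": "addressRegion"})))
    d["zip_code"] = text_or_none(find_chain(item, ("span", {"itemprop": "postalCode"})))
    d["price"] = text_or_none(find_chain(item, ("span", {"class": "data-price"})))
    # Two-step lookup: the <li> may be missing, or the <span> inside it.
    d["lot_sqft"] = text_or_none(find_chain(
        item,
        ("li", {"data-label": "property-meta-lotsize"}),
        ("span", {"class": "data-value"}),
    ))
    link = attr_or_none(find_chain(item, ("a", {})), "href")
    d["url"] = "https://www.realtor.com" + link if link else None
    d["longitude"] = attr_or_none(find_chain(item, ("meta", {"itemprop": "longitude"})), "content")
    d["latitude"] = attr_or_none(find_chain(item, ("meta", {"itemprop": "latitude"})), "content")
    return d
```

With something like this, the loop body could shrink to l.append(extract_listing(item)), and the bare except would no longer be needed for missing fields. The detail-page fields (description, year_built) could be handled the same way, though the request itself can still fail for network reasons, so that part may still deserve its own narrower try/except.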
