Does Python Scrapy work correctly on localhost?

Problem description

I have written a Scrapy spider to scrape some HTML tags. The spider works perfectly for a URL on the internet, but not for a URL on localhost. That is, the spider produces an error for a resource URL on my local computer, even though the URL is correct, yet it works for the same resource when the URL points to the live site. Can someone clear this up for me?

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        con = MySQLdb.connect(host="localhost",
                              user="username",
                              passwd="psswd",
                              db="dbname")
        cur = con.cursor()
        title = hxs.select("//h3")[0].extract()
        desc = hxs.select("//h2").extract()
        a = hxs.select("//meta").extract()
        cur.execute("""Insert into heads(h2) Values(%s )""",(a))
        con.commit()
        con.close()

Recommended answer

The error

exceptions.IndexError: list index out of range

on the line

title = hxs.select("//h3")[0].extract()

indicates that the list hxs.select("//h3") is empty ([]): accessing the first item (index 0) with hxs.select("//h3")[0] uses an index that Python reports as out of range.

The html you are parsing apparently has no <h3> tags.
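One way to avoid the IndexError is to check whether the selection is empty before indexing it. The following is only a minimal sketch: first_or_default is a hypothetical helper that is not part of Scrapy, and plain Python lists stand in for what hxs.select("//h3") might return.

```python
def first_or_default(matches, default=None):
    """Return the first element of matches, or default when matches is empty."""
    # An empty list is falsy, so this never indexes into [].
    return matches[0] if matches else default


# Plain lists stand in for selector results (an assumption for illustration).
no_h3 = []                    # page without any <h3> tag
one_h3 = ["<h3>Title</h3>"]   # page with one <h3> tag

title_missing = first_or_default(no_h3)
title_found = first_or_default(one_h3)
```

In the spider, the same check would go before calling .extract() on the first match, so that a page without <h3> tags is skipped or logged instead of raising.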

Also, after you fix the above error, you'll need to put a comma after the a in (a,):

cur.execute("""Insert into heads(h2) Values(%s )""",(a,))

(a) is evaluated to a, whereas (a,) represents a tuple with 1 element inside.
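The difference can be checked in plain Python without a database. This is a small illustrative sketch; the value assigned to a is a made-up placeholder, not data from the question.

```python
a = "meta content"  # placeholder value, assumed for illustration

not_a_tuple = (a)   # parentheses alone only group: this is still the string
one_tuple = (a,)    # the trailing comma is what creates a 1-element tuple

print(type(not_a_tuple).__name__)
print(type(one_tuple).__name__, len(one_tuple))
```

The trailing comma, not the parentheses, is what makes the tuple, which is why cur.execute needs (a,) as its parameter sequence.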

09-03 19:45