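When I run my function to collect links from a particular site, it gets the links from the first page, but instead of moving on to the next page it stops with the error below.
Crawler: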
import requests
from lxml import html

def Startpoint(mpage):
    page=4
    while page<=mpage:
        address = "https://www.katalystbusiness.co.nz/business-profiles/bindex"+str(page)+".html"
        tail="https://www.katalystbusiness.co.nz/business-profiles/"
        page = requests.get(address)
        tree = html.fromstring(page.text)
        titles = tree.xpath('//p/a/@href')
        for title in titles:
            if "bindex" not in title:
                if "cdn-cgi" not in title:
                    print(tail + title)
    page+=1


Startpoint(5)
Error message:
Traceback (most recent call last):
  File "C:\Users\ar\AppData\Local\Programs\Python\Python35-32\New.py", line 19, in <module>
    Startpoint(5)
  File "C:\Users\ar\AppData\Local\Programs\Python\Python35-32\New.py", line 6, in Startpoint
    while page<=mpage:
TypeError: unorderable types: Response() <= int()
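The same TypeError can be reproduced without any network access. The sketch below is illustrative only. FakeResponse is a hypothetical stand-in for requests.Response, and, like it, it defines no ordering methods. On Python 3.6 and later the message reads "'<=' not supported between instances of ... and 'int'" instead of the Python 3.5 wording "unorderable types".

```python
class FakeResponse:
    """Hypothetical stand-in for requests.Response; defines no <, <=, >, >=."""
    text = "<html></html>"


def start_point(mpage):
    page = 4
    pages_seen = []
    while page <= mpage:          # on the second check this compares FakeResponse <= int
        pages_seen.append(page)
        page = FakeResponse()     # bug being reproduced: the loop counter is overwritten
    return pages_seen


try:
    start_point(5)
except TypeError as exc:
    print("TypeError:", exc)
```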
Best answer
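You are assigning the result of requests.get(address) to page. On the next check of the while condition, Python then has to compare a requests.Response object with an int, which it cannot do. Just give the response a name other than page, for example response. The last line of the loop also has an indentation error: page+=1 must be inside the while loop, otherwise page is never incremented.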
import requests
from lxml import html

def Startpoint(mpage):
    page=4
    while page<=mpage:
        address = "https://www.katalystbusiness.co.nz/business-profiles/bindex"+str(page)+".html"
        tail="https://www.katalystbusiness.co.nz/business-profiles/"
        response = requests.get(address)  # separate name, so page stays an int
        tree = html.fromstring(response.text)
        titles = tree.xpath('//p/a/@href')
        for title in titles:
            if "bindex" not in title:
                if "cdn-cgi" not in title:
                    print(tail + title)
        page+=1  # inside the while loop, so the next index page is requested


Startpoint(5)
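A somewhat more defensive variant follows. It is a sketch under stated assumptions, not the answer's own code. It loops with for page in range(...), so each page number comes from the range iterator, and rebinding the name page inside the body could not stop the loop from advancing. To keep the sketch runnable offline, the HTTP request is passed in as a callable. The names fetch_hrefs, page_urls, profile_links and start_point are hypothetical. The start value of 4 and the filtering rules are taken from the code above.

```python
BASE = "https://www.katalystbusiness.co.nz/business-profiles/"


def page_urls(start, mpage):
    """Yield the index-page URLs bindex<start>.html .. bindex<mpage>.html."""
    for page in range(start, mpage + 1):
        yield BASE + "bindex" + str(page) + ".html"


def profile_links(hrefs):
    """Keep profile links only: drop other index pages and cdn-cgi links."""
    return [BASE + h for h in hrefs if "bindex" not in h and "cdn-cgi" not in h]


def start_point(mpage, fetch_hrefs, start=4):
    """Collect profile links from every index page.

    fetch_hrefs(url) is a hypothetical callable returning the list of hrefs
    found on that page. With requests and lxml installed it could be:

        def fetch_hrefs(url):
            response = requests.get(url)
            return html.fromstring(response.text).xpath('//p/a/@href')
    """
    links = []
    for url in page_urls(start, mpage):
        links.extend(profile_links(fetch_hrefs(url)))
    return links
```

Unlike the original loop, this version returns the links rather than printing them, so the caller can print, store or test them.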
Regarding "python - Trouble getting to the next page", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/43548652/