This article covers how to get rid of 'urllib.error.HTTPError: HTTP Error 302:' when calling urlReq(url).

Problem description

Hey guys, what's up? :)

I'm trying to scrape a website using some URL parameters. If I use url1, url2, or url3, it works properly and prints the regular output I want (HTML):

    import bs4
    from urllib.request import urlopen as urlReq
    from bs4 import BeautifulSoup as soup

    # create urls
    url1 = 'https://en.titolo.ch/sale'
    url2 = 'https://en.titolo.ch/sale?limit=108'
    url3 = 'https://en.titolo.ch/sale?category_styles=29838_21212'
    url4 = 'https://en.titolo.ch/sale?category_styles=31066&limit=108'

    # opening up connection on each url, grabbing the page
    uClient = urlReq(url4)
    page_html = uClient.read()
    uClient.close()

    # parsing the downloaded html
    page_soup = soup(page_html, "html.parser")

    # print the html
    print(page_soup.body.prettify())

BUT when I try url4:

    url4 = 'https://en.titolo.ch/sale?category_styles=31066&limit=108'

it gives me the error below. What am I doing wrong?

- Maybe it has something to do with cookies? But why does it work on the other URLs?
- Maybe they are just blocking the scrape attempt?
- How can I avoid this error when using multiple parameters in the URL?

The error:

    urllib.error.HTTPError: HTTP Error 302: The HTTP server returned a redirect error that would lead to an infinite loop.
    The last 30x error message was:
    Moved Temporarily

Thanks for the help in advance!
Cheers,
Alan

What I have already tried:

I tried the requests lib:

    import requests

    url = 'https://en.titolo.ch/sale?category_styles=31066&limit=108'
    r = requests.get(url)
    html = r.text
    print(html)

which prints:

    <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
    <html><head>
    <title>403 Forbidden</title>
    </head><body>
    <h1>Forbidden</h1>
    <p>You don't have permission to access /sale
    on this server.</p>
    </body></html>
    [Finished in 0.375s]

The full error message from the urllib request:

    Traceback (most recent call last):
      File "C:\Users\jedi\Documents\non\of\your\business\smile\stackoverflow_question", line 12, in <module>
        uClient = urlReq(url4)
      File "C:\Users\jedi\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 222, in urlopen
        return opener.open(url, data, timeout)
      File "C:\Users\jedi\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 531, in open
        response = meth(req, response)
      File "C:\Users\jedi\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 641, in http_response
        'http', request, response, code, msg, hdrs)
      File "C:\Users\jedi\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 563, in error
        result = self._call_chain(*args)
      File "C:\Users\jedi\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 503, in _call_chain
        result = func(*args)
      File "C:\Users\jedi\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 755, in http_error_302
        return self.parent.open(new, timeout=req.timeout)
      File "C:\Users\jedi\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 531, in open
        response = meth(req, response)
      File "C:\Users\jedi\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 641, in http_response
        'http', request, response, code, msg, hdrs)
      File "C:\Users\jedi\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 563, in error
        result = self._call_chain(*args)
      File "C:\Users\jedi\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 503, in _call_chain
        result = func(*args)
      File "C:\Users\jedi\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 755, in http_error_302
        return self.parent.open(new, timeout=req.timeout)
      File "C:\Users\jedi\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 531, in open
        response = meth(req, response)
      File "C:\Users\jedi\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 641, in http_response
        'http', request, response, code, msg, hdrs)
      File "C:\Users\jedi\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 563, in error
        result = self._call_chain(*args)
      File "C:\Users\jedi\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 503, in _call_chain
        result = func(*args)
      File "C:\Users\jedi\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 755, in http_error_302
        return self.parent.open(new, timeout=req.timeout)
      File "C:\Users\jedi\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 531, in open
        response = meth(req, response)
      File "C:\Users\jedi\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 641, in http_response
        'http', request, response, code, msg, hdrs)
      File "C:\Users\jedi\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 563, in error
        result = self._call_chain(*args)
      File "C:\Users\jedi\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 503, in _call_chain
        result = func(*args)
      File "C:\Users\jedi\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 755, in http_error_302
        return self.parent.open(new, timeout=req.timeout)
      File "C:\Users\jedi\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 531, in open
        response = meth(req, response)
      File "C:\Users\jedi\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 641, in http_response
        'http', request, response, code, msg, hdrs)
      File "C:\Users\jedi\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 563, in error
        result = self._call_chain(*args)
      File "C:\Users\jedi\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 503, in _call_chain
        result = func(*args)
      File "C:\Users\jedi\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 745, in http_error_302
        self.inf_msg + msg, headers, fp)
    urllib.error.HTTPError: HTTP Error 302: The HTTP server returned a redirect error that would lead to an infinite loop.
    The last 30x error message was:
    Moved Temporarily
    [Finished in 2.82s]

Recommended answer

If you use the requests package and add a User-Agent to the headers, it looks like all 4 of those links return a 200 response. So try adding the User-Agent header:

    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36'}

For example:

    import requests
    from bs4 import BeautifulSoup as soup

    # create urls
    url1 = 'https://en.titolo.ch/sale'
    url2 = 'https://en.titolo.ch/sale?limit=108'
    url3 = 'https://en.titolo.ch/sale?category_styles=29838_21212'
    url4 = 'https://en.titolo.ch/sale?category_styles=31066&limit=108'

    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36'}

    url_list = [url1, url2, url3, url4]

    for url in url_list:
        # opening up connection on each url, grabbing the page
        response = requests.get(url, headers=headers)
        print(response.status_code)

Output:

    200
    200
    200
    200

So:

    import requests

    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36'}

    url = 'https://en.titolo.ch/sale?category_styles=31066&limit=108'
    r = requests.get(url, headers=headers)
    html = r.text
    print(html)
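The question used urllib rather than requests, so here is a minimal sketch of the same idea with the standard library. It assumes a browser-like User-Agent is all the server needs. The answer above only checked this with requests, so treat the urllib behaviour as unverified. The helper names `build_request` and `fetch_html` are hypothetical and do not appear in the original post.

```python
from urllib.request import Request, urlopen

# Same browser-like User-Agent string the answer passes to requests.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36"
)


def build_request(url, user_agent=USER_AGENT):
    """Return a urllib Request carrying a User-Agent header (hypothetical helper)."""
    return Request(url, headers={"User-Agent": user_agent})


def fetch_html(url):
    """Download the page body as bytes. This performs a real network request."""
    # Assumption: the server accepts the request once the header is present.
    with urlopen(build_request(url)) as response:
        return response.read()
```

Calling `fetch_html('https://en.titolo.ch/sale?category_styles=31066&limit=108')` would then stand in for the `urlReq(url4)` / `read()` / `close()` sequence in the question, and the returned bytes can be handed to BeautifulSoup as before.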
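On the "multiple parameters" part of the question: since the answer gets a 200 for all four URLs once the User-Agent is set, the hand-written `&` in url4 does not appear to be the cause. If you would rather not concatenate query strings by hand, the sketch below builds them with `urllib.parse.urlencode`. `build_sale_url` is a hypothetical helper name, and the parameter names are simply the ones from the question's URLs.

```python
from urllib.parse import urlencode

BASE_URL = "https://en.titolo.ch/sale"


def build_sale_url(**params):
    """Append URL-encoded query parameters to BASE_URL (hypothetical helper)."""
    query = urlencode(params)
    return f"{BASE_URL}?{query}" if query else BASE_URL


# Reproduces url4 from the question.
print(build_sale_url(category_styles="31066", limit=108))
# prints: https://en.titolo.ch/sale?category_styles=31066&limit=108
```

With requests, the same result comes from passing a dict through the `params` argument of `requests.get`, alongside `headers`.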