Problem description
I have the following Python script, and it works well.
import urllib2
url = 'http://abc.com' # write the url here
usock = urllib2.urlopen(url)
data = usock.read()
usock.close()
print data
However, some of the URLs I give it may redirect two or more times. How can I have Python wait for the redirects to complete before loading the data? For instance, when using the above code with
http://www.google.com/search?hl=en&q=KEYWORD&btnI=1
which is the equivalent of hitting the "I'm Feeling Lucky" button on a Google search, I get:
>>> url = 'http://www.google.com/search?hl=en&q=KEYWORD&btnI=1'
>>> usick = urllib2.urlopen(url)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 126, in urlopen
return _opener.open(url, data, timeout)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 400, in open
response = meth(req, response)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 513, in http_response
'http', request, response, code, msg, hdrs)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 438, in error
return self._call_chain(*args)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 372, in _call_chain
result = func(*args)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 521, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden
>>>
I've tried passing (url, data, timeout); however, I am unsure what to put there.
EDIT: I actually found that if I don't follow the redirect and just read the headers of the first response, I can grab the location of the next redirect and use that as my final link.
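The idea in the edit (read the Location header of the first response instead of following it) could be sketched roughly like this. This is not the original poster's code: it is written for Python 3, where urllib2 became urllib.request, and the NoRedirect class, the next_location helper, and the User-Agent value are hypothetical names chosen for illustration. Note that urlopen follows redirects on its own by default, so the 403 in the traceback may come from the server rejecting the default urllib User-Agent rather than from the redirect; that is an assumption, which is why the sketch sends a browser-like header.

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Hypothetical handler that stops urllib from following 3xx responses."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None declines the redirect, so urllib falls through to
        # its default error handler and raises HTTPError for the 3xx response.
        return None


def next_location(url, user_agent="Mozilla/5.0"):
    """Return the redirect target of `url`, or None if it does not redirect."""
    opener = urllib.request.build_opener(NoRedirect)
    # The User-Agent value is an assumption; some servers reject the default.
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with opener.open(request, timeout=10):
            return None  # a normal 2xx response, nothing to follow
    except urllib.error.HTTPError as err:
        location = err.headers.get("Location")
        if err.code in (301, 302, 303, 307, 308) and location:
            # Location may be relative, so resolve it against the request URL.
            return urljoin(url, location)
        raise  # a real error such as 403 or 404
```

Calling next_location repeatedly until it returns None walks the whole redirect chain one hop at a time; the last URL seen is the one to download.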
You might be better off with the Requests library, which has a better API for controlling redirect handling:
http://docs.python-requests.org/en/latest/user/quickstart/#redirection-and-history
Requests:
http://pypi.python.org/pypi/requests/ (urllib replacement for humans)
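As a rough illustration of what the linked quickstart section covers, here is a minimal sketch using Requests. The helper names fetch_final_page and first_redirect_target and the 10-second timeout are assumptions made for this example, not part of the library; the calls themselves (requests.get, allow_redirects, response.history, response.url) are the ones described in the Requests documentation.

```python
import requests


def fetch_final_page(url, timeout=10):
    """Follow every redirect and return (final_url, hops, body_text)."""
    # requests.get follows redirects by default for GET requests.
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # surface errors such as 403 as exceptions
    # response.history holds the intermediate redirect responses, oldest first.
    hops = [(r.status_code, r.url) for r in response.history]
    return response.url, hops, response.text


def first_redirect_target(url, timeout=10):
    """Return the raw Location header of the first response, without following it."""
    response = requests.get(url, allow_redirects=False, timeout=timeout)
    return response.headers.get("Location")
```

fetch_final_page covers the "wait for the redirects to complete" case from the question, while first_redirect_target mirrors the approach from the edit, where only the first response's headers are inspected.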