Python: follow redirects, then download the page?

Problem description

I have the following Python script, and it works beautifully.

import urllib2

url = 'http://abc.com' # write the url here

usock = urllib2.urlopen(url)
data = usock.read()
usock.close()

print data

However, some of the URLs I give it may redirect two or more times. How can I have Python wait for the redirects to complete before loading the data? For instance, when using the above code with

http://www.google.com/search?hl=en&q=KEYWORD&btnI=1

which is the equivalent of hitting the "I'm Feeling Lucky" button on a Google search, I get:

>>> url = 'http://www.google.com/search?hl=en&q=KEYWORD&btnI=1'
>>> usick = urllib2.urlopen(url)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 400, in open
    response = meth(req, response)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 513, in http_response
    'http', request, response, code, msg, hdrs)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 438, in error
    return self._call_chain(*args)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 372, in _call_chain
    result = func(*args)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 521, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden
>>>

I've tried the (url, data, timeout) arguments; however, I am unsure what to put there.

EDIT: I actually found out that if I don't follow the redirect and just use the headers of the first response, I can grab the location of the next redirect and use that as my final link.
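
The approach described in the edit could look roughly like the sketch below. It is written for Python 3's urllib.request rather than the Python 2 urllib2 module used in the question (the same handler classes exist in urllib2). The names NoRedirectHandler and next_location are hypothetical, chosen only for illustration.

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin

REDIRECT_CODES = (301, 302, 303, 307, 308)


class NoRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Hypothetical helper: refuse to follow redirects."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None tells urllib not to build a follow-up request,
        # so the 3xx response surfaces as an HTTPError instead.
        return None


def next_location(url, timeout=10):
    """Return the absolute redirect target of `url`, or None if it does not redirect."""
    opener = urllib.request.build_opener(NoRedirectHandler)
    try:
        with opener.open(url, timeout=timeout):
            return None  # a normal response, no redirect involved
    except urllib.error.HTTPError as err:
        if err.code in REDIRECT_CODES:
            location = err.headers.get("Location")
            # The Location header may be relative, so resolve it against the request URL.
            return urljoin(url, location) if location else None
        raise  # any other HTTP error (such as the 403 above) is passed on


# Example call (needs network access):
# print(next_location("http://abc.com"))
```

Calling next_location repeatedly on its own result would walk a redirect chain one hop at a time, which is useful mainly when each hop needs to be inspected.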

Solution

You might be better off with the Requests library, which has better APIs for controlling redirect handling:

http://docs.python-requests.org/en/latest/user/quickstart/#redirection-and-history

Requests:

http://pypi.python.org/pypi/requests/ (urllib replacement for humans)
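
As a rough illustration of what this looks like in practice, here is a minimal sketch. It assumes the third-party requests package is installed, and the function name fetch_final_page is hypothetical. requests.get follows redirects by default, and the intermediate responses are kept in response.history.

```python
import requests


def fetch_final_page(url, timeout=10):
    """Follow redirects and return (final_url, intermediate_urls, body_text)."""
    # allow_redirects defaults to True for GET, so the chain is followed automatically.
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx, e.g. a 403 like the one in the question
    # response.history holds the redirect responses in order, oldest first.
    hops = [r.url for r in response.history]
    return response.url, hops, response.text


# Example call (needs network access):
# final_url, hops, body = fetch_final_page("http://abc.com")
```

Passing allow_redirects=False to requests.get instead returns the first response unfollowed, so its Location header can be read directly, as in the question's edit.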


09-03 18:59