Question
There is a link to a GIF image, but urllib2 cannot download it.
import urllib.request as urllib2

uri = 'http://ums.adtechjp.com/mapuser?providerid=1074;userid=AapfqIzytwl7ks8AA_qiU_BNUs8AAAFYqnZh4Q'

try:
    req = urllib2.Request(uri, headers={'User-Agent': 'Mozilla/5.0'})
    file = urllib2.urlopen(req)
except urllib2.HTTPError as err:
    print('HTTP error!!!')
    file = err  # an HTTPError can be read like a response
    print(err.code)
except urllib2.URLError as err:
    print('URL error!!!')
    print(err.reason)
    raise SystemExit(1)  # no response to read; a bare `return` is invalid at module level

data = file.read(1024)
print(data)
After the script finishes, data is still empty. Why does this happen? No HTTPError is raised, and in the browser console I can see a valid GIF with an HTTP response status of 200 OK. Thank you.
Recommended answer
You should check all the headers that the browser sends to the server.
This page needs two headers: User-Agent and Cookie.
If you use DevTools in Chrome or Firefox, you will see that a browser that has no cookie yet first receives a response with a cookie and the status 302 Moved Temporarily, which redirects to the same URL. The browser then repeats the request with the cookie and receives the image.
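To check this without a browser, one option is to stop urllib from following the redirect and look at the first response directly. The sketch below is only an illustration: NoRedirect and first_response_info are hypothetical names that are not part of the original answer, and the server may no longer respond as described.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Hypothetical handler that refuses to follow any redirect."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None makes urllib raise HTTPError for the 3xx response
        # instead of requesting the new URL.
        return None


def first_response_info(uri):
    """Return (status, Location header, Set-Cookie headers) of the first response."""
    opener = urllib.request.build_opener(NoRedirect())
    req = urllib.request.Request(uri, headers={'User-Agent': 'Mozilla/5.0'})
    try:
        resp = opener.open(req)
        status = resp.status
    except urllib.error.HTTPError as err:
        # The unfollowed 302 arrives here; its headers are still available.
        resp = err
        status = err.code
    return status, resp.headers.get('Location'), resp.headers.get_all('Set-Cookie')
```

If the server behaves as described, first_response_info(uri) would be expected to report status 302, a Location header pointing back to the same URL, and a Set-Cookie header.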
You can try my cookie and it may return the image. But normally you have to make two requests: the first to get the cookie and the second (with the cookie) to get the image.
import urllib.request as urllib2

uri = 'http://ums.adtechjp.com/mapuser?providerid=1074;userid=AapfqIzytwl7ks8AA_qiU_BNUs8AAAFYqnZh4Q'

headers = {
    'User-Agent': 'Mozilla/5.0',
    'Cookie': 'JEB2=583077046E650E2495131DE8FD2F1371',
}

try:
    req = urllib2.Request(uri, headers=headers)
    f = urllib2.urlopen(req)
except urllib2.HTTPError as err:
    print('HTTP error!!!')
    f = err  # an HTTPError can be read like a response
    print(err.code)
except urllib2.URLError as err:
    print('URL error!!!')
    print(err.reason)
    raise SystemExit(1)  # no response to read, so stop here

data = f.read(1024)
print(data)
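The two-request flow can also be left to urllib itself rather than hard-coding a cookie. This is a minimal sketch, assuming the server sets its cookie on the 302 response as described above. It uses the standard http.cookiejar module together with urllib.request.HTTPCookieProcessor, so the cookie received with the redirect is sent again on the follow-up request. build_cookie_opener and fetch_with_cookies are hypothetical helper names.

```python
import http.cookiejar
import urllib.request


def build_cookie_opener():
    """Return (opener, jar); the opener stores cookies and resends them."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    # Same User-Agent as in the examples above, applied to every request.
    opener.addheaders = [('User-Agent', 'Mozilla/5.0')]
    return opener, jar


def fetch_with_cookies(uri, size=1024):
    """Fetch uri, following redirects with any cookies the server set."""
    opener, jar = build_cookie_opener()
    with opener.open(uri) as response:
        data = response.read(size)
    # Also return the cookie names so the caller can see what was stored.
    return data, [cookie.name for cookie in jar]
```

Calling fetch_with_cookies(uri) with the URL from the examples would then be expected to return the first bytes of the image, provided the server still behaves as described.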
If you use the requests module, it does all of this automatically, so you do not need to make two requests yourself.
import requests

uri = 'http://ums.adtechjp.com/mapuser?providerid=1074;userid=AapfqIzytwl7ks8AA_qiU_BNUs8AAAFYqnZh4Q'

headers = {
    'User-Agent': 'Mozilla/5.0',
}

r = requests.get(uri, headers=headers)
print(r.content)