Problem description
I'm familiar with the fact that I should set the HTTP_PROXY environment variable to the proxy address.
Generally urllib works fine, the problem is dealing with urllib2.
>>> urllib2.urlopen("http://www.google.com").read()
returns
urllib2.URLError: <urlopen error [Errno 10061] No connection could be made because the target machine actively refused it>
or
urllib2.URLError: <urlopen error [Errno 11004] getaddrinfo failed>
Extra info:
urllib.urlopen(....) works fine! It is just urllib2 that is playing tricks...
I tried @Fenikso's answer but I'm getting this error now:
URLError: <urlopen error [Errno 10060] A connection attempt failed because the
connected party did not properly respond after a period of time, or established
connection failed because connected host has failed to respond>
Any ideas?
You can do it even without the HTTP_PROXY environment variable. Try this sample:
import urllib2
proxy_support = urllib2.ProxyHandler({"http":"http://61.233.25.166:80"})
opener = urllib2.build_opener(proxy_support)
urllib2.install_opener(opener)
html = urllib2.urlopen("http://www.google.com").read()
print html
In your case it really seems that the proxy server is refusing the connection.
Something more to try:
import urllib2
#proxy = "61.233.25.166:80"
proxy = "YOUR_PROXY_GOES_HERE"
proxies = {"http":"http://%s" % proxy}
url = "http://www.google.com/search?q=test"
headers = {'User-agent': 'Mozilla/5.0'}
proxy_support = urllib2.ProxyHandler(proxies)
opener = urllib2.build_opener(proxy_support, urllib2.HTTPHandler(debuglevel=1))
urllib2.install_opener(opener)
req = urllib2.Request(url, None, headers)
html = urllib2.urlopen(req).read()
print html
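If you want to tell the two failure modes apart programmatically, you can catch URLError and inspect its reason attribute (a sketch; the import fallback covers both Python 2 and 3):

```python
try:                                   # Python 2
    from urllib2 import urlopen, URLError
except ImportError:                    # Python 3
    from urllib.request import urlopen
    from urllib.error import URLError

try:
    html = urlopen("http://www.google.com", timeout=10).read()
except URLError as e:
    # e.reason is typically a socket.error carrying the errno you saw,
    # e.g. 10061 (connection refused) or 11004 (getaddrinfo failed)
    print("Failed to reach the server: %s" % e.reason)
```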
Edit 2014: This seems to be a popular question / answer. However, today I would use the third-party requests module instead.
For one request just do:
import requests
r = requests.get("http://www.google.com",
proxies={"http": "http://61.233.25.166:80"})
print(r.text)
For multiple requests, use a Session object so you do not have to add the proxies parameter to every request:
import requests
s = requests.Session()
s.proxies = {"http": "http://61.233.25.166:80"}
r = s.get("http://www.google.com")
print(r.text)
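One caveat worth knowing: the proxies mapping is keyed by URL scheme, so an "http" entry alone will not proxy https:// URLs; add an "https" key as well (same placeholder address below). If no proxies are given, requests also honors the HTTP_PROXY / HTTPS_PROXY environment variables.

```python
import requests

s = requests.Session()
# One entry per scheme; requests picks the proxy by the URL's scheme
s.proxies = {
    "http": "http://61.233.25.166:80",
    "https": "http://61.233.25.166:80",
}
```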