Using an HTTP proxy - Python


Question

I am familiar with the fact that I should set the HTTP_PROXY environment variable to the proxy address.

Generally urllib works fine; the problem is with urllib2.

>>> urllib2.urlopen("http://www.google.com").read()

raises either

urllib2.URLError: <urlopen error [Errno 10061] No connection could be made because the target machine actively refused it>

or

urllib2.URLError: <urlopen error [Errno 11004] getaddrinfo failed>

Extra info:

urllib.urlopen(....) works fine! It is just urllib2 that is playing tricks...

I tried @Fenikso's answer, but I'm getting this error now:

URLError: <urlopen error [Errno 10060] A connection attempt failed because the
connected party did not properly respond after a period of time, or established
connection failed because connected host has failed to respond>

Any ideas?

Solution

You can do it even without the HTTP_PROXY environment variable. Try this sample:

import urllib2

proxy_support = urllib2.ProxyHandler({"http":"http://61.233.25.166:80"})
opener = urllib2.build_opener(proxy_support)
urllib2.install_opener(opener)

html = urllib2.urlopen("http://www.google.com").read()
print html

In your case it really seems that the proxy server is refusing the connection.
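
One way to check that diagnosis independently of urllib2 is to open a plain TCP connection to the proxy. The following is a minimal sketch for Python 3 using only the standard library. The helper name `probe_proxy` and the labels it returns are made up for illustration, and the correspondence to the Windows error numbers in the question (10061, 10060, 11004) is informal rather than exact.

```python
import socket


def probe_proxy(host, port, timeout=5.0):
    """Try a plain TCP connection to host:port and classify the outcome.

    Hypothetical helper for illustration; the labels are not part of any library.
    """
    try:
        # create_connection resolves the host name and connects in one step.
        with socket.create_connection((host, port), timeout=timeout):
            return "reachable"
    except socket.gaierror:
        # The host name could not be resolved (compare "getaddrinfo failed").
        return "dns-failure"
    except ConnectionRefusedError:
        # Nothing accepts connections on that port (compare Errno 10061).
        return "refused"
    except socket.timeout:
        # No answer within the timeout (compare Errno 10060).
        return "timeout"
    except OSError as exc:
        # Any other network-level failure.
        return "error: %s" % exc
```

Calling `probe_proxy("YOUR_PROXY_HOST", 80)` with your own proxy's host and port should return "reachable" if the proxy accepts connections at all. Any other label suggests the problem lies with the proxy address or the network rather than with urllib2.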

Something more to try:

import urllib2

#proxy = "61.233.25.166:80"
proxy = "YOUR_PROXY_GOES_HERE"

proxies = {"http":"http://%s" % proxy}
url = "http://www.google.com/search?q=test"
headers = {'User-agent': 'Mozilla/5.0'}

proxy_support = urllib2.ProxyHandler(proxies)
opener = urllib2.build_opener(proxy_support, urllib2.HTTPHandler(debuglevel=1))
urllib2.install_opener(opener)

req = urllib2.Request(url, None, headers)
html = urllib2.urlopen(req).read()
print html
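
The samples above are written for Python 2. In Python 3 the urllib2 module was merged into urllib.request, so a roughly equivalent setup might look like the sketch below. The helper names `make_proxy_opener` and `fetch` are hypothetical, and the proxy address is whatever "host:port" string you pass in. Unlike `install_opener`, this version returns the opener, so the proxy applies only to requests sent through it.

```python
import urllib.request


def make_proxy_opener(proxy, debuglevel=0):
    """Build an opener that sends plain-HTTP requests through `proxy` ("host:port").

    Hypothetical helper for illustration only.
    """
    proxy_support = urllib.request.ProxyHandler({"http": "http://%s" % proxy})
    # debuglevel=1 prints the raw request and response headers to stdout.
    http_handler = urllib.request.HTTPHandler(debuglevel=debuglevel)
    return urllib.request.build_opener(proxy_support, http_handler)


def fetch(opener, url, timeout=10):
    """Fetch `url` through `opener` with a browser-like User-Agent header."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with opener.open(req, timeout=timeout) as resp:
        return resp.read()
```

For example, `fetch(make_proxy_opener("YOUR_PROXY_GOES_HERE", debuglevel=1), "http://www.google.com/search?q=test")` would mirror the Python 2 sample, returning the response body as bytes.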

Edit 2014: This seems to be a popular question / answer. However, today I would use the third-party requests module instead.

For one request just do:

import requests

r = requests.get("http://www.google.com",
                 proxies={"http": "http://61.233.25.166:80"})
print(r.text)

For multiple requests, use a Session object so you do not have to add the proxies parameter to every request:

import requests

s = requests.Session()
s.proxies = {"http": "http://61.233.25.166:80"}

r = s.get("http://www.google.com")
print(r.text)
