Problem description

I'm using urllib2 to load files from FTP and HTTP servers.
Some of the servers support only one connection per IP. The problem is that urllib2 does not close the connection instantly. Look at the example program:
from urllib2 import urlopen
from time import sleep

url = 'ftp://user:pass@host/big_file.ext'

def load_file(url):
    f = urlopen(url)
    loaded = 0
    while True:
        data = f.read(1024)
        if data == '':   # EOF
            break
        loaded += len(data)
    f.close()
    #sleep(1)
    print('loaded {0}'.format(loaded))

load_file(url)
load_file(url)
The code loads two files (here the two files are the same) from an FTP server that supports only one connection. This prints the following log:
loaded 463675266
Traceback (most recent call last):
  File "conection_test.py", line 20, in <module>
    load_file(url)
  File "conection_test.py", line 7, in load_file
    f = urlopen(url)
  File "/usr/lib/python2.6/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.6/urllib2.py", line 391, in open
    response = self._open(req, data)
  File "/usr/lib/python2.6/urllib2.py", line 409, in _open
    '_open', req)
  File "/usr/lib/python2.6/urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.6/urllib2.py", line 1331, in ftp_open
    fw = self.connect_ftp(user, passwd, host, port, dirs, req.timeout)
  File "/usr/lib/python2.6/urllib2.py", line 1352, in connect_ftp
    fw = ftpwrapper(user, passwd, host, port, dirs, timeout)
  File "/usr/lib/python2.6/urllib.py", line 854, in __init__
    self.init()
  File "/usr/lib/python2.6/urllib.py", line 860, in init
    self.ftp.connect(self.host, self.port, self.timeout)
  File "/usr/lib/python2.6/ftplib.py", line 134, in connect
    self.welcome = self.getresp()
  File "/usr/lib/python2.6/ftplib.py", line 216, in getresp
    raise error_temp, resp
urllib2.URLError: <urlopen error ftp error: 421 There are too many connections from your internet address.>
So the first file is loaded, and the second fails because the first connection was not closed.
But when I use sleep(1) after f.close(), the error does not occur:

loaded 463675266
loaded 463675266
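For reference, this is what that workaround looks like with the sleep uncommented; it is simply the load_file example from above with a one-second pause after closing (the name load_file_with_pause is only used here for illustration):

from urllib2 import urlopen
from time import sleep

def load_file_with_pause(url):
    # Same download loop as load_file above, but pause briefly after
    # closing so the server has dropped the old connection before the
    # next request is made.
    f = urlopen(url)
    loaded = 0
    while True:
        data = f.read(1024)
        if data == '':
            break
        loaded += len(data)
    f.close()
    sleep(1)  # workaround: wait until the server releases the connection
    print('loaded {0}'.format(loaded))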
Is there any way to force the connection to close so that the second download does not fail?
Solution

The cause is indeed a file descriptor leak. We also found that with Jython the problem is much more obvious than with CPython.
A colleague proposed this solution:
fdurl = urllib2.urlopen(req, timeout=self.timeout)
realsock = fdurl.fp._sock.fp._sock  # we want to close the "real" socket later

req = urllib2.Request(url, header)
try:
    fdurl = urllib2.urlopen(req, timeout=self.timeout)
except urllib2.URLError, e:
    print "urlopen exception", e
realsock.close()
fdurl.close()
The fix is ugly, but it does the job: no more "too many open connections".
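For completeness, a slightly tidier shape for the same trick is to grab the underlying socket right after opening and close it in a finally block. This is only a sketch that reuses the private fdurl.fp._sock.fp._sock chain from the answer (an implementation detail of Python 2.6's urllib2 that can change between versions); download_and_close and its parameters are names made up for the example:

import urllib2

def download_and_close(url, headers=None, timeout=30):
    # Sketch only: relies on the private attribute chain shown in the
    # answer above, which is specific to Python 2.6's urllib/urllib2.
    req = urllib2.Request(url, headers=headers or {})
    fdurl = urllib2.urlopen(req, timeout=timeout)
    realsock = fdurl.fp._sock.fp._sock  # the "real" socket behind the wrapper
    try:
        return fdurl.read()
    finally:
        realsock.close()  # force the connection closed immediately
        fdurl.close()

The try/finally makes sure both the wrapper and the underlying socket are closed even if the read raises, which is the whole point of the workaround.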