Problem description
I recently inherited a Python project and am now maintaining it. Part of the code makes a few hundred thousand requests to a website and saves the results to a database. The code reuses the same httplib.HTTPConnection object for each request and then just loops over a
conn.request("GET",someString,'',headers)
response = conn.getresponse()
section. A few days ago I saw in my logs that one of the requests raised this exception:
[Errno 104] Connection reset by peer
After that, every subsequent conn.request() call failed. My first inclination was to build a new connection for each request, but the performance impact of that was severe. So my question is: how do I fix this, especially since I'm not sure how I can even test it?
If I just call conn.connect() after an exception, will it reconnect correctly?
I'm looking for advice on how to fix it and, if possible, how to test it.
Thanks for your time.
Recommended answer
I think you first need to decide which failure mode you want to handle. For instance, did the connection reset because of a temporary resource problem on the server, so that a quick reconnect will fix it? Or is the server down or rebooting, in which case you should abort your process?
Presuming the first case, I think you are thinking along the right lines. Try something like this (note that this is not working code, just an example of the logic):
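import httplib
import socket

while True:
    try:
        conn.request("GET", someString, '', headers)
        response = conn.getresponse()
    except (httplib.HTTPException, socket.error) as e:
        # Errno 104 is raised as socket.error, which is not an HTTPException.
        # close() discards the dead socket and resets the connection state;
        # connect() then opens a fresh socket.
        conn.close()
        conn.connect()
        continue
    break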
You should probably add some logic to pause between repeated connection attempts and to give up after a certain number of tries (which is basically the second scenario above).
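The pause-and-give-up logic described above could look roughly like the following. This is a minimal sketch, not a definitive implementation: the helper name request_with_retry, its parameters, the default values, and the linear back-off are all assumptions made for illustration. The import fallback is only there so the same sketch also runs on Python 3, where httplib is named http.client.

```python
import socket
import time

try:
    import httplib  # Python 2.7, as in the question
except ImportError:
    import http.client as httplib  # Python 3 name for the same module


def request_with_retry(conn, url, headers, max_attempts=5, delay=1.0,
                       sleep=time.sleep):
    """Issue a GET on a reused connection, retrying after transient failures.

    Hypothetical helper: the name, parameters and defaults are illustrative.
    Returns (status, body) or re-raises the last error after max_attempts.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            conn.request("GET", url, None, headers)
            response = conn.getresponse()
            # Read the body now so the connection is ready for the next request.
            return response.status, response.read()
        except (httplib.HTTPException, socket.error):
            # Drop the dead socket and reset the connection's internal state.
            # With httplib's default auto_open behaviour, the next request()
            # call opens a new socket.
            conn.close()
            if attempt == max_attempts:
                raise  # probably the "server is down" case: let the caller abort
            sleep(delay * attempt)  # simple linear back-off before retrying
```

The sleep parameter is injectable only so the pause can be replaced in tests; in normal use you would call request_with_retry(conn, someString, headers) inside the existing loop.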
To test this, try using tcpkill to force the TCP connection to reset:
http://www.gnutoolbox.com/tcpkill-command/
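If running tcpkill against the real site is not practical, one alternative (a different technique from tcpkill, not a wrapper around it) is a throwaway local server that resets its first connection and answers the next one. The sketch below is an assumption-laden example: start_flaky_server is a hypothetical helper, the canned response is made up, and the SO_LINGER packing shown is the POSIX layout, so it may need adjusting on other platforms.

```python
import socket
import struct
import threading

try:
    import httplib  # Python 2.7, as in the question
except ImportError:
    import http.client as httplib  # Python 3 name for the same module


def start_flaky_server(resets=1):
    """Start a local server that resets the first `resets` connections,
    then answers one request with 200 OK. Hypothetical test helper.

    Returns (port, thread).
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(5)
    port = listener.getsockname()[1]

    def serve():
        for i in range(resets + 1):
            client, _ = listener.accept()
            if i < resets:
                # SO_LINGER enabled with a zero timeout makes close() send
                # a TCP RST instead of a normal FIN (POSIX struct layout).
                client.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                                  struct.pack("ii", 1, 0))
                client.close()
            else:
                client.recv(65536)  # consume the (small) request
                client.sendall(b"HTTP/1.1 200 OK\r\n"
                               b"Content-Length: 2\r\n\r\nok")
                client.close()
        listener.close()

    thread = threading.Thread(target=serve)
    thread.daemon = True
    thread.start()
    return port, thread
```

Pointing an httplib.HTTPConnection("127.0.0.1", port) at this server should make the first request fail with a reset-style error, after which you can check that your reconnect logic gets a 200 on the retry.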