This article looks at a urllib2 timeout problem on Python 2.6, walking through the original question and the recommended answer.

Problem description

It seems I cannot get the urllib2 timeout to be taken into account. I did read, I suppose, all the posts related to this topic, and it seems I'm not doing anything wrong. Am I correct? Many thanks for your kind help.

Scenario:

I need to check for Internet connectivity before continuing with the rest of a script. I then wrote a function (Net_Access), which is provided below.

  • When I execute this code with a LAN or Wifi interface connected and check an existing hostname, everything is fine: there is no error or problem, and therefore no timeout.
  • If I unplug the LAN connector or check a non-existent hostname, the timeout value seems to be ignored. What is wrong with my code?

Some information:

  • Ubuntu 10.04.4 LTS (running in a VirtualBox v4.2.6 VM; the host OS is Mac OS X Lion)
  • cat /proc/sys/kernel/osrelease: 2.6.32-42-generic
  • Python 2.6.5

My code:

#!/usr/bin/env python

import socket
import urllib2

myhost = 'http://www.google.com'
timeout = 3

# Set a process-wide default timeout for every new socket, and also
# pass an explicit timeout to urlopen for good measure.
socket.setdefaulttimeout(timeout)
req = urllib2.Request(myhost)

try:
    handle = urllib2.urlopen(req, timeout=timeout)
except urllib2.URLError as e:
    socket.setdefaulttimeout(None)
    print('[--- Net_Access() --- No network access')
else:
    print('[--- Net_Access() --- Internet Access OK')

1) Working, with LAN connector plugged in

$ time ./Net_Access
[--- Net_Access() --- Internet Access OK

real    0m0.223s
user    0m0.060s
sys 0m0.032s

2) Timeout not working, with LAN connector unplugged

$ time ./Net_Access
[--- Net_Access() --- No network access

real    1m20.235s
user    0m0.048s
sys 0m0.060s

Addition to the original post: test results (using an IP instead of the FQDN)

As suggested by @unutbu (see comments), replacing the FQDN in myhost with an IP address fixes the problem: the timeout takes effect.

LAN connector plugged in...

$ time ./Net_Access
[--- Net_Access() --- Internet Access OK

real    0m0.289s
user    0m0.036s
sys 0m0.040s

LAN connector unplugged...

$ time ./Net_Access
[--- Net_Access() --- No network access

real    0m3.082s
user    0m0.052s
sys 0m0.024s

This is nice, but it means that the timeout can only be used with an IP, not an FQDN. Weird...

Did someone find a way to use the urllib2 timeout without resolving DNS up front and passing the IP to the function? Or do you first use socket to test the connection and only fire urllib2 once you are sure you can reach the target?
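For reference, here is a minimal sketch of that socket-first idea (the hostname, port, and 3-second timeout are arbitrary assumptions). Note that it does not really dodge the issue: create_connection still calls getaddrinfo for the DNS lookup, which is exactly the crux of the answer below.

# Hypothetical sketch: probe connectivity with a plain socket first,
# and only hand the URL to urllib2 if the probe succeeded. The DNS
# lookup inside create_connection can still hang, as explained below.
import socket
import urllib2

def can_reach(host='www.google.com', port=80, timeout=3):
    try:
        socket.create_connection((host, port), timeout).close()
        return True
    except socket.error:  # socket.timeout is a subclass of socket.error
        return False

if can_reach():
    handle = urllib2.urlopen('http://www.google.com', timeout=3)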

Many thanks.

Recommended answer

If your problem is DNS lookup taking forever (or just way too long) to time out when there's no network connectivity, then yes, this is a known problem, and there's nothing you can do within urllib2 itself to fix it.

So, is all hope lost? Well, not necessarily.

First, let's look at what's going on. Ultimately, urlopen relies on getaddrinfo, which (along with its relatives like gethostbyname) is notoriously the one critical piece of the socket API that can't be run asynchronously or interrupted (and on some platforms, it isn't even thread-safe). If you want to trace through the source yourself: urllib2 defers to httplib for creating connections, which calls create_connection on socket, which calls socket_getaddrinfo on _socket, which ultimately calls the real getaddrinfo function. This is an infamous problem that affects every network client and server written in every language in the world, and there's no good, easy solution.
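That also explains why the Net_Access timeout never fires when the cable is unplugged: Python's socket timeouts apply to operations on socket objects, while the module-level getaddrinfo call blocks inside the C library before any socket exists. A tiny demonstration (with no connectivity, this hangs for the resolver's own timeout, not 3 seconds):

import socket

socket.setdefaulttimeout(3)
# The default timeout above does NOT cover this call: getaddrinfo
# blocks in C, outside Python's socket timeout machinery.
socket.getaddrinfo('www.google.com', 80)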

One option is to use a different, higher-level library that has already solved this problem. I believe requests relies on urllib3, which ultimately has the same problem, but pycurl relies on libcurl, which, if built with c-ares, does the name lookup asynchronously and can therefore time it out.
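A hedged sketch of that route, assuming pycurl is installed (whether the DNS phase actually honors CONNECTTIMEOUT depends on libcurl having been built with c-ares):

# Sketch: the same connectivity check via pycurl. CONNECTTIMEOUT
# bounds connection setup (including DNS only on a c-ares build);
# TIMEOUT bounds the whole transfer.
import pycurl
from StringIO import StringIO

buf = StringIO()
c = pycurl.Curl()
c.setopt(pycurl.URL, 'http://www.google.com')
c.setopt(pycurl.CONNECTTIMEOUT, 3)
c.setopt(pycurl.TIMEOUT, 3)
c.setopt(pycurl.WRITEFUNCTION, buf.write)
try:
    c.perform()
    print('[--- Net_Access() --- Internet Access OK')
except pycurl.error:
    print('[--- Net_Access() --- No network access')
finally:
    c.close()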

Or, of course, you can use something like twisted or tornado or some other async networking library. But obviously, rewriting all of your code to use a twisted HTTP client instead of urllib2 is not exactly trivial.
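For a flavor of that route, a minimal sketch with a Python 2-era tornado (an assumption on my part; the timeouts are enforced by the IOLoop, although the DNS step may still block depending on which resolver tornado is using):

# Sketch: async check with tornado; connect_timeout / request_timeout
# are enforced by the event loop rather than by the socket itself.
from tornado import httpclient, ioloop

def on_response(response):
    if response.error:
        print('[--- Net_Access() --- No network access')
    else:
        print('[--- Net_Access() --- Internet Access OK')
    ioloop.IOLoop.instance().stop()

client = httpclient.AsyncHTTPClient()
client.fetch('http://www.google.com', on_response,
             connect_timeout=3, request_timeout=3)
ioloop.IOLoop.instance().start()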

Another option is to "fix" urllib2 by monkeypatching the standard library. If you want to do this, there are two steps.

First, you have to provide a timeoutable getaddrinfo. You could do this by binding c-ares, by using ctypes to access platform-specific APIs like Linux's getaddrinfo_a, or even by looking up the nameservers and communicating with them directly. But the really simple way to do it is with threading. If you're doing lots of these, you'll want a single thread or a small thread pool, but for small-scale use, just spin off a thread for each call. A really quick-and-dirty (read: bad) implementation is:

import socket
import threading

def getaddrinfo_async(*args, **kwargs):
    timeout = kwargs.pop('timeout', 3)  # seconds to wait for the lookup
    result = []
    # A lambda can't contain an assignment, so collect the result
    # in a mutable container instead.
    t = threading.Thread(target=lambda: result.append(socket.getaddrinfo(*args)))
    t.daemon = True  # a hung lookup must not keep the process alive
    t.start()
    t.join(timeout)
    if t.isAlive():
        raise socket.timeout('getaddrinfo timed out')
    return result[0]

Next, you have to get all the libraries you care about to use it. Depending on how ubiquitous (and dangerous) you want your patch to be, you can replace socket.getaddrinfo itself, or just socket.create_connection, or just the code in httplib, or even urllib2.
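As a hedged illustration of the most global (and most dangerous) variant, here is one way to patch socket.getaddrinfo itself; the 3-second budget and the choice of gaierror are my own assumptions:

# Sketch: patch socket.getaddrinfo. Keep a reference to the real
# function so the threaded wrapper doesn't recurse into the patch.
import socket
import threading

_real_getaddrinfo = socket.getaddrinfo

def _getaddrinfo_with_timeout(*args):
    result = []
    t = threading.Thread(target=lambda: result.append(_real_getaddrinfo(*args)))
    t.daemon = True
    t.start()
    t.join(3)  # assumed DNS budget in seconds
    if t.isAlive():
        raise socket.gaierror('getaddrinfo timed out')
    return result[0]

socket.getaddrinfo = _getaddrinfo_with_timeout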

A final option is to fix this at a higher level. If your networking stuff happens on a background thread, you can throw a higher-level timeout over the whole thing: if it takes more than timeout seconds to figure out whether it has timed out or not, you know it has.
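A minimal sketch of that idea, with a made-up net_access_with_deadline helper and an assumed 5-second overall budget:

# Sketch: run the whole check on a daemon thread and give up on the
# *thread* after a deadline, regardless of where it got stuck.
import threading
import urllib2

def net_access_with_deadline(url='http://www.google.com', deadline=5):
    outcome = []
    def worker():
        try:
            urllib2.urlopen(url, timeout=3).close()
            outcome.append(True)
        except Exception:
            outcome.append(False)
    t = threading.Thread(target=worker)
    t.daemon = True  # a stuck DNS lookup won't block interpreter exit
    t.start()
    t.join(deadline)
    return bool(outcome) and outcome[0]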

That wraps up this look at the Python 2.6 urllib2 timeout problem; hopefully the recommended answer above is of help.
