Problem description
Background: I am using urllib.urlretrieve, as opposed to any other function in the urllib* modules, because of its hook function support (see reporthook below), which I use to display a textual progress bar. This is Python >= 2.6.
>>> urllib.urlretrieve(url[, filename[, reporthook[, data]]])
However, urlretrieve is so dumb that it leaves no way to detect the status of the HTTP request (e.g., was it 404 or 200?).
>>> fn, h = urllib.urlretrieve('http://google.com/foo/bar')
>>> h.items()
[('date', 'Thu, 20 Aug 2009 20:07:40 GMT'),
('expires', '-1'),
('content-type', 'text/html; charset=ISO-8859-1'),
('server', 'gws'),
('cache-control', 'private, max-age=0')]
>>> h.status
''
>>>
What is the best known way to download a remote HTTP file with hook-like support (to show a progress bar) and decent HTTP error handling?
Recommended answer
Check out the complete code of urllib.urlretrieve:
def urlretrieve(url, filename=None, reporthook=None, data=None):
    global _urlopener
    if not _urlopener:
        _urlopener = FancyURLopener()
    return _urlopener.retrieve(url, filename, reporthook, data)
In other words, you can use urllib.FancyURLopener (it's part of the public urllib API). You can override http_error_default to detect 404s:
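class MyURLopener(urllib.FancyURLopener):
    def http_error_default(self, url, fp, errcode, errmsg, headers):
        # handle errors the way you'd like to; raising is one option,
        # so that a 404 no longer ends up saved as if it were the file
        raise IOError('http error', errcode, errmsg, headers)

fn, h = MyURLopener().retrieve(url, reporthook=my_report_hook)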
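The answer passes a my_report_hook without showing it. Below is a minimal sketch of what such a hook might look like, assuming the documented reporthook signature of block count, block size, and total size. The format_progress helper and the width parameter are hypothetical names introduced here for illustration and are not part of urllib.

```python
import sys


def format_progress(block_count, block_size, total_size, width=40):
    """Build one line of a textual progress bar (hypothetical helper)."""
    if total_size <= 0:
        # No usable size was reported (urllib passes -1 in that case),
        # so only the number of bytes requested so far can be shown.
        return "%d bytes" % (block_count * block_size)
    # The last block is usually partial, so clamp to the total size.
    done = min(block_count * block_size, total_size)
    filled = width * done // total_size
    percent = 100 * done // total_size
    return "[%s%s] %3d%%" % ("#" * filled, "-" * (width - filled), percent)


def my_report_hook(block_count, block_size, total_size):
    """Hook matching the (block count, block size, total size) arguments
    that retrieve() passes to its reporthook."""
    # "\r" returns to the start of the line so the bar redraws in place.
    sys.stderr.write("\r" + format_progress(block_count, block_size, total_size))
    sys.stderr.flush()
```

Keeping the formatting separate from the writing makes the bar easy to check without any network access. The hook itself can then be passed as reporthook=my_report_hook, as in the retrieve call above.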