How to download a file using Python in a "smarter" way?

Question

I need to download several files via http in Python.

The most obvious way to do it is just using urllib2:

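import urllib2

# Fetch the page and save it under a hard-coded local name.
u = urllib2.urlopen('http://server.com/file.html')
# Open in binary mode so the bytes are written unchanged.
localFile = open('file.html', 'wb')
localFile.write(u.read())
localFile.close()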

But I'll have to deal with URLs that are nasty in some way, like this: http://server.com/!Run.aspx/someoddtext/somemore?id=121&m=pdf. When downloaded via the browser, the file has a human-readable name, i.e. accounts.pdf.

Is there any way to handle that in Python, so I don't need to know the file names and hardcode them into my script?

Recommended answer

Download scripts like that tend to push a header telling the user-agent what to name the file:

Content-Disposition: attachment; filename="the filename.ext"

If you can grab that header, you can get the proper filename.

There's another thread that has a little bit of code to offer up for Content-Disposition-grabbing.

remotefile = urllib2.urlopen('http://example.com/somefile.zip')
# info() returns the response headers; a missing header raises KeyError.
remotefile.info()['Content-Disposition']
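
To tie this together, here is a rough sketch of how the header could be turned into a local file name. It is written for Python 3, where urllib2 became urllib.request, which is an assumption on my part since the question uses urllib2. The helper names pick_filename and download, and the 'download' fallback name, are made up for illustration. If the server sends no Content-Disposition, the sketch falls back to the last segment of the URL path, and it drops any directory part from the suggested name so a hostile server can't write outside the target directory.

```python
import os
import posixpath
import shutil
import urllib.request
from email.message import Message
from urllib.parse import urlsplit


def pick_filename(content_disposition, url, default='download'):
    """Choose a local file name from the header value, else from the URL.

    pick_filename and the 'download' default are hypothetical names,
    not part of any library.
    """
    name = None
    if content_disposition:
        # Let the stdlib email parser handle quoting and parameters.
        msg = Message()
        msg['Content-Disposition'] = content_disposition
        name = msg.get_filename()
    if not name:
        # No usable header: fall back to the last segment of the URL path.
        name = posixpath.basename(urlsplit(url).path)
    # Keep only the final path component of whatever was suggested.
    name = posixpath.basename(name.replace('\\', '/'))
    return name or default


def download(url, directory='.'):
    """Download url into directory and return the path that was written."""
    with urllib.request.urlopen(url) as response:
        header = response.headers.get('Content-Disposition')
        target = os.path.join(directory, pick_filename(header, url))
        with open(target, 'wb') as out:
            # Stream to disk instead of reading the whole body into memory.
            shutil.copyfileobj(response, out)
    return target
```

Calling download() on the URL from the question should then save the file under whatever name the server suggests (accounts.pdf in the example), provided the server actually sends that header.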
