Question
I need to download several files via http in Python.
The most obvious way to do it is just using urllib2:
import urllib2

u = urllib2.urlopen('http://server.com/file.html')
# Open in binary mode so non-text files (e.g. PDFs) are written unchanged.
localFile = open('file.html', 'wb')
localFile.write(u.read())
localFile.close()
But I'll have to deal with URLs that are nasty in some way, like this: http://server.com/!Run.aspx/someoddtext/somemore?id=121&m=pdf. When downloaded via the browser, the file has a human-readable name, i.e. accounts.pdf.
Is there any way to handle that in Python, so I don't need to know the file names and hardcode them into my script?
Recommended answer
Download scripts like that tend to push a header telling the user-agent what to name the file:
Content-Disposition: attachment; filename="the filename.ext"
If you can grab that header, you can get the proper filename.
There's another thread that has a little bit of code to offer up for Content-Disposition-grabbing.
# Read the Content-Disposition header from the response (Python 2, urllib2).
remotefile = urllib2.urlopen('http://example.com/somefile.zip')
remotefile.info()['Content-Disposition']
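To turn that header into an actual download, something along these lines might work. This is only a sketch, and it assumes Python 3, where urllib2 has become urllib.request. The helper names pick_filename and download, and the fallback name download.bin, are made up for illustration. The idea is to use the filename from Content-Disposition when the server sends one, and otherwise fall back to the last path segment of the URL.

```python
import os
import shutil
import urllib.request
from email.message import Message
from urllib.parse import unquote, urlsplit


def pick_filename(content_disposition, url, default="download.bin"):
    """Pick a local file name (hypothetical helper, not part of any library).

    Prefers the filename parameter of a Content-Disposition header value,
    then the last path segment of the URL, then `default`.
    """
    name = None
    if content_disposition:
        # Let the standard library's header parser handle quoting.
        msg = Message()
        msg["Content-Disposition"] = content_disposition
        name = msg.get_filename()
    if not name:
        # No usable header: fall back to the last path segment of the URL.
        name = unquote(urlsplit(url).path.rsplit("/", 1)[-1])
    # Keep only the final component so a header like "../../x" cannot
    # place the file outside the target directory.
    name = os.path.basename(name.replace("\\", "/"))
    return name or default


def download(url, directory="."):
    """Download `url` into `directory` and return the path that was written."""
    with urllib.request.urlopen(url) as response:
        name = pick_filename(response.headers.get("Content-Disposition"), url)
        path = os.path.join(directory, name)
        # Binary mode plus copyfileobj streams the body without loading it all.
        with open(path, "wb") as out:
            shutil.copyfileobj(response, out)
    return path
```

Calling download() with the URL from the question would then save the file under whatever name the server suggests (accounts.pdf, if that is what the header carries), and under somemore if the server sends no Content-Disposition header at all.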