python - Best way to download all artifacts listed in an HTML5 cache.manifest file?

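I'm trying to see how an HTML5 app works, and every attempt to save the page in a WebKit browser (Chrome, Safari) picks up some, but not all, of the cache.manifest resources. Is there a library or a piece of code that can parse the cache.manifest file and download all of the resources (images, scripts, CSS)?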

(Original code moved to the answer... noob mistake.)

Best answer

I originally posted this as part of the question... (no new Stack Overflow poster has ever done that ;)

Since there still haven't been any answers, here you go:

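I came up with the following Python script to do this, but any input is appreciated =) (this is my first attempt at Python code, so there are probably better ways to do it)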

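import os
import urllib.request
from urllib.parse import urldefrag, urljoin

# Base URL of the directory that contains cache.manifest. It must end with '/',
# because 'cache.manifest' is appended to it below.
cmServerURL = 'http://<serverURL>:<port>/<path-to-cache.manifest>'
manifestURL = cmServerURL + 'cache.manifest'

# download file code taken from stackoverflow
# http://stackoverflow.com/questions/22676/how-do-i-download-a-file-over-http-using-python
def loadURL(url, pathToSave):
    """Download url into the local file pathToSave, printing progress."""
    file_name = url.split('/')[-1]
    with urllib.request.urlopen(url) as u, open(pathToSave, 'wb') as f:
        # Content-Length is optional (e.g. chunked responses), so default to 0
        file_size = int(u.headers.get('Content-Length', 0))
        print("Downloading: %s Bytes: %s" % (file_name, file_size))

        file_size_dl = 0
        block_sz = 8192
        while True:
            buffer = u.read(block_sz)
            if not buffer:
                break

            file_size_dl += len(buffer)
            f.write(buffer)
            if file_size:
                status = "%10d  [%3.2f%%]" % (file_size_dl, file_size_dl * 100. / file_size)
            else:
                status = "%10d" % file_size_dl
            # backspaces move the cursor back so the next status overwrites this one
            status = status + chr(8) * (len(status) + 1)
            print(status, end='', flush=True)
        print()

# download the cache.manifest file
# since this request doesn't include the Content-Length header, we use a different API =P
urllib.request.urlretrieve(manifestURL, './cache.manifest')

# go through cache.manifest line by line and download every cached file
# that doesn't exist locally yet
section = 'CACHE'  # entries before any section header are explicit entries
with open('cache.manifest', 'r') as manifest:
    for line in manifest:
        entry = line.strip()
        # skip blank lines, comments and the "CACHE MANIFEST" signature line
        if not entry or entry.startswith('#') or entry.startswith('CACHE MANIFEST'):
            continue
        # CACHE:, NETWORK:, FALLBACK:, SETTINGS: (or unknown) section headers
        if entry.endswith(':'):
            section = entry[:-1]
            continue

        tokens = entry.split()
        if section == 'CACHE':
            target = tokens[0]
        elif section == 'FALLBACK' and len(tokens) >= 2:
            # the second URL of a FALLBACK pair is stored in the cache too
            target = tokens[1]
        else:
            # NETWORK and SETTINGS entries are not files to download
            continue
        if '://' in target:
            print('skipping absolute URL: ' + target)
            continue

        url = urldefrag(urljoin(manifestURL, target))[0]
        # save relative to the current directory, dropping any leading '/'
        fileName = urldefrag(target)[0].lstrip('/')
        if not fileName or fileName.endswith('/'):
            print('skipping directory-style entry: ' + target)
            continue
        # if the file doesn't exist, let's download it
        if not os.path.exists(fileName):
            print('NOT FOUND: ' + fileName)
            dirName = os.path.dirname(fileName)
            if dirName:
                print('creating directory (if needed): ' + dirName)
                os.makedirs(dirName, exist_ok=True)
            print('downloading file: ' + url)
            loadURL(url, fileName)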
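If you'd rather keep the manifest parsing separate from the downloading (so it can be tested without a server), here is a minimal sketch. It assumes Python 3 and a simplified reading of the HTML5 cache manifest rules: the `CACHE MANIFEST` signature line, `#` comments, `CACHE:`/`NETWORK:`/`FALLBACK:`/`SETTINGS:` section headers, and relative URLs resolved against the manifest's own URL. `parse_manifest` and its return shape are hypothetical names of my own, not part of any library.

```python
from urllib.parse import urldefrag, urljoin


def parse_manifest(text, manifest_url):
    """Split a cache.manifest body into (explicit, network, fallback) entries.

    Hypothetical helper; a simplified reading of the HTML5 manifest syntax:
    '#' comments and blank lines are skipped, a line ending in ':' is a
    section header, and relative URLs are resolved against manifest_url.
    """
    lines = text.lstrip('\ufeff').splitlines()  # tolerate a UTF-8 byte order mark
    if not lines or not lines[0].startswith('CACHE MANIFEST'):
        raise ValueError('missing "CACHE MANIFEST" signature line')

    explicit, network, fallback = [], [], []
    section = 'CACHE'  # entries before any header are explicit entries
    for raw in lines[1:]:
        line = raw.strip()
        if not line or line.startswith('#'):
            continue
        if line.endswith(':'):
            # CACHE:, NETWORK:, FALLBACK:, SETTINGS: or an unknown header
            section = line[:-1]
            continue
        tokens = line.split()
        if section == 'CACHE':
            # only the first token is the URL; any #fragment is dropped
            explicit.append(urldefrag(urljoin(manifest_url, tokens[0]))[0])
        elif section == 'NETWORK':
            # '*' is the online whitelist wildcard, not a URL
            entry = tokens[0]
            network.append(entry if entry == '*' else urljoin(manifest_url, entry))
        elif section == 'FALLBACK' and len(tokens) >= 2:
            # (namespace prefix, fallback page) pair
            fallback.append((urljoin(manifest_url, tokens[0]),
                             urljoin(manifest_url, tokens[1])))
        # SETTINGS and unknown sections are ignored in this sketch
    return explicit, network, fallback
```

The `explicit` URLs and the second URL of each `fallback` pair are the ones stored in the application cache, so those are what you would download; `network` entries are online-only resources. Mapping each URL to a local path and fetching it (for example with a function like `loadURL`) is left to the caller.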

Regarding "python - Best way to download all artifacts listed in an HTML5 cache.manifest file?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/7394861/