Problem description
I can bring up a web page, no problem. I can save the web page as HTML, no problem. What I need is to save the web page as MHT so that I get all the HTML that stays hidden when it is not saved as MHT. In researching this, I'm coming up with absolutely nothing on how to save as MHT using Python. As shown below, I tried writing to an .mht file using the standard code for saving as HTML, but that simply doesn't work. I'm not surprised it doesn't work, but it was worth a shot.
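import urllib.request

url = 'https://www.thewebsite.com'
# read() returns bytes; this is only the raw HTML, not an MHT archive
html = urllib.request.urlopen(url).read()
# write the bytes unchanged; str(html) would save the b'...' repr instead
with open('websitetest.mht', 'wb') as m:
    m.write(html)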
The site I'm trying to save has 'hidden code' that comes across when the page is saved as MHT, but not when it is saved as HTML. That is why I'm trying to save as MHT: I get all the code and can then go through it to pull out what I need to compile a database.
Recommended answer
There is a very handy GitHub project written in Python 2.7 (you need to make simple modifications to make it compatible with Python 3.4). The project has code for packing and unpacking MHT files. I think this is what you are looking for:
Unpacks/packs an MHT (MHTML) archive into/from separate files, writing/reading them in directories that match their Content-Location.
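To make the idea of packing and unpacking concrete: an MHT file is a MIME multipart/related message in which each resource is one part labeled with a Content-Location header. The following is a minimal sketch using only Python 3's standard email package. It is not the code of the GitHub project mentioned above; the function names pack_mht and unpack_mht and the sample data are hypothetical, chosen for illustration. It assumes you have already downloaded each resource yourself, and it does not run JavaScript, so content that a browser generates while rendering would not appear in it.

```python
import email
from email import encoders
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart


def pack_mht(resources):
    """Build MHT bytes from a list of (location, content_type, data) tuples.

    Hypothetical helper: the first tuple is treated as the root document.
    """
    # multipart/related is the container type used by MHTML archives
    archive = MIMEMultipart("related", type=resources[0][1])
    for location, content_type, data in resources:
        maintype, subtype = content_type.split("/", 1)
        part = MIMEBase(maintype, subtype)
        part.set_payload(data)
        # base64 keeps binary resources (images, fonts) intact
        encoders.encode_base64(part)
        # Content-Location records the URL the resource came from
        part["Content-Location"] = location
        archive.attach(part)
    return archive.as_bytes()


def unpack_mht(mht_bytes):
    """Return {Content-Location: decoded bytes} for each part of an archive."""
    message = email.message_from_bytes(mht_bytes)
    return {
        part["Content-Location"]: part.get_payload(decode=True)
        for part in message.walk()
        if not part.is_multipart()  # skip the multipart container itself
    }
```

A possible use, assuming the page and its resources were fetched beforehand (for example with urllib.request): pass them to pack_mht, write the returned bytes to a file opened in 'wb' mode, and later call unpack_mht on the file's bytes to get each resource back keyed by its URL.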