Problem description
I am trying to download a tgz file from this website: https://plg.uwaterloo.ca/cgi-bin/cgiwrap/gvcormac/foo07
Here is my script:
import os
from six.moves import urllib
import tarfile
spam_path=os.path.join('ML', 'spam')
root_download='https://plg.uwaterloo.ca/cgi-bin/cgiwrap/gvcormac/foo07'
spam_url=root_download+'255 MB Corpus (trec07p.tgz)'
if not os.path.isdir(spam_path):
    os.makedirs(spam_path)
path=os.path.join(spam_path, 'trec07p.tgz')
if not os.path.isfile('trec07p.tgz'):
    urllib.request.urlretrieve(spam_url,path)
tar_file=tarfile.open(path)
I am not sure what I am missing, but the following error is returned:
---------------------------------------------------------------------------
ReadError Traceback (most recent call last)
<ipython-input-21-5644813e0670> in <module>()
18 if not os.path.isfile('trec07p.tgz'):
19 urllib.request.urlretrieve(spam_url,path)
---> 20 tar_file=tarfile.open(path)
21 # tar_file.extractall(path)
22 # tar_file.close()
/anaconda/lib/python2.7/tarfile.pyc in open(cls, name, mode, fileobj, bufsize, **kwargs)
1678 fileobj.seek(saved_pos)
1679 continue
-> 1680 raise ReadError("file could not be opened successfully")
1681
1682 elif ":" in mode:
ReadError: file could not be opened successfully
Thanks in advance for your help!
Recommended answer
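You can pass a mode to tarfile.open. The default mode 'r' detects compression automatically, so a ReadError usually means the downloaded file is not a valid archive; note that the URL in the question appends the link text '255 MB Corpus (trec07p.tgz)' rather than the file name. To open the file explicitly as gzip-compressed, set the mode to 'r:gz':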
tarfile.open(path, 'r:gz')
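To illustrate how the mode argument behaves, here is a minimal sketch that does not touch the network. It builds a tiny archive as stand-in data (the file names and the helpers `make_sample_tgz` and `list_members` are hypothetical, not part of the original answer), opens it with both 'r:gz' and the default 'r', and then shows that a non-archive file saved under a .tgz name raises ReadError.

```python
import io
import os
import tarfile
import tempfile


def make_sample_tgz(path):
    """Write a tiny gzip-compressed tar archive with one member (stand-in data)."""
    payload = b"hello"
    info = tarfile.TarInfo(name="sample/hello.txt")
    info.size = len(payload)
    with tarfile.open(path, "w:gz") as archive:
        archive.addfile(info, io.BytesIO(payload))


def list_members(path, mode="r:gz"):
    """Open the archive with the given mode and return its member names."""
    with tarfile.open(path, mode) as archive:
        return archive.getnames()


tmp_dir = tempfile.mkdtemp()
good_path = os.path.join(tmp_dir, "sample.tgz")
make_sample_tgz(good_path)

# Explicit gzip mode and the auto-detecting default mode read the same archive.
explicit_names = list_members(good_path, "r:gz")
default_names = list_members(good_path, "r")

# A file that is not an archive, e.g. an HTML page saved under a .tgz name.
bad_path = os.path.join(tmp_dir, "not_really.tgz")
with open(bad_path, "wb") as handle:
    handle.write(b"<html><body>Agreement page</body></html>")

try:
    list_members(bad_path, "r:gz")
    bad_opened = True
except tarfile.ReadError:
    # tarfile reports a non-gzip file as ReadError in 'r:gz' mode.
    bad_opened = False

print(explicit_names, default_names, bad_opened)
```

If the last case matches what you see, the thing to fix is probably the download rather than the mode.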
Working example, after accepting the agreement:
import tarfile
import requests
URL = 'https://plg.uwaterloo.ca/cgi-bin/cgiwrap/gvcormac/trec07p.tgz'
FILE = '/home/blake/Downloads/trec07p.tgz'
resp = requests.get(URL, stream=True)
resp.raise_for_status()
with open(FILE, 'wb') as out_file:
    for line in resp.iter_content(chunk_size=1024*4, decode_unicode=False):
        out_file.write(line)
f = tarfile.open(FILE, 'r:gz')
print(f.getnames())
f.close()
Output:
['trec07p/data/inmail.35059',
'trec07p/data/inmail.34430',
'trec07p/data/inmail.45722',
..
..]
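Because the working example depends on the agreement having been accepted, a download may turn out to be an HTML page rather than the archive. One way to check before calling tarfile.open is to look at the first two bytes of the file, which are 0x1f 0x8b for any gzip stream. The sketch below uses a hypothetical helper, `looks_like_gzip`, and stand-in files created locally; it is a diagnostic aid under those assumptions, not part of the original answer.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def looks_like_gzip(path):
    """Return True if the file at `path` starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


# Stand-in files: one real gzip stream (not a tar archive, only used to
# exercise the header check) and one HTML page saved under a .tgz name.
tmp_dir = tempfile.mkdtemp()
real_gz = os.path.join(tmp_dir, "real.tgz")
with gzip.open(real_gz, "wb") as handle:
    handle.write(b"placeholder payload")

html_page = os.path.join(tmp_dir, "page.tgz")
with open(html_page, "wb") as handle:
    handle.write(b"<html>please accept the agreement</html>")

print(looks_like_gzip(real_gz))
print(looks_like_gzip(html_page))
```

If the check fails for the file you downloaded, opening it in a text editor should show what the server actually returned.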