Problem description
I found a way to do streaming (chunk-by-chunk) reading in Python in the top-voted answer of this post.
But it goes wrong: when I do some time-consuming work after reading each chunk, I only get the leading part of the data.
from urllib2 import urlopen
from urllib2 import HTTPError
import sys
import time

CHUNK = 1024 * 1024 * 16

try:
    response = urlopen("XXX_domain/XXX_file_in_net.gz")
except HTTPError as e:
    print e
    sys.exit(1)

while True:
    chunk = response.read(CHUNK)
    print 'CHUNK:', len(chunk)
    # some time-consuming work, just as example
    time.sleep(60)
    if not chunk:
        break
Without the sleep, the output is correct (the chunk sizes add up to the actual file size):
CHUNK: 16777216
CHUNK: 16777216
CHUNK: 6888014
CHUNK: 0
With the sleep:
CHUNK: 16777216
CHUNK: 766580
CHUNK: 0
I decompressed these chunks and found that only the leading portion of the gz file had been read.
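One way to confirm that the server is closing the connection early (rather than the read loop being wrong) is to count the bytes actually received and compare them with the Content-Length header. This is only a minimal sketch under the same urllib2 setup; the URL is the same placeholder as above.

from urllib2 import urlopen

response = urlopen("XXX_domain/XXX_file_in_net.gz")
# Content-Length announced by the server, 0 if the header is missing
expected = int(response.info().getheader('Content-Length', '0'))

received = 0
while True:
    chunk = response.read(16 * 1024 * 1024)
    if not chunk:
        break
    received += len(chunk)

# If received < expected, the connection was dropped before the full file arrived.
print 'expected:', expected, 'received:', received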
Recommended answer
Try supporting resumable downloads (HTTP Range requests), in case the server closes the connection before it has sent all of the data.
import socket
from urllib2 import Request, urlopen, HTTPError

content_size = 0
handled_size = 0

def download(the_url):
    global content_size, handled_size

    # Request the whole file with a Range header so the server knows
    # we expect to be able to resume from an arbitrary byte offset.
    try:
        request = Request(the_url, headers={'Range': 'bytes=0-'})
        response = urlopen(request, timeout=60)
    except HTTPError as e:
        print e
        return 'Connection Error'

    print dict(response.info())
    header_dict = dict(response.info())
    if 'content-length' in header_dict:
        content_size = int(header_dict['content-length'])

    CHUNK = 16 * 1024 * 1024
    while True:
        # Inner loop: read until the stream ends or the socket times out.
        while True:
            try:
                chunk = response.read(CHUNK)
            except socket.timeout as e:
                print 'time_out'
                break
            if not chunk:
                break
            DoSomeTimeConsumingJob()  # placeholder for the per-chunk work
            handled_size = handled_size + len(chunk)
        if handled_size == content_size and content_size != 0:
            break
        else:
            # The connection was dropped early: re-open it and resume
            # from the first byte that has not been handled yet.
            try:
                request = Request(the_url, headers={'Range': 'bytes=' + str(handled_size) + '-'})
                response = urlopen(request, timeout=60)
            except HTTPError as e:
                print e
    response.close()
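A minimal way to drive it, assuming the logic above is wrapped in a function named download(the_url) as shown, with DoSomeTimeConsumingJob standing in for the real per-chunk work (both names are illustrative, as is the placeholder URL):

import time

def DoSomeTimeConsumingJob():
    # stand-in for the time-consuming processing from the question
    time.sleep(60)

if __name__ == '__main__':
    result = download("XXX_domain/XXX_file_in_net.gz")
    if result == 'Connection Error':
        print 'could not open the URL'
    else:
        print 'handled', handled_size, 'of', content_size, 'bytes'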