Problem description
I want to optimize my application using generators: instead of creating 3 lists, I want to use 2 generators. Here's the short scheme of my app in its current version:
1) Load data from a binary file -> 1st list
    self.stream_data = [ struct.unpack(">H", data_file.read(2))[0]
                         for foo in xrange(self.columns*self.rows) ]
2) Create the so-called non-zero-suppressed data (all data, zeros included) -> 2nd list
    self.NZS_data = list()
    for row in xrange(self.rows):
        self.NZS_data.append( [ self.stream_data[column + row * self.columns]
                                for column in xrange(self.columns) ] )
3) Create zero-suppressed data (nonzero values only, with their coordinates) -> 3rd list
    self.ZS_data = list()
    for row in xrange(self.rows):
        for column in xrange(self.columns):
            if self.NZS_data[row][column]:
                self.ZS_data.append( [ column, row, self.NZS_data[row][column] ] )
(I know that this could have been squeezed into a single list comprehension using itertools.product.)
4) Save the ZS_data list into a file.
I profiled the code with Python's cProfile, and most of the time (apart from reading and unpacking) is spent creating these two lists (NZS_data and ZS_data). Because I only need them for saving data into a file, I've been thinking about using 2 generators:
1) Create a generator for reading a file -> 1st generator
    self.stream_data = ( struct.unpack(">H", data_file.read(2))[0]
                         for foo in xrange(self.columns*self.rows) )
2) Create the ZS_data generator (I don't really need the NZS data)
    self.ZS_data = ( [column, row, self.stream_data.next()]
                     for row, column in itertools.product(xrange(self.rows),
                                                          xrange(self.columns))
                     if self.stream_data.next() )
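This of course doesn't work properly, because I get two different values from the generator: one call to next() in the condition and another in the output expression.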
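One possible way around the double next() call, as a minimal sketch: pair each coordinate with exactly one value by zipping the two iterators, then filter on that value. The function name iter_zero_suppressed is hypothetical, and the code is written for Python 3 (on Python 2, substitute xrange for range and itertools.izip for zip to keep it lazy).

```python
import itertools


def iter_zero_suppressed(values, rows, columns):
    """Yield [column, row, value] for every nonzero value.

    `values` is any iterable (including a generator) that produces
    rows * columns numbers in row-major order.
    """
    coords = itertools.product(range(rows), range(columns))
    # zip pulls exactly one value per coordinate, so the value tested
    # in the condition is the same one that gets yielded.
    for (row, column), value in zip(coords, values):
        if value:
            yield [column, row, value]


# Example: a 2 x 3 frame supplied as a one-shot iterator
sample = iter([0, 5, 0, 7, 0, 0])
zs_data = list(iter_zero_suppressed(sample, rows=2, columns=3))
```

Because each value is bound to a name before it is tested, the stream is consumed once per pixel, and the source can stay a generator.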
3) Save data into a file using the generator.
I wonder how this could be done. Maybe you have other ideas for optimizing this application?
Addendum
Generator-based solution:
    def create_ZS_data(self):
        self.ZS_data = ( [column, row, self.stream_data[column + row * self.columns]]
                         for row, column in itertools.product(xrange(self.rows), xrange(self.columns))
                         if self.stream_data[column + row * self.columns] )
Profiler info:
    ncalls   tottime  percall  cumtime  percall  filename:lineno(function)
      3257     1.117    0.000   71.598    0.022  decode_from_merlin.py:302(create_ZS_file)
    463419    67.705    0.000   67.705    0.000  decode_from_merlin.py:86(<genexpr>)
Jon's solution:
    def create_ZS_data(self):
        self.ZS_data = list()
        for rowno, cols in enumerate(self.stream_data[i:i+self.columns]
                                     for i in xrange(0, len(self.stream_data), self.columns)):
            for colno, col in enumerate(cols):
                # col == value, (rowno, colno) = index
                if col:
                    self.ZS_data.append([colno, rowno, col])
Profiler info:
    ncalls   tottime  percall  cumtime  percall  filename:lineno(function)
      3257    18.616    0.006   19.919    0.006  decode_from_merlin.py:83(create_ZS_data)
Recommended answer
You could possibly make the unpacking more efficient by reading the whole frame and unpacking it in a single call:
    count = self.rows * self.columns
    self.data_stream = struct.unpack('>{}H'.format(count), data_file.read(2 * count))
Then reduce the looping to something like:
    for rowno, cols in enumerate(self.data_stream[i:i+self.columns]
                                 for i in xrange(0, len(self.data_stream), self.columns)):
        for colno, col in enumerate(cols):
            # col == value, (rowno, colno) = index
            if col == 0:
                pass # do something
            else:
                pass # do something else
Note: untested.
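Putting the pieces together, here is a rough end-to-end sketch of the read, filter, and save pipeline with no intermediate NZS list. It is an illustration under assumptions, not the asker's actual code: the helper names (read_frame, iter_zero_suppressed, write_zero_suppressed) are hypothetical, the "column row value" text output format is invented for the example since the question never states the real file format, and the code targets Python 3.

```python
import io
import struct


def read_frame(data_file, rows, columns):
    """Read one frame of big-endian unsigned 16-bit values in a single call."""
    count = rows * columns
    raw = data_file.read(2 * count)  # 2 bytes per ">H" value
    return struct.unpack(">{}H".format(count), raw)


def iter_zero_suppressed(frame, columns):
    """Lazily yield (column, row, value) for each nonzero value."""
    for index, value in enumerate(frame):
        if value:
            # Row-major layout: flat index = row * columns + column
            row, column = divmod(index, columns)
            yield column, row, value


def write_zero_suppressed(out_file, frame, columns):
    """Write one 'column row value' line per nonzero pixel (assumed format)."""
    for column, row, value in iter_zero_suppressed(frame, columns):
        out_file.write("{} {} {}\n".format(column, row, value))


# Example with an in-memory 2 x 3 frame instead of a real file
packed = struct.pack(">6H", 0, 5, 0, 7, 0, 0)
frame = read_frame(io.BytesIO(packed), rows=2, columns=3)
out = io.StringIO()
write_zero_suppressed(out, frame, columns=3)
```

Only the unpacked tuple is held in memory; the coordinates are derived from the flat index with divmod, so neither the nested NZS list nor a materialized ZS list is needed before writing.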