Test the provided code and see for yourself. At least on my system:

zlib fails to decompress, raising a memory error;
pylzma fails to decompress, running endlessly and consuming 99% of CPU time;
bz2 fails to compress, running endlessly and consuming 99% of CPU time.

The same code works with a 10 MByte string without any problem.

So what? Is there no compression support for large strings in Python? Am I doing something the wrong way here? Is there a limit, and if so, what is the theoretical upper limit of string size that each of the compression libraries can process?

The only limit I know about is the 2 GByte limit for the python.exe process itself, but this does not seem to be the actual problem in this case.

There are also some other strange effects when trying to create large strings using the following code:

m = 'm'*1048576
# str1024MB = 1024*m   # fails with memory error, but:
str512MB_01 = 512*m    # works ok
# str512MB_02 = 512*m  # fails with memory error, but:
str256MB_01 = 256*m    # works ok
str256MB_02 = 256*m    # works ok
# etc.

and so on, down to allocating each single MB in a separate string, which pushes python.exe to the upper limit of memory available to it, reported by the Windows Task Manager as 2,065,352 KByte.

Is the question of why the str1024MB = 1024*m instruction failed, when the memory is apparently there and the target size of 1 GByte can be reached, out of the scope of this discussion thread, or is this the same problem that causes the compression libraries to fail? And why is no memory error raised then?

Any hints towards understanding what is going on and why, and/or towards a workaround, are welcome.
Claudio

============================================================
strSize10MB = '1234567890'*1048576  # 10 MB
strSize500MB = 50*strSize10MB

fObj = file(r'c:\strSize500MB.dat', 'wb')
fObj.write(strSize500MB)
fObj.close()

fObj = file(r'c:\strSize500MBCompressed.zlib', 'wb')
import zlib
strSize500MBCompressed = zlib.compress(strSize500MB)
fObj.write(strSize500MBCompressed)
fObj.close()

fObj = file(r'c:\strSize500MBCompressed.pylzma', 'wb')
import pylzma
strSize500MBCompressed = pylzma.compress(strSize500MB)
fObj.write(strSize500MBCompressed)
fObj.close()

fObj = file(r'c:\strSize500MBCompressed.bz2', 'wb')
import bz2
strSize500MBCompressed = bz2.compress(strSize500MB)
fObj.write(strSize500MBCompressed)
fObj.close()

print
print ' Created files: '
print ' %s \n %s \n %s \n %s' %(r'c:\strSize500MB.dat', r'c:\strSize500MBCompressed.zlib', r'c:\strSize500MBCompressed.pylzma', r'c:\strSize500MBCompressed.bz2')

raw_input(' EXIT with Enter /> ')
============================================================
# HDvsArchiveUnpackingSpeed_TestSpeed.py
import time

startTime = time.clock()
fObj = file(r'c:\strSize500MB.dat', 'rb')
strSize500MB = fObj.read()
fObj.close()
print ' loading uncompressed data from file: %7.3f seconds'%(time.clock()-startTime,)

startTime = time.clock()
fObj = file(r'c:\strSize500MBCompressed.zlib', 'rb')
strSize500MBCompressed = fObj.read()
fObj.close()
print 'loading compressed data from file: %7.3f seconds'%(time.clock()-startTime,)
import zlib
try:
    startTime = time.clock()
    strSize500MB = zlib.decompress(strSize500MBCompressed)
    print 'decompressing zlib data: %7.3f seconds'%(time.clock()-startTime,)
except:
    print 'decompressing zlib data FAILED'

startTime = time.clock()
fObj = file(r'c:\strSize500MBCompressed.pylzma', 'rb')
strSize500MBCompressed = fObj.read()
fObj.close()
print 'loading compressed data from file: %7.3f seconds'%(time.clock()-startTime,)
import pylzma
try:
    startTime = time.clock()
    strSize500MB = pylzma.decompress(strSize500MBCompressed)
    print 'decompressing pylzma data: %7.3f seconds'%(time.clock()-startTime,)
except:
    print 'decompressing pylzma data FAILED'

startTime = time.clock()
fObj = file(r'c:\strSize500MBCompressed.bz2', 'rb')
strSize500MBCompressed = fObj.read()
fObj.close()
print 'loading compressed data from file: %7.3f seconds'%(time.clock()-startTime,)
import bz2
try:
    startTime = time.clock()
    strSize500MB = bz2.decompress(strSize500MBCompressed)
    print 'decompressing bz2 data: %7.3f seconds'%(time.clock()-startTime,)
except:
    print 'decompressing bz2 data FAILED'

raw_input(' EXIT with Enter /> ')

Claudio Grondi wrote:

What started as a simple test of whether it is better to load uncompressed data directly from the hard disk or to load compressed data and decompress it (Windows XP SP 2, Pentium4 3.0 GHz system with 3 GByte RAM) seems to show that none of the compression libraries available in Python really works for large (i.e. 500 MByte) strings.
[...]
So what? Is there no compression support for large strings in Python?

You're probably measuring Windows' memory management rather than the compression libraries themselves (Python delegates all memory allocations larger than 256 bytes to the system).

I suggest using incremental (streaming) processing instead; from what I can tell, all three libraries support that.
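A minimal sketch of that incremental approach for zlib, assuming the data lives in files (or any file-like objects), so that neither the whole input nor the whole output has to exist as one 500 MB string. It is written for a current Python 3 rather than the Python 2.4 used in this thread, and the function names and the 1 MiB chunk size are illustrative choices, not part of any library. `zlib.compressobj()` and `zlib.decompressobj()` are the standard incremental interfaces; passing `max_length` to `decompress()` caps each output piece, and any input not yet processed is kept in `unconsumed_tail`.

```python
import io
import zlib

# Illustrative chunk size (1 MiB); not a value prescribed by zlib.
CHUNK_SIZE = 1 << 20


def compress_stream(src, dst, level=6, chunk_size=CHUNK_SIZE):
    """Compress file-like src into file-like dst, one chunk at a time."""
    comp = zlib.compressobj(level)
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(comp.compress(chunk))
    # Emit whatever the compressor still buffers and finish the stream.
    dst.write(comp.flush())


def decompress_stream(src, dst, chunk_size=CHUNK_SIZE):
    """Decompress file-like src into file-like dst, one chunk at a time."""
    decomp = zlib.decompressobj()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        data = chunk
        while data:
            # max_length caps the output of a single call, so highly
            # compressible input cannot expand into one huge string;
            # input not processed yet is left in unconsumed_tail.
            dst.write(decomp.decompress(data, chunk_size))
            data = decomp.unconsumed_tail
    # Return any output still pending inside the decompressor.
    dst.write(decomp.flush())


def roundtrip(data, chunk_size=CHUNK_SIZE):
    """Compress and decompress bytes in memory; a quick self-check."""
    packed = io.BytesIO()
    compress_stream(io.BytesIO(data), packed, chunk_size=chunk_size)
    packed.seek(0)
    unpacked = io.BytesIO()
    decompress_stream(packed, unpacked, chunk_size=chunk_size)
    return packed.getvalue(), unpacked.getvalue()
```

For real files the calls would look like `with open('big.dat', 'rb') as src, open('big.zlib', 'wb') as dst: compress_stream(src, dst)`, with hypothetical file names. The output is an ordinary zlib stream, so `zlib.decompress()` can still read it when it fits in memory. The `bz2` module offers `BZ2Compressor` and `BZ2Decompressor` for a similar chunked pattern.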

</F>

On this system (Linux 2.6.x, AMD64, 2 GB RAM, Python 2.4) I am able to construct a 1 GB string by repetition, as well as compress a 500 MB string with zlib in one gulp:

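s = '1234567890'*(1048576*50)   # 500 MB string built by repetition
import zlib
c = zlib.compress(s)
print len(c)
# Note: zlib.compress() produces a zlib stream, not the gzip file
# format, so gunzip will not read this file despite the .gz name.
open("/tmp/claudio.gz", "wb").write(c)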
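For comparison, a sketch that produces a real gzip file (readable by `gunzip`, unlike raw `zlib.compress()` output) without ever building the 500 MB string: it writes the same 10 MB piece 50 times through the standard `gzip` module, which compresses incrementally, and then reads the file back in chunks. It assumes Python 3, and `write_gzip_in_pieces` and `count_gzip_bytes` are hypothetical helper names used only for this illustration.

```python
import gzip
import os

# Same data as in the thread: a 10 MB piece, repeated 50 times = 500 MB.
PIECE = b'1234567890' * 1048576
N_PIECES = 50


def write_gzip_in_pieces(path, piece=PIECE, n_pieces=N_PIECES):
    """Hypothetical helper: write piece n_pieces times into a gzip file.

    The full output is never held as one string; the gzip module
    compresses each piece as it is written. Returns the file size.
    """
    with gzip.open(path, 'wb') as f:
        for _ in range(n_pieces):
            f.write(piece)
    return os.path.getsize(path)


def count_gzip_bytes(path, chunk_size=1 << 20):
    """Hypothetical helper: read a gzip file in chunks and return its
    uncompressed length without holding the whole content in memory."""
    total = 0
    with gzip.open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total
```

With the defaults, `write_gzip_in_pieces('/tmp/claudio_pieces.gz')` (hypothetical path) writes the full 500 MB of repeated data while keeping only one 10 MB piece in memory at a time.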
