What started as a simple test of whether it is better to load uncompressed data directly from the hard disk, or to load compressed data and decompress it (Windows XP SP2, Pentium 4 3.0 GHz system with 3 GByte RAM), seems to show that none of the compression libraries available in Python really works for large (i.e. 500 MByte) strings.

Test the provided code and see for yourself. At least on my system:

zlib fails to decompress, raising a memory error
pylzma fails to decompress, running endlessly and consuming 99% of CPU time
bz2 fails to compress, running endlessly and consuming 99% of CPU time

The same works with a 10 MByte string without any problem.

So what? Is there no compression support for large sized strings in Python? Am I doing something the wrong way here? Is there a theoretical upper limit on the string size each of the compression libraries can process, and if yes, what is it?

The only limit I know about is 2 GByte for the python.exe process itself, but this does not seem to be the actual problem in this case. There are also some other strange effects when trying to create large strings using the following code:

m = 'm'*1048576
# str1024MB = 1024*m   # fails with memory error, but:
str512MB_01 = 512*m    # works ok
# str512MB_02 = 512*m  # fails with memory error, but:
str256MB_01 = 256*m    # works ok
str256MB_02 = 256*m    # works ok
# etc. etc., and so on

down to allocating each single MByte in a separate string, to push python.exe to the experienced upper limit of memory available to it, reported by the Windows task manager as 2,065,352 KByte. Is the question why the str1024MB = 1024*m instruction fails, even though the memory is apparently there and the target size of 1 GByte can be reached, out of the scope of this discussion thread, or is it the same problem that causes the compression libraries to fail? And why is no memory error raised then?

Any hints towards understanding what is going on and why, and/or towards a workaround, are welcome.
Claudio

============================================================
# HDvsArchiveUnpackingSpeed_WriteFiles.py

strSize10MB = '1234567890'*1048576  # 10 MB
strSize500MB = 50*strSize10MB

fObj = file(r'c:\strSize500MB.dat', 'wb')
fObj.write(strSize500MB)
fObj.close()

fObj = file(r'c:\strSize500MBCompressed.zlib', 'wb')
import zlib
strSize500MBCompressed = zlib.compress(strSize500MB)
fObj.write(strSize500MBCompressed)
fObj.close()

fObj = file(r'c:\strSize500MBCompressed.pylzma', 'wb')
import pylzma
strSize500MBCompressed = pylzma.compress(strSize500MB)
fObj.write(strSize500MBCompressed)
fObj.close()

fObj = file(r'c:\strSize500MBCompressed.bz2', 'wb')
import bz2
strSize500MBCompressed = bz2.compress(strSize500MB)
fObj.write(strSize500MBCompressed)
fObj.close()

print
print ' Created files: '
print '   %s \n   %s \n   %s \n   %s' % (
    r'c:\strSize500MB.dat',
    r'c:\strSize500MBCompressed.zlib',
    r'c:\strSize500MBCompressed.pylzma',
    r'c:\strSize500MBCompressed.bz2')

raw_input(' EXIT with Enter /> ')

============================================================
# HDvsArchiveUnpackingSpeed_TestSpeed.py
import time

startTime = time.clock()
fObj = file(r'c:\strSize500MB.dat', 'rb')
strSize500MB = fObj.read()
fObj.close()
print
print ' loading uncompressed data from file: %7.3f seconds' % (time.clock()-startTime,)

startTime = time.clock()
fObj = file(r'c:\strSize500MBCompressed.zlib', 'rb')
strSize500MBCompressed = fObj.read()
fObj.close()
print
print ' loading compressed data from file:   %7.3f seconds' % (time.clock()-startTime,)
import zlib
try:
    startTime = time.clock()
    strSize500MB = zlib.decompress(strSize500MBCompressed)
    print ' decompressing zlib data:             %7.3f seconds' % (time.clock()-startTime,)
except:
    print ' decompressing zlib data FAILED'

startTime = time.clock()
fObj = file(r'c:\strSize500MBCompressed.pylzma', 'rb')
strSize500MBCompressed = fObj.read()
fObj.close()
print
print ' loading compressed data from file:   %7.3f seconds' % (time.clock()-startTime,)
import pylzma
try:
    startTime = time.clock()
    strSize500MB = pylzma.decompress(strSize500MBCompressed)
    print ' decompressing pylzma data:           %7.3f seconds' % (time.clock()-startTime,)
except:
    print ' decompressing pylzma data FAILED'

startTime = time.clock()
fObj = file(r'c:\strSize500MBCompressed.bz2', 'rb')
strSize500MBCompressed = fObj.read()
fObj.close()
print
print ' loading compressed data from file:   %7.3f seconds' % (time.clock()-startTime,)
import bz2
try:
    startTime = time.clock()
    strSize500MB = bz2.decompress(strSize500MBCompressed)
    print ' decompressing bz2 data:              %7.3f seconds' % (time.clock()-startTime,)
except:
    print ' decompressing bz2 data FAILED'

raw_input(' EXIT with Enter /> ')

============================================================

Recommended answer

Claudio Grondi wrote:
> [...] seems to show that none of the compression libraries available in
> Python really works for large sized (i.e. 500 MByte) strings. [...]
> So what? Is there no compression support for large sized strings in Python?

You're probably measuring Windows' memory management rather than the compression libraries themselves (Python delegates all memory allocations >256 bytes to the system).

I suggest using incremental (streaming) processing instead; from what I can tell, all three libraries support that.

</F>
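To make the "incremental" suggestion concrete, here is a minimal sketch for the zlib case, assuming the input is the file written by HDvsArchiveUnpackingSpeed_WriteFiles.py above; the 16 MB chunk size and the name of the output file are arbitrary choices for illustration, not anything required by the library:

import zlib

CHUNK = 16 * 1048576  # read and decompress 16 MB at a time

decompressor = zlib.decompressobj()
fIn = file(r'c:\strSize500MBCompressed.zlib', 'rb')
fOut = file(r'c:\strSize500MB_restored.dat', 'wb')  # placeholder output name
while True:
    block = fIn.read(CHUNK)
    if not block:
        break
    fOut.write(decompressor.decompress(block))
fOut.write(decompressor.flush())  # write out anything still buffered
fIn.close()
fOut.close()

This way neither the whole compressed string nor the whole decompressed string has to exist in memory at once. bz2 offers a similar interface via bz2.BZ2Decompressor, and according to the remark above pylzma should allow the same kind of processing as well.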
On this system (Linux 2.6.x, AMD64, 2 GB RAM, Python 2.4) I am able to construct a 1 GB string by repetition, as well as compress a 512 MB string with gzip in one gulp.

cat claudio.py
s = '1234567890'*(1048576*50)

import zlib
c = zlib.compress(s)
print len(c)
open("/tmp/claudio.gz", "wb").write(c)
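For the compression side, where bz2.compress() hangs in the original test, the compressed data can likewise be produced piece by piece and written straight to disk on a memory-constrained 32-bit system, so the complete compressed string never has to be held in memory. A rough sketch using bz2.BZ2File from the standard library; the output file name is only a placeholder:

import bz2

strSize10MB = '1234567890' * 1048576  # the same 10 MB building block as above

fOut = bz2.BZ2File(r'c:\strSize500MB_streamed.bz2', 'w')
for i in range(50):
    fOut.write(strSize10MB)  # each 10 MB slice is compressed as it is written
fOut.close()

gzip.GzipFile can be used in the same way to produce gzip-compressed output.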