Problem description
One of the posters inspired me to profile my newbie script
(pasted below). After taking measurements I found that the speed
of Python, at least in the area where my script works, is surprisingly
high.
This is the experiment: the script recreates the folder hierarchy
somewhere else and stores there compressed versions of the files
from the source hierarchy (it makes additional backups of the
file server's disk at the company where I work onto other disks,
compressing them to save space). The data was:
468 MB, 15057 files, 1568 folders
(machine: win2k, python v2.3.3)
WinRAR v3.20 (set to ZIP format and normal compression) needed
119 seconds to compress all of it.
The Python script time (running under profiler) was, drumroll...
198 seconds.
Note that the Python script had to laboriously recreate the tree of
1568 folders and create over 15 thousand compressed files, so
it actually had more work to do than WinRAR did. The size of the
compressed data was basically the same, about 207 MB.
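For a rough sense of scale, the figures quoted above work out as follows. This is a back-of-the-envelope calculation using only the numbers in this post, written for Python 3 (true division). Keep in mind that the 198 s figure was measured under the profiler, which adds its own overhead, so the unprofiled gap may be smaller.

```python
# Figures quoted in the post (MB and seconds).
source_mb = 468
compressed_mb = 207
winrar_s = 119
python_s = 198  # measured under the profiler

slowdown = python_s / winrar_s          # how much longer the script took
ratio = compressed_mb / source_mb       # compressed size as a fraction of the original
python_mb_per_s = source_mb / python_s  # input throughput of the script
winrar_mb_per_s = source_mb / winrar_s  # input throughput of WinRAR

print("slowdown: %.2fx" % slowdown)
print("compressed to %.0f%% of original" % (ratio * 100))
print("throughput: %.2f MB/s (Python) vs %.2f MB/s (WinRAR)"
      % (python_mb_per_s, winrar_mb_per_s))
```

So the profiled script took roughly two thirds longer than WinRAR, and both shrank the data to a bit under half its original size.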
I find it very encouraging that, in a real-world application,
a newbie script written in a very high-level language can perform
not that far from a "shrinkwrap" professional archiver (WinRAR is an
excellent archiver in terms of both compression and speed). I do
realize that this is mainly the result of all of Python's "underlying
infrastructure". Great work, guys. Congrats.
The only thing missing from this picture is knowing whether my script
could be further optimised (not that I actually need better
performance; I'm just curious what the possible solutions could be).
Any takers among the experienced guys?
Profiling results:
Fri Dec 31 01:04:14 2004    p3.tmp

         580543 function calls (568607 primitive calls) in 198.124 CPU seconds

   Ordered by: cumulative time
   List reduced from 69 to 40 due to restriction <40>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.013    0.013  198.124  198.124 profile:0(z3())
        1    0.000    0.000  198.110  198.110 <string>:1(?)
        1    0.000    0.000  198.110  198.110 <interactive input>:1(z3)
        1    1.513    1.513  198.110  198.110 zmtree3.py:26(zmtree)
    15057   14.504    0.001  186.961    0.012 zmtree3.py:7(zf)
    15057  147.582    0.010  148.778    0.010 C:\Python23\lib\zipfile.py:388(write)
    15057   12.156    0.001   12.156    0.001 C:\Python23\lib\zipfile.py:182(__init__)
    32002    7.957    0.000    8.542    0.000 C:\PYTHON23\Lib\ntpath.py:266(isdir)
13826/1890    2.550    0.000    8.143    0.004 C:\Python23\lib\os.py:206(walk)
    30114    3.164    0.000    3.164    0.000 C:\Python23\lib\zipfile.py:483(close)
    60228    1.753    0.000    2.149    0.000 C:\PYTHON23\Lib\ntpath.py:157(split)
    45171    0.538    0.000    2.116    0.000 C:\PYTHON23\Lib\ntpath.py:197(basename)
    15057    1.285    0.000    1.917    0.000 C:\PYTHON23\Lib\ntpath.py:467(abspath)
    33890    0.688    0.000    1.419    0.000 C:\PYTHON23\Lib\ntpath.py:58(join)
   109175    0.783    0.000    0.783    0.000 C:\PYTHON23\Lib\ntpath.py:115(splitdrive)
    15057    0.196    0.000    0.768    0.000 C:\PYTHON23\Lib\ntpath.py:204(dirname)
    33890    0.433    0.000    0.731    0.000 C:\PYTHON23\Lib\ntpath.py:50(isabs)
    15057    0.544    0.000    0.632    0.000 C:\PYTHON23\Lib\ntpath.py:438(normpath)
    32002    0.431    0.000    0.585    0.000 C:\PYTHON23\Lib\stat.py:45(S_ISDIR)
    15057    0.555    0.000    0.555    0.000 C:\Python23\lib\zipfile.py:149(FileHeader)
    15057    0.483    0.000    0.483    0.000 C:\Python23\lib\zipfile.py:116(__init__)
      151    0.002    0.000    0.435    0.003 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\framework\winout.py:171(write)
      151    0.002    0.000    0.432    0.003 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\framework\winout.py:489(write)
      151    0.013    0.000    0.430    0.003 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\framework\winout.py:461(HandleOutput)
       76    0.087    0.001    0.405    0.005 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\framework\winout.py:430(QueueFlush)
    15057    0.239    0.000    0.340    0.000 C:\Python23\lib\zipfile.py:479(__del__)
    15057    0.157    0.000    0.157    0.000 C:\Python23\lib\zipfile.py:371(_writecheck)
    32002    0.154    0.000    0.154    0.000 C:\PYTHON23\Lib\stat.py:29(S_IFMT)
       76    0.007    0.000    0.146    0.002 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\framework\winout.py:262(dowrite)
       76    0.007    0.000    0.137    0.002 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\scintilla\formatter.py:221(OnStyleNeeded)
       76    0.011    0.000    0.118    0.002 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\framework\interact.py:197(Colorize)
       76    0.110    0.001    0.112    0.001 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\scintilla\control.py:69(SCIInsertText)
       76    0.079    0.001    0.081    0.001 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\scintilla\control.py:333(GetTextRange)
       76    0.018    0.000    0.020    0.000 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\scintilla\control.py:296(SetSel)
       76    0.006    0.000    0.018    0.000 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\scintilla\document.py:149(__call__)
      227    0.003    0.000    0.012    0.000 C:\Python23\lib\Queue.py:172(get_nowait)
       76    0.007    0.000    0.011    0.000 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\framework\interact.py:114(ColorizeInteractiveCode)
      532    0.011    0.000    0.011    0.000 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\scintilla\control.py:330(GetTextLength)
       76    0.001    0.000    0.010    0.000 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\scintilla\view.py:256(OnBraceMatch)
     1888    0.009    0.000    0.009    0.000 C:\PYTHON23\Lib\ntpath.py:245(islink)
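The listing above has the shape of the standard library's profiler report sorted by cumulative time and cut down to 40 entries; in it, zipfile's write accounts for about 149 of the 198 seconds, so the compression itself dominates. The post does not show the exact commands used, so the following is only a sketch of producing a similar report in a current Python 3. It uses cProfile (added in Python 2.5, so not available to the 2.3 setup above, and lower-overhead than the pure-Python profile module) together with pstats. The names profile_report and busy are hypothetical, for illustration only.

```python
import cProfile
import io
import pstats


def profile_report(func, *args, limit=40):
    """Run func(*args) under cProfile and return the text report,
    sorted by cumulative time and restricted to `limit` rows."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func(*args)
    finally:
        profiler.disable()
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(limit)
    return buf.getvalue()


def busy(n):
    # Hypothetical stand-in for zmtree(): some work worth profiling.
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    print(profile_report(busy, 100000))
```

Sorting by "tottime" instead of "cumulative" ranks functions by time spent in their own code; in the listing above that would put zipfile's write at the top.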
---
Script:
#!/usr/bin/python
import os
import sys
from zipfile import ZipFile, ZIP_DEFLATED

def zf(sfpath, targetdir):
    # On Windows, drop the drive letter ("C:") so the source path can be
    # appended to the target directory.
    if (sys.platform[:3] == 'win'):
        tgfpath=sfpath[2:]
    else:
        tgfpath=sfpath
    zfdir=os.path.dirname(os.path.abspath(targetdir) + tgfpath)
    zfpath=zfdir + os.path.sep + os.path.basename(tgfpath) + '.zip'
    if(not os.path.isdir(zfdir)):
        os.makedirs(zfdir)
    archive=ZipFile(zfpath, 'w', ZIP_DEFLATED)
    # Store the member under the original file name,
    # e.g. "report.txt" inside "report.txt.zip".
    zfname=os.path.basename(tgfpath)
    archive.write(sfpath, zfname, ZIP_DEFLATED)
    archive.close()
    ssize=os.stat(sfpath).st_size
    zsize=os.stat(zfpath).st_size
    return (ssize,zsize)

def zmtree(sdir,tdir):
    n=0
    ssize=0
    zsize=0
    sys.stdout.write('\n ')
    for root, dirs, files in os.walk(sdir):
        for file in files:
            res=zf(os.path.join(root,file),tdir)
            ssize+=res[0]
            zsize+=res[1]
            n=n+1
            #sys.stdout.write('.')
            # Progress report every 200 files.
            if (n % 200 == 0):
                print(" %.2fM (%.2fM)" % (ssize/1048576.0, zsize/1048576.0))
                #sys.stdout.write(' ')
    return (n, ssize, zsize)

if __name__=="__main__":
    if len(sys.argv) == 3:
        if(os.path.isdir(sys.argv[1]) and os.path.isdir(sys.argv[2])):
            (n,ssize,zsize)=zmtree(os.path.abspath(sys.argv[1]),os.path.abspath(sys.argv[2]))
            print("\n\n Summary:\n Number of files compressed: %d\n"
                  " Total size of original files: %.2fM\n"
                  " Total size of compressed files: %.2fM"
                  % (n, ssize/1048576.0, zsize/1048576.0))
            sys.exit(0)
        else:
            print("Incorrect arguments.")
            if (not os.path.isdir(sys.argv[1])): print(sys.argv[1] + " is not a directory.")
            if (not os.path.isdir(sys.argv[2])): print(sys.argv[2] + " is not a directory.")
    # Reached on a wrong argument count or invalid directories.
    print("\n Usage:\n " + sys.argv[0] + " source-directory target-directory")
--
It's a man's life in a Python Programming Association.
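One possible tightening of the script above, sketched for a current Python 3 rather than the 2.3 it was written for. Since the profile is dominated by zipfile's write (the compression itself), don't expect large gains; what is left to trim is mostly per-file path bookkeeping, such as the roughly 8.5 s spent in isdir and the repeated abspath, dirname and basename calls. This sketch computes each target directory once per directory visited by os.walk instead of once per file, and creates it with os.makedirs(..., exist_ok=True). The name zip_tree and its return layout are illustrative, not from the post. As a deliberate simplification, it mirrors the tree relative to the source directory rather than re-creating the full absolute path under the target, as the original does.

```python
import os
import zipfile


def zip_tree(src, dst):
    """Compress every file under src into its own .zip under dst,
    mirroring the directory layout relative to src (illustrative sketch).
    Returns (file_count, original_bytes, compressed_bytes)."""
    src = os.path.abspath(src)
    dst = os.path.abspath(dst)
    count = orig_total = zip_total = 0
    for root, dirs, files in os.walk(src):
        # Don't descend into the target if it happens to live inside src.
        dirs[:] = [d for d in dirs if os.path.join(root, d) != dst]
        # Path work done once per directory, not once per file.
        target_dir = os.path.normpath(os.path.join(dst, os.path.relpath(root, src)))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            source_path = os.path.join(root, name)
            zip_path = os.path.join(target_dir, name + ".zip")
            # One archive per file; the member keeps the original file name.
            with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
                archive.write(source_path, name)
            count += 1
            orig_total += os.path.getsize(source_path)
            zip_total += os.path.getsize(zip_path)
    return count, orig_total, zip_total
```

It could be called as zip_tree(sys.argv[1], sys.argv[2]) from the same command-line wrapper as the original.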
Recommended answer
I did not study your script, but odds are it is strongly disk-bound.
This means that the disk access time is so large that it swamps
almost everything else.
I would point out a couple of other ideas, though you may be aware of
them: compressing all the files separately, if they are small, may greatly
worsen the overall compression, since similarities between the files cannot
be exploited. You may not care. Also, the "zip" format can be updated on a
file-by-file basis; it may do all by itself what you are trying to do,
with just a single command line. Just a thought.
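The first point is easy to check with zlib, which implements the same Deflate compression that ZIP_DEFLATED uses. This sketch compresses a set of small, similar, hypothetical "files" once individually and once as a single stream. The exact sizes depend on the data, but with input like this the single stream should come out much smaller, because Deflate can refer back to text it has already seen in earlier files. (zlib.compress also adds a few bytes of header and checksum per call, loosely mirroring the per-archive overhead of one .zip per file.)

```python
import zlib


def make_files(count):
    # Hypothetical small text files that share most of their content,
    # like reports generated from one template.
    return [
        ("Report %d\nStatus: OK\nOwner: backup service\n"
         "This file was generated from the standard template.\n" % i).encode("ascii")
        for i in range(count)
    ]


def separate_size(files):
    """Total bytes when each file is compressed on its own."""
    return sum(len(zlib.compress(data)) for data in files)


def combined_size(files):
    """Bytes when all files are compressed as one stream."""
    return len(zlib.compress(b"".join(files)))


if __name__ == "__main__":
    files = make_files(200)
    print("raw:      ", sum(len(f) for f in files))
    print("separate: ", separate_size(files))
    print("combined: ", combined_size(files))
```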
True; however, it's my understanding that compressing individual files
also means that in the case of damage to the archive it is possible to
recover the files after the damaged file. This cannot be guaranteed when
the archive is compressed as a single stream.
--
Craig Ringer
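A small in-memory experiment illustrates this for the ZIP format, where each member is compressed independently. Damaging the compressed bytes of one member makes that member unreadable, while the others can still be extracted. The member names and contents are hypothetical. The sketch locates a member's data by reading its 30-byte local file header, in which the file-name length and extra-field length sit at offsets 26 and 28 according to the ZIP specification. It only damages member data, not the central directory at the end of the archive, which zipfile relies on to find the members.

```python
import io
import struct
import zipfile
import zlib


def build_archive(members):
    """Create a ZIP in memory with each member deflated separately."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in members:
            zf.writestr(name, data)
    return bytearray(buf.getvalue())


def damage_member(blob, name):
    """Flip one byte in the middle of `name`'s compressed data (in place)."""
    with zipfile.ZipFile(io.BytesIO(bytes(blob))) as zf:
        info = zf.getinfo(name)
    off = info.header_offset
    # Local header: 30 fixed bytes, then the file name, then the extra field.
    name_len, extra_len = struct.unpack("<HH", blob[off + 26:off + 30])
    data_start = off + 30 + name_len + extra_len
    blob[data_start + info.compress_size // 2] ^= 0xFF


def readable_members(blob):
    """Return names of members that still decompress with a valid CRC."""
    ok = []
    with zipfile.ZipFile(io.BytesIO(bytes(blob))) as zf:
        for name in zf.namelist():
            try:
                zf.read(name)
                ok.append(name)
            except (zipfile.BadZipFile, zlib.error):
                pass  # this member is lost; carry on with the rest
    return ok
```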
With gzip, you can forget the entire rest of the stream; with bzip2,
there is a good chance that nothing more than one block (100-900k) is lost.
regards,
Reinhold
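For gzip, the point can be seen with Python's zlib module. A gzip file holds a single Deflate stream, so once the decoder reaches the damaged spot, nothing after it can be trusted. This sketch (the function name salvage_gzip is hypothetical) decodes a stream in chunks and keeps whatever was produced before the decoder gave up. Depending on where the damage lands, zlib may raise an error right away or only at the final CRC check, and output decoded after the damage point may already be garbage; only the part before it is reliable. It does not cover bzip2's block-level recovery, which is the job of the separate bzip2recover tool.

```python
import zlib


def salvage_gzip(blob, chunk_size=256):
    """Decode a gzip stream chunk by chunk.
    Returns (decoded_bytes, error_or_None); on error, decoded_bytes is
    everything produced before the decoder stopped."""
    d = zlib.decompressobj(16 + zlib.MAX_WBITS)  # expect gzip header and trailer
    out = []
    try:
        for i in range(0, len(blob), chunk_size):
            out.append(d.decompress(blob[i:i + chunk_size]))
        out.append(d.flush())
    except zlib.error as exc:
        return b"".join(out), exc
    return b"".join(out), None
```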