Problem description
I am profiling string methods in Python so that I can use the fastest one. I have this code to test string concatenation with files, StringIO, cStringIO and a normal string.
#!/usr/bin/env python
#title           : pythonTiming.py
#description     : Will be used to test timing function in python
#author          : myusuf
#date            : 19-11-2014
#version         : 0
#usage           : python pythonTiming.py
#notes           :
#python_version  : 2.6.6
#==============================================================================

import time
import cStringIO
import StringIO


class Timer(object):
    def __enter__(self):
        self.start = time.time()
        return self

    def __exit__(self, *args):
        self.end = time.time()
        self.interval = self.end - self.start


testbuf = """ Hello This is a General String that will be repreated This string will be written to a file , StringIO and a sregualr strin then see the best to handle string according to time """ * 1000

MyFile = open("./testfile.txt" ,"wb+")
MyStr = ''
MyStrIo = StringIO.StringIO()
MycStrIo = cStringIO.StringIO()


def strWithFiles():
    global MyFile
    print "writing string to file "
    for index in range(1000):
        MyFile.write(testbuf)
    pass


def strWithStringIO():
    global MyStrIo
    print "writing string to StrinIO "
    for index in range(1000):
        MyStrIo.write(testbuf)


def strWithStr():
    global MyStr
    print "Writing String to STR "
    for index in range(500):
        MyStr = MyStr + testbuf


def strWithCstr():
    global MycStrIo
    print "writing String to Cstring"
    for index in range(1000):
        MycStrIo.write(testbuf)


with Timer() as t:
    strWithFiles()
print('##Request took %.03f sec.' % t.interval)

with Timer() as t:
    strWithStringIO()
print('###Request took %.03f sec.' % t.interval)

with Timer() as t:
    strWithCstr()
print('####Request took %.03f sec.' % t.interval)

with Timer() as t:
    read1 = 'x' + MyFile.read(-1)
print('file read ##Request took %.03f sec.' % t.interval)

with Timer() as t:
    read2 = 'x' + MyStrIo.read(-1)
print('stringIo read ###Request took %.03f sec.' % t.interval)

with Timer() as t:
    read3 = 'x' + MycStrIo.read(-1)
print('CString read ####Request took %.03f sec.' % t.interval)

MyFile.close()
The Python documentation says that cStringIO is faster than StringIO, but my results show that StringIO performs better for concatenation. Why?
On the other hand, reading from cStringIO is faster than reading from StringIO (it behaves similarly to a file). As far as I have read, file and cStringIO are both implemented in C, so why is string concatenation slow with them?
Is there any faster way to deal with strings than these methods?
The reason StringIO performs better is that, behind the scenes, it just keeps a list of all the strings that have been written to it and only combines them when necessary. So a write operation is as simple as appending an object to a list. The cStringIO module does not have this luxury: it must copy the data of each string into its buffer, resizing the buffer as and when necessary (which causes a lot of redundant copying when writing large amounts of data).
Since you are writing lots of large strings, StringIO has less work to do than cStringIO. When you read from a StringIO object you have written to, it can optimise the amount of copying needed by computing the sum of the lengths of the strings written to it and preallocating a buffer of that size.
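The difference between the two write strategies can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the actual source of either module: the class names ListBackedWriter and CopyingWriter are hypothetical, and a bytearray stands in for the C-level buffer that cStringIO manages.

```python
class ListBackedWriter(object):
    """Hypothetical stand-in for a writer that defers all copying."""

    def __init__(self):
        self._chunks = []

    def write(self, s):
        # Only a reference to the string is stored; no character data is copied.
        self._chunks.append(s)

    def getvalue(self):
        # All chunks are known at this point, so the data is copied once.
        return "".join(self._chunks)


class CopyingWriter(object):
    """Hypothetical stand-in for a writer that copies into one growing buffer."""

    def __init__(self):
        self._buf = bytearray()

    def write(self, s):
        # Every write copies the data into the buffer, which may need to be
        # reallocated (and its contents moved) as it grows.
        self._buf.extend(s.encode("ascii"))

    def getvalue(self):
        return self._buf.decode("ascii")


chunk = "some text to write\n" * 100  # placeholder data for the illustration
list_writer = ListBackedWriter()
copy_writer = CopyingWriter()
for _ in range(1000):
    list_writer.write(chunk)
    copy_writer.write(chunk)

# Both strategies produce the same result; they differ only in when data is copied.
same_result = list_writer.getvalue() == copy_writer.getvalue() == chunk * 1000
```

Both writers end up with identical contents. The only difference is whether the copying happens on every write or once when the value is requested.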
However, StringIO is not the fastest way of joining a series of strings. This is because it provides additional functionality (seeking to different parts of the buffer and writing data there). If you do not need this functionality and all you want to do is join a list of strings together, then str.join is the fastest way to do it.
joined_string = "".join(testbuf for index in range(1000))

# or building the list of strings to join separately
strings = []
for i in range(1000):
    strings.append(testbuf)
joined_string = "".join(strings)
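To check this on your own machine, a small timeit harness along the following lines may help. It is only a sketch: the function names and the placeholder testbuf are assumptions made for the illustration, and the timings will depend on the interpreter version and the size of the data.

```python
import timeit

# Placeholder data; smaller than the testbuf in the question to keep runs short.
testbuf = "some text to join\n" * 100


def join_generator():
    # Join directly from a generator expression.
    return "".join(testbuf for _ in range(1000))


def join_list():
    # Build the list first, then join it in one call.
    strings = []
    for _ in range(1000):
        strings.append(testbuf)
    return "".join(strings)


def repeated_concat():
    # Plain string concatenation in a loop, for comparison.
    result = ""
    for _ in range(1000):
        result = result + testbuf
    return result


for func in (join_generator, join_list, repeated_concat):
    seconds = timeit.timeit(func, number=5)
    print("%-16s %.3f sec for 5 runs" % (func.__name__, seconds))
```

timeit calls each function several times, which smooths out noise better than a single time.time() measurement.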