Question
Since Python's strings can't be changed, I was wondering how to concatenate strings more efficiently.
I could write it like this:
s += stringfromelsewhere
Or like this:
s = []
s.append(somestring)
# later
s = ''.join(s)
While writing this question, I found a good article talking about the topic.
http://www.skymind.com/~ocrow/python_string/
But it's about Python 2.x, so the question is: did something change in Python 3?
Recommended answer
The best way of appending a string to a string variable is to use + or +=, because it's readable and fast. The two are equally fast; which one you choose is a matter of taste, though the latter is the more common. Here are timings with the timeit module:
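The answer does not show its benchmark code, so here is a minimal sketch of how such a comparison could be taken with the standard timeit module. The starting values (an empty string a and a one-character b) and the repeat count are assumptions for illustration, not the author's actual setup, and time_plus_forms is a hypothetical helper name.

```python
import timeit

# Assumption: illustrative setup only; the answer does not show its benchmark.
# timeit runs the setup once, then repeats the statement `number` times,
# so `a` keeps growing across iterations.
SETUP = "a = ''; b = 'x'"


def time_plus_forms(number):
    """Return the seconds taken by `number` runs of each concatenation form."""
    return {
        "a = a + b": timeit.timeit("a = a + b", setup=SETUP, number=number),
        "a += b": timeit.timeit("a += b", setup=SETUP, number=number),
    }


if __name__ == "__main__":
    for label, seconds in time_plus_forms(1_000_000).items():
        print(f"{label}: {seconds:.4f} s")
```

Absolute numbers depend on the machine and interpreter version; only the relative comparison is meaningful.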
a = a + b: 0.11338996887207031
a += b: 0.11040496826171875
However, those who recommend appending to a list and then joining it do so because appending a string to a list is presumably very fast compared to extending a string. And in some cases this can be true. Here, for example, are one million appends of a one-character string, first to a string, then to a list:
a += b: 0.10780501365661621
a.append(b): 0.1123361587524414
OK, it turns out that even when the resulting string is a million characters long, appending to the string was still faster.
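A sketch of how the string-versus-list comparison might be reproduced with timeit. As in the answer's measurement, the list timing covers only the appends, not a final ''.join(). The piece value 'x' and the helper name time_appends are assumptions for illustration.

```python
import timeit


def time_appends(piece, count):
    """Time `count` appends of `piece` to a str and to a list.

    The list timing does NOT include a final ''.join(); it measures
    only the appends themselves.
    """
    setup_str = f"a = ''; b = {piece!r}"
    setup_list = f"a = []; b = {piece!r}"
    return {
        "a += b": timeit.timeit("a += b", setup=setup_str, number=count),
        "a.append(b)": timeit.timeit("a.append(b)", setup=setup_list, number=count),
    }


if __name__ == "__main__":
    # One million appends of a one-character string (assumed to be 'x').
    for label, seconds in time_appends("x", 1_000_000).items():
        print(f"{label}: {seconds:.4f} s")
```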
Now let's try appending a thousand-character string a hundred thousand times:
a += b: 0.41823482513427734
a.append(b): 0.010656118392944336
The end string therefore ends up being about 100 MB long. That was pretty slow; appending to a list was much faster. But that timing doesn't include the final ''.join(a). So how long does that take?
''.join(a): 0.43739795684814453
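To see why the join matters, the append phase and the join phase can be timed separately and then added together. This is a minimal sketch with hypothetical helper names; it uses plain loops and timeit.default_timer rather than the answer's (unshown) benchmark, so absolute numbers will differ.

```python
from timeit import default_timer as timer


def build_with_plus_equals(piece, count):
    """Extend a str `count` times; return (result, seconds)."""
    start = timer()
    a = ""
    for _ in range(count):
        a += piece
    return a, timer() - start


def build_with_append_join(piece, count):
    """Append to a list, then join once; return (result, append_s, join_s)."""
    start = timer()
    parts = []
    for _ in range(count):
        parts.append(piece)
    appended = timer()
    result = "".join(parts)  # the final join that must be counted too
    return result, appended - start, timer() - appended


if __name__ == "__main__":
    # A thousand-character piece appended a hundred thousand times,
    # so each result is about 100 MB.
    piece, count = "x" * 1000, 100_000
    plus_s = build_with_plus_equals(piece, count)[1]
    append_s, join_s = build_with_append_join(piece, count)[1:]
    print(f"a += b:      {plus_s:.4f} s")
    print(f"a.append(b): {append_s:.4f} s")
    print(f"''.join(a):  {join_s:.4f} s")
    print(f"append+join: {append_s + join_s:.4f} s")
```

Comparing the last line against the += line, rather than the append line alone, is the fair comparison.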
Oops. It turns out that even in this case, append/join is slower: about 0.448 s in total (0.0107 s + 0.4374 s) versus 0.418 s for +=.
So where does this recommendation come from? Python 2?
a += b: 0.165287017822
a.append(b): 0.0132720470428
''.join(a): 0.114929914474
Well, append/join is marginally faster there (about 0.128 s in total versus 0.165 s) if you are using extremely long strings, which you usually aren't. Why would you have a 100 MB string in memory?
But the real clincher is Python 2.3, where I won't even show you the timings, because it's so slow that the tests haven't finished yet. They suddenly take minutes, except for append/join, which is just as fast as under later Pythons.
Yup. String concatenation was very slow in Python back in the stone age. But on 2.4 it isn't anymore (or at least not on Python 2.4.7), so the recommendation to use append/join became outdated in 2008, when Python 2.3 stopped being updated, and you should have stopped using it then. :-)
(Update: When I did the testing more carefully, it turned out that + and += are faster for two strings on Python 2.3 as well. The recommendation to use ''.join() must be a misunderstanding.)
However, this is CPython. Other implementations may have other concerns. And this is yet another reason why premature optimization is the root of all evil: don't use a technique that's supposedly "faster" unless you first measure it.
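One concrete reason the result is CPython-specific: since Python 2.4, CPython can often grow a string in place during s += t or s = s + t when the variable being extended holds the only reference to that string. This is an implementation detail, not a language guarantee. The sketch below (hypothetical function names) keeps a second reference alive so the in-place resize cannot happen; on CPython the second function is typically much slower because every += must copy the whole string, but measure on your own interpreter rather than trusting that expectation.

```python
from timeit import default_timer as timer


def concat_sole_reference(n):
    """`s` is the only reference, so CPython may resize it in place."""
    s = ""
    for _ in range(n):
        s += "x"
    return s


def concat_with_alias(n):
    """A second reference to `s` forces each += to build a new string."""
    s = ""
    for _ in range(n):
        alias = s  # extra reference defeats CPython's in-place resize
        s += "x"
    return s


def seconds(func, n):
    """Wall-clock seconds for a single call of func(n)."""
    start = timer()
    func(n)
    return timer() - start


if __name__ == "__main__":
    n = 50_000
    print(f"sole reference: {seconds(concat_sole_reference, n):.4f} s")
    print(f"with alias:     {seconds(concat_with_alias, n):.4f} s")
```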
Therefore the "best" way to do string concatenation is to use + or +=. And if that turns out to be slow for you, which is pretty unlikely, then do something else.
So why do I use a lot of append/join in my code? Because sometimes it's actually clearer, especially when the things you concatenate should be separated by spaces, commas, or newlines.
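A small illustration of the readability point, using hypothetical helper names: with +=, separators need an explicit "is this the first item?" check, while str.join places the separator only between items.

```python
def join_fields_concat(fields, sep=", "):
    """With +=, the separator needs an explicit first-item check."""
    line = ""
    for i, field in enumerate(fields):
        if i > 0:
            line += sep
        line += field
    return line


def join_fields_join(fields, sep=", "):
    """str.join puts `sep` only between items, never at the ends."""
    return sep.join(fields)


fields = ["spam", "eggs", "ham"]
print(join_fields_concat(fields))      # spam, eggs, ham
print(join_fields_join(fields))        # spam, eggs, ham
print(join_fields_join(fields, "\n"))  # one item per line
```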