Problem description
Consider the problem of extracting the alphabetic characters from a huge string.
One approach is:
''.join([c for c in hugestring if c.isalpha()])
The mechanism is clear: the list comprehension builds a list of characters, and the join method knows how many characters it needs to join by checking the length of that list.
Another approach is:
''.join(c for c in hugestring if c.isalpha())
Here the generator expression produces a generator. The join method does not know how many characters it is going to join, because a generator does not support len(). So this way of joining should be slower than the list comprehension approach.
But testing in Python shows that it is not slower. Why is this so? Can anyone explain how join works on a generator?
To be clear:
sum(j for j in range(100))
doesn't need to know about the 100, because it can keep track of a cumulative sum. It can fetch the next element by calling next on the generator and add it to the running total. However, since strings are immutable, joining strings cumulatively would create a new string on each iteration, so this would take a lot of time.
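The contrast drawn above can be sketched in Python. The helper names `join_by_accumulation` and `join_two_pass` are hypothetical, introduced only for illustration; neither is how str.join is actually written. Note that CPython can sometimes extend a string in place when doing `+=`, which is an implementation detail, so the "new string on each iteration" cost is the conceptual model rather than a guarantee.

```python
def join_by_accumulation(chars):
    """Concatenate one piece at a time, the way sum() keeps a running total."""
    result = ""
    for c in chars:
        # Strings are immutable, so conceptually this builds a new string
        # on every iteration.
        result += c
    return result


def join_two_pass(chars):
    """Collect every piece first, then concatenate them in a single call."""
    pieces = list(chars)  # materialize the iterable so its length is known
    return "".join(pieces)


sample = "ab1cd2ef3"
letters_accumulated = join_by_accumulation(c for c in sample if c.isalpha())
letters_joined = join_two_pass(c for c in sample if c.isalpha())
```

Both helpers return the same string; they differ only in when the pieces are combined.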
Recommended answer
When you call str.join(gen) where gen is a generator, Python does the equivalent of list(gen) before going on to examine the length of the resulting sequence.
Specifically, if you look at the code implementing str.join in CPython, you'll see this call:
fseq = PySequence_Fast(seq, "can only join an iterable");
The call to PySequence_Fast converts the seq argument into a list if it wasn't a list or tuple already.
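As a rough Python rendering of that behaviour (an approximation for illustration, not the actual C source), the conversion could look like the sketch below. The function name `sequence_fast` is hypothetical; the real function lives in CPython's C API.

```python
def sequence_fast(seq, message):
    """Approximate, Python-level sketch of what PySequence_Fast does."""
    # Exact lists and tuples are handed back untouched.
    if type(seq) in (list, tuple):
        return seq
    try:
        iterator = iter(seq)
    except TypeError:
        # Not iterable at all: report the caller-supplied message.
        raise TypeError(message) from None
    # Anything else iterable (such as a generator) is drained into a new list.
    return list(iterator)


gen = (c for c in "a1b2c3" if c.isalpha())
materialized = sequence_fast(gen, "can only join an iterable")

already_list = ["x", "y"]
same_object = sequence_fast(already_list, "can only join an iterable") is already_list
```

After the call, `gen` is exhausted, because building the list consumed every item. That is why join can examine a length even though the generator itself has none.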
So the two versions of your call are handled almost identically. With the list comprehension, you build the list yourself and pass it into join. With the generator expression, the generator object you pass in gets turned into a list right at the start of join, and the rest of the code operates the same for both versions.
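To compare the two calls on your own machine, a minimal timing sketch with the standard timeit module might look like this. The test string and the repeat count are arbitrary assumptions, and the timings will vary with the Python version and hardware, so no expected numbers are given.

```python
import timeit

# Arbitrary sample input mixing letters, digits and spaces (an assumption).
hugestring = "abc123 def456 " * 10_000


def with_list_comprehension():
    return "".join([c for c in hugestring if c.isalpha()])


def with_generator_expression():
    return "".join(c for c in hugestring if c.isalpha())


# Time each variant over the same number of runs.
list_time = timeit.timeit(with_list_comprehension, number=20)
gen_time = timeit.timeit(with_generator_expression, number=20)

print(f"list comprehension:   {list_time:.3f} s")
print(f"generator expression: {gen_time:.3f} s")
```

Both functions return the same string, so any difference in the printed times comes from how the intermediate list gets built, not from the join step itself.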