This article looks at how to remove duplicates from a list while preserving order; it may serve as a useful reference if you run into the same problem.

Problem Description


Is there a built-in that removes duplicates from list in Python, whilst preserving order? I know that I can use a set to remove duplicates, but that destroys the original order. I also know that I can roll my own like this:

def uniq(input):
  output = []
  for x in input:
    if x not in output:  # linear scan of a list, so uniq is O(n**2) overall
      output.append(x)
  return output
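A quick check of the helper above (redefined here so the snippet runs on its own):

```python
def uniq(input):
    # same helper as above: quadratic, since `x not in output` scans a list
    output = []
    for x in input:
        if x not in output:
            output.append(x)
    return output

print(uniq([1, 2, 1, 3, 2, 4]))  # [1, 2, 3, 4]
```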

(Thanks to unwind for that code sample.)

But I'd like to avail myself of a built-in or a more Pythonic idiom if possible.

Related question: In Python, what is the fastest algorithm for removing duplicates from a list so that all elements are unique while preserving order?

Solution

Here you have some alternatives: http://www.peterbe.com/plog/uniqifiers-benchmark

Fastest one:

def f7(seq):
    seen = set()
    seen_add = seen.add  # bind the method once, outside the loop
    # seen_add(x) returns None (falsy), so the `or` records x and lets the
    # condition keep only first occurrences
    return [x for x in seq if not (x in seen or seen_add(x))]

Why assign seen.add to seen_add instead of just calling seen.add? Python is a dynamic language, and resolving seen.add each iteration is more costly than resolving a local variable. seen.add could have changed between iterations, and the runtime isn't smart enough to rule that out. To play it safe, it has to check the object each time.
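A quick sanity check of f7 (repeated here so the snippet runs standalone):

```python
def f7(seq):
    seen = set()
    seen_add = seen.add
    return [x for x in seq if not (x in seen or seen_add(x))]

print(f7([1, 2, 1, 3, 2, 4]))  # [1, 2, 3, 4]
print(f7("mississippi"))       # ['m', 'i', 's', 'p']
```

Unlike uniq above, membership checks here hit a set, so the whole pass is O(n) on average.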

If you plan on using this function a lot on the same dataset, perhaps you would be better off with an ordered set: http://code.activestate.com/recipes/528878/

O(1) insertion, deletion and member-check per operation.
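The linked recipe is one implementation; as a rough sketch of the idea (not the recipe itself), a plain dict, which preserves insertion order in Python 3.7+, already gives those operations:

```python
class OrderedSet:
    """Minimal ordered-set sketch backed by a dict: O(1) average-case add,
    membership test and discard, iteration in insertion order."""

    def __init__(self, iterable=()):
        # dict.fromkeys both deduplicates and remembers first-seen order
        self._items = dict.fromkeys(iterable)

    def add(self, item):
        self._items[item] = None

    def discard(self, item):
        self._items.pop(item, None)

    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

s = OrderedSet([3, 1, 3, 2, 1])
print(list(s))  # [3, 1, 2]
```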
