How to optimally count elements in a Python list


Problem description


This is almost the same question as the one here, except that I am asking about the most efficient solution for a sorted result.
I have a list (about 10 random integers between 0 and 12), for example:
the_list = [5, 7, 6, 5, 5, 4, 4, 7, 5, 4]
I want to create a function that returns a list of tuples (item, count) ordered by the first element, for example:
output = [(4, 3), (5, 4), (6, 1), (7, 2)]
So far I have been using:
def dupli(the_list):
    return [(item, the_list.count(item)) for item in sorted(set(the_list))]
But I call this function almost a million times, and I need to make it as fast as I (Python) can. Therefore my question: how can I make this function less time-consuming? (And what about memory?)
I have played around a bit, but nothing obvious came up:
from timeit import Timer as T
number=10000
setup = "the_list=[5, 7, 6, 5, 5, 4, 4, 7, 5, 4]"

stmt = "[(item, the_list.count(item)) for item in sorted(set(the_list))]"
T(stmt=stmt, setup=setup).timeit(number=number)

Out[230]: 0.058799982070922852

stmt = "L = []; \nfor item in sorted(set(the_list)): \n    L.append((item, the_list.count(item)))"
T(stmt=stmt, setup=setup).timeit(number=number)

Out[233]: 0.065041065216064453

stmt = "[(item, the_list.count(item)) for item in set(sorted(the_list))]"
T(stmt=stmt, setup=setup).timeit(number=number)

Out[236]: 0.098351955413818359
Thanks
Christophe
Solution
Change where you sort for a savings of about 20%.
Change this: 
def dupli(the_list):
    return [(item, the_list.count(item)) for item in sorted(set(the_list))]
To this:
def dupli(the_list):
    count = the_list.count # this optimization added courtesy of Sven's comment
    result = [(item, count(item)) for item in set(the_list)]
    result.sort()
    return result
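This is faster because sorted() has to build a new temporary list from the set before the comprehension runs, whereas result.sort() sorts the already-built list in place. Binding the_list.count to a local name also avoids repeating the attribute lookup for every item.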
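One way to check the difference on your own machine is a small timing sketch like the one below. The names dupli_sorted_first and dupli_sort_last are made up for illustration; they correspond to the two versions above. Timings depend on the machine and the Python version, so no expected numbers are shown.

```python
from timeit import timeit

the_list = [5, 7, 6, 5, 5, 4, 4, 7, 5, 4]

def dupli_sorted_first(the_list):
    # Original version: sort the unique items, then count each one.
    return [(item, the_list.count(item)) for item in sorted(set(the_list))]

def dupli_sort_last(the_list):
    # Revised version: cache the bound method, build the pairs, sort in place.
    count = the_list.count
    result = [(item, count(item)) for item in set(the_list)]
    result.sort()
    return result

# Both versions must agree before their timings are worth comparing.
assert dupli_sorted_first(the_list) == dupli_sort_last(the_list)

if __name__ == "__main__":
    for func in (dupli_sorted_first, dupli_sort_last):
        seconds = timeit(lambda: func(the_list), number=10000)
        print(func.__name__, seconds)
```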
Edit: Here's another approach that is 35% faster than your original:
def dupli(the_list):
    counts = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    for n in the_list:
        counts[n] += 1
    return [(i, counts[i]) for i in (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12) if counts[i]]
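The fixed-size counter works because every value is known to be an integer from 0 to 12, so the value itself can serve as the list index. Here is a minimal sketch of the same idea with the upper bound as a parameter. The names dupli_bounded and max_value are hypothetical and not part of the original answer.

```python
def dupli_bounded(the_list, max_value=12):
    # Assumes every item is an int in 0..max_value inclusive.
    # Larger values raise IndexError; negative values are not handled.
    counts = [0] * (max_value + 1)
    for n in the_list:
        counts[n] += 1
    # Indexes are visited in ascending order, so no separate sort is needed.
    return [(i, c) for i, c in enumerate(counts) if c]

print(dupli_bounded([5, 7, 6, 5, 5, 4, 4, 7, 5, 4]))
# prints [(4, 3), (5, 4), (6, 1), (7, 2)]
```

The trade-off is that this version only applies when the range of values is small and known in advance, which is the case in the question.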
Note: You may want to randomize the values in the_list. My final version of dupli tests even faster with other random data sets (import random; the_list=[random.randint(0,12) for i in xrange(10)]). On Python 3, use range in place of xrange.
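A sketch of such a randomized comparison, written for Python 3 (so range rather than xrange). The helper random_lists, the function names, and the sample sizes are assumptions chosen for illustration, not part of the original answer.

```python
import random
from timeit import timeit

def dupli_original(the_list):
    return [(item, the_list.count(item)) for item in sorted(set(the_list))]

def dupli_counts(the_list):
    counts = [0] * 13  # one slot per possible value 0..12
    for n in the_list:
        counts[n] += 1
    return [(i, counts[i]) for i in range(13) if counts[i]]

def random_lists(how_many, length=10, seed=0):
    # Hypothetical helper: reproducible lists of ints drawn from 0..12.
    rng = random.Random(seed)
    return [[rng.randint(0, 12) for _ in range(length)] for _ in range(how_many)]

samples = random_lists(1000)

# Sanity check: both versions agree on every sample.
assert all(dupli_original(s) == dupli_counts(s) for s in samples)

if __name__ == "__main__":
    for func in (dupli_original, dupli_counts):
        seconds = timeit(lambda: [func(s) for s in samples], number=10)
        print(func.__name__, seconds)
```

Fixing the seed keeps the runs comparable: every function is timed on exactly the same lists.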