Problem description
This is almost the same question as here, except that I am asking about the most efficient solution for a sorted result.
I have a list (about 10 integers randomly between 0 and 12), for example:
the_list = [5, 7, 6, 5, 5, 4, 4, 7, 5, 4]
I want to create a function that returns a list of (item, count) tuples ordered by the first element, for example:
output = [(4, 3), (5, 4), (6, 1), (7, 2)]
So far I have used:
def dupli(the_list):
    return [(item, the_list.count(item)) for item in sorted(set(the_list))]
But I call this function almost a million times, and I need to make it as fast as I (Python) can. Hence my question: how can I make this function less time-consuming? (And what about memory?)
I have played around a bit, but nothing obvious came up:
from timeit import Timer as T
number=10000
setup = "the_list=[5, 7, 6, 5, 5, 4, 4, 7, 5, 4]"
stmt = "[(item, the_list.count(item)) for item in sorted(set(the_list))]"
T(stmt=stmt, setup=setup).timeit(number=number)
Out[230]: 0.058799982070922852
stmt = "L = []; \nfor item in sorted(set(the_list)): \n L.append((item, the_list.count(item)))"
T(stmt=stmt, setup=setup).timeit(number=number)
Out[233]: 0.065041065216064453
stmt = "[(item, the_list.count(item)) for item in set(sorted(the_list))]"
T(stmt=stmt, setup=setup).timeit(number=number)
Out[236]: 0.098351955413818359
Thanks
Christophe
Change where you sort for a savings of about 20%.
Change this:
def dupli(the_list):
    return [(item, the_list.count(item)) for item in sorted(set(the_list))]
To this:
def dupli(the_list):
    count = the_list.count  # this optimization added courtesy of Sven's comment
    result = [(item, count(item)) for item in set(the_list)]
    result.sort()
    return result
The reason this is faster is that sorted() must build a temporary list from the set before the comprehension builds the result, whereas result.sort() sorts the result list in place.
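To check the claim on your own machine, a minimal sketch along these lines may help. The names dupli_sorted and dupli_inplace are made up here to keep both versions side by side, and the timings will vary with the Python version and the hardware, so the 20% figure is not guaranteed to reproduce.

```python
from timeit import timeit

the_list = [5, 7, 6, 5, 5, 4, 4, 7, 5, 4]

def dupli_sorted(the_list):
    # Original version: sorted() builds a list of the unique items first,
    # then the comprehension builds the result list.
    return [(item, the_list.count(item)) for item in sorted(set(the_list))]

def dupli_inplace(the_list):
    count = the_list.count  # look the bound method up once
    result = [(item, count(item)) for item in set(the_list)]
    result.sort()  # sorts the result list in place
    return result

# Both versions must agree before their timings are worth comparing.
assert dupli_sorted(the_list) == dupli_inplace(the_list)

for func in (dupli_sorted, dupli_inplace):
    elapsed = timeit(lambda: func(the_list), number=10000)
    print(func.__name__, elapsed)
```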
Edit: Here's another approach that is 35% faster than your original:
def dupli(the_list):
    counts = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    for n in the_list:
        counts[n] += 1
    return [(i, counts[i]) for i in (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12) if counts[i]]
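The same bucket-counting idea can be sketched for any known upper bound. This is only an illustration: dupli_counts and its max_value parameter are hypothetical names, not part of the question, and the function assumes every item is an integer between 0 and max_value. It is probably a little slower than the hard-coded version above, because the bucket list is built on each call.

```python
def dupli_counts(the_list, max_value=12):
    # One bucket per possible value 0..max_value
    # (assumption: every item is an int in that range).
    counts = [0] * (max_value + 1)
    for n in the_list:
        counts[n] += 1
    # enumerate() walks the buckets in ascending order,
    # so the result comes out sorted without an explicit sort.
    return [(value, count) for value, count in enumerate(counts) if count]

print(dupli_counts([5, 7, 6, 5, 5, 4, 4, 7, 5, 4]))
# [(4, 3), (5, 4), (6, 1), (7, 2)]
```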
Note: You may want to randomize the values for the_list. My final version of dupli tests even faster with other random data sets (import random; the_list=[random.randint(0,12) for i in xrange(10)]).
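A rough way to run that comparison over random inputs is sketched below. It is written for Python 3 (range instead of xrange); the fixed seed, the 1000 samples and the names dupli_original and dupli_buckets are choices made for this example, not taken from the question.

```python
import random
from timeit import timeit

def dupli_original(the_list):
    return [(item, the_list.count(item)) for item in sorted(set(the_list))]

def dupli_buckets(the_list):
    counts = [0] * 13  # one bucket per value 0..12
    for n in the_list:
        counts[n] += 1
    return [(i, counts[i]) for i in range(13) if counts[i]]

rng = random.Random(0)  # fixed seed so runs are repeatable (an assumption)
samples = [[rng.randint(0, 12) for _ in range(10)] for _ in range(1000)]

# Both versions must agree on every sample before timing them.
for sample in samples:
    assert dupli_original(sample) == dupli_buckets(sample)

for func in (dupli_original, dupli_buckets):
    elapsed = timeit(lambda: [func(s) for s in samples], number=10)
    print(func.__name__, elapsed)
```

Timing many different lists rather than one avoids tuning for a single input whose number of distinct values happens to favor one version.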