问题描述
我知道__builtin__
sorted()函数可用于任何可迭代的函数.但是有人可以解释anylist.sort()与sorted(anylist)之间的巨大(10倍)性能差异吗?另外,请指出如果我在测量此方法方面有任何错误.
I know that __builtin__
sorted() function works on any iterable. But can someone explain this huge (10x) performance difference between anylist.sort() vs sorted(anylist) ? Also, please point out if I am doing anything wrong with way this is measured.
"""
Example Output:
$ python list_sort_timeit.py
Using sort method: 20.0662879944
Using sorted builin method: 259.009809017
"""
import random
import timeit
print 'Using sort method:',
x = min(timeit.Timer("test_list1.sort()","import random;test_list1=random.sample(xrange(1000),1000)").repeat())
print x
print 'Using sorted builin method:',
x = min(timeit.Timer("sorted(test_list2)","import random;test_list2=random.sample(xrange(1000),1000)").repeat())
print x
如标题所示,我对比较list.sort()与sorted(list)感兴趣.上面的代码片段显示了一些有趣的事情,python的sort函数对于已经排序的数据表现得非常好.正如Anurag指出的那样,在第一种情况下,sort方法对已排序的数据起作用,而在第二种情况下,它对新数据起作用以一次又一次地工作.
As the title says, I was interested in comparing list.sort() vs sorted(list). The above snippet showed something interesting that, python's sort function behaves very well for already sorted data. As pointed out by Anurag, in the first case, the sort method is working on already sorted data and while in second sorted it is working on fresh piece to do work again and again.
所以我写了这个来测试,是的,它们非常接近.
So I wrote this one to test and yes, they are very close.
"""
Example Output:
$ python list_sort_timeit.py
Using sort method: 19.0166599751
Using sorted builin method: 23.203567028
"""
import random
import timeit
print 'Using sort method:',
x = min(timeit.Timer("test_list1.sort()","import random;test_list1=random.sample(xrange(1000),1000);test_list1.sort()").repeat())
print x
print 'Using sorted builin method:',
x = min(timeit.Timer("sorted(test_list2)","import random;test_list2=random.sample(xrange(1000),1000);test_list2.sort()").repeat())
print x
哦,当我键入此内容时,我看到Alex Martelli做出了回应.(我将保留编辑内容,因为它可能会很有用).
Oh, I see Alex Martelli with a response, as I was typing this one.. ( I shall leave the edit, as it might be useful).
推荐答案
您的度量错误如下:在您第一次调用test_list1.sort()
之后,该列表对象 IS 进行了排序-以及Python的 timsort 排序,速度很快在已排序的列表上!这是使用timeit
时最常见的错误-不经意间就产生了副作用并且没有考虑到它们.
Your error in measurement is as follows: after your first call of test_list1.sort()
, that list object IS sorted -- and Python's sort, aka timsort, is wickedly fast on already sorted lists!!! That's the most frequent error in using timeit
-- inadvertently getting side effects and not accounting for them.
这是一组很好的测量方法,最好使用命令行中的timeit
:
Here's a good set of measurements, using timeit
from the command line as it's best used:
$ python -mtimeit -s'import random; x=range(1000); random.shuffle(x)' '
y=list(x); y.sort()'
1000 loops, best of 3: 452 usec per loop
$ python -mtimeit -s'import random; x=range(1000); random.shuffle(x)' '
x.sort()'
10000 loops, best of 3: 37.4 usec per loop
$ python -mtimeit -s'import random; x=range(1000); random.shuffle(x)' '
sorted(x)'
1000 loops, best of 3: 462 usec per loop
如您所见,y.sort()
和sorted(x)
并驾齐驱,但是x.sort()
由于副作用而获得的好处超过了一个数量级的优势-只是由于您的测量误差:但这并不能告诉您大约sort
vs sorted
本身! -)
As you see, y.sort()
and sorted(x)
are neck and neck, but x.sort()
thanks to the side effects gains over an order of magnitude's advantage -- just because of your measurement error, though: this tells you nothing about sort
vs sorted
per se! -)
这篇关于列表中的Python sort()方法与内置sorted()函数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!