I always use this command line to sort a file and keep only the unique lines, and it works like a charm even with large files (over 500,000 lines):
sort filename.txt | uniq | sponge filename.txt
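The shortest equivalent Python code would be:
with open("filename.txt", "r") as f:
    # set() drops duplicates; sorted() returns a new list (list.sort() returns None)
    lines = sorted(set(line.rstrip("\n") for line in f))
with open("filename.txt", "w") as f:
    f.writelines(line + "\n" for line in lines)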
But of course this is not scalable because of memory constraints, and writing scalable code in Python would take time, so I wonder: what is the shortest equivalent code (or package) in Python?
You don't need to sort in Python to get unique lines, since a set takes care of uniqueness even without sorting.
with open("filename.txt", "r") as f:
    lines = set(f)
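To reproduce the whole pipeline, including the sponge step that rewrites the file in place, the set can be sorted and written back. This is a minimal sketch, assuming the file fits in memory; the function name `sort_unique_in_place` is hypothetical, and writing to a temporary file and then swapping it in with `os.replace` is my own choice to avoid a half-written file, not a claim about how `sponge` itself works. Note that `sorted()` orders by code point, like `LC_ALL=C sort`, which may differ from a locale-aware `sort`.

```python
import os
import shutil
import tempfile


def sort_unique_in_place(path):
    """Rough analogue of `sort path | uniq | sponge path` (hypothetical helper).

    Assumes the whole file fits in memory. sorted() compares by code point,
    which matches `LC_ALL=C sort` rather than a locale-aware sort.
    """
    with open(path, "r") as f:
        # Strip the newline so a last line without "\n" still matches
        # an identical earlier line that has one.
        unique = set(line.rstrip("\n") for line in f)

    # Write to a temporary file in the same directory, then swap it in,
    # so an error midway never leaves a half-written original behind.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, text=True)
    try:
        with os.fdopen(fd, "w") as out:
            for line in sorted(unique):
                out.write(line + "\n")
        shutil.copymode(path, tmp_path)  # mkstemp creates the file as 0600
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)
        raise
```

Usage would simply be sort_unique_in_place("filename.txt"). Memory use is still proportional to the number of distinct lines, so this does not address the scalability concern.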
The shell sort command also reads the lines into memory, although GNU sort caps that buffer (its -S/--buffer-size option) and spills sorted chunks to temporary files when the input is larger than the buffer. If you have really large files or you are adamant about not using additional memory, you can try some crazy tricks like the one shown here: http://neopythonic.blogspot.in/2008/10/sorting-million-32-bit-integers-in-2mb.html
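As I understand it, the linked post sorts a file of integers by sorting fixed-size chunks, writing each sorted chunk to a temporary file, and merging the chunks lazily with `heapq.merge`. The same chunk-then-merge idea carries over to lines. Below is a minimal sketch, assuming each chunk fits in memory and that code-point order (like `LC_ALL=C sort`) is acceptable; the function name `external_sort_unique` and the `chunk_size` default are hypothetical choices, not taken from the post.

```python
import heapq
import itertools
import tempfile


def _write_run(sorted_lines):
    """Spill one sorted chunk (a "run") to an anonymous temp file and rewind it."""
    run = tempfile.TemporaryFile(mode="w+")
    for line in sorted_lines:
        run.write(line + "\n")
    run.seek(0)
    return run


def _read_run(run):
    """Yield the lines of a run without their trailing newline."""
    for line in run:
        yield line.rstrip("\n")


def external_sort_unique(src_path, dst_path, chunk_size=100_000):
    """Write the sorted, de-duplicated lines of src_path to dst_path.

    Only about chunk_size lines are held in memory at once (hypothetical knob).
    """
    runs = []
    try:
        with open(src_path, "r") as src:
            while True:
                chunk = [line.rstrip("\n")
                         for line in itertools.islice(src, chunk_size)]
                if not chunk:
                    break
                # De-duplicating inside the chunk is optional; it shrinks the run.
                runs.append(_write_run(sorted(set(chunk))))

        # heapq.merge yields lines in sorted order, reading each run lazily.
        merged = heapq.merge(*(_read_run(run) for run in runs))
        with open(dst_path, "w") as dst:
            # groupby collapses adjacent equal lines, just like `uniq`.
            for line, _ in itertools.groupby(merged):
                dst.write(line + "\n")
    finally:
        for run in runs:
            run.close()
```

For example, external_sort_unique("filename.txt", "filename.sorted.txt"). Because the source is fully consumed into runs before the destination is opened, passing the same path twice also works, much like sponge, but an error while writing would then lose the original. With very many runs, one would normally merge in several passes to limit open file handles.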