Sorting a file and getting its unique lines in Python

Problem description

I always use this command line to sort a file and keep only its unique lines, and it works like a charm even with large files (over 500,000 lines):

sort filename.txt | uniq | sponge filename.txt

The shortest equivalent Python code would be:

with open("filename.txt", "r") as f:
    lines = f.readlines()
# list.sort() sorts in place and returns None, so use sorted() on the set instead
lines = sorted(set(lines))

But of course this is not scalable because of memory constraints, and writing scalable code in Python would take time. So I wonder: what is the shortest equivalent code (or package) in Python?

Recommended answer

You don't need to sort in Python, since a set takes care of uniqueness even without sorting.

with open("filename.txt", "r") as f:
    lines = set(f.readlines())
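If the sorted order and the in-place rewrite of the original pipeline are also wanted, something like the following might do as a minimal in-memory sketch. The function name `sort_uniq_in_place` is hypothetical, and the sketch assumes the whole file fits in memory. Python compares strings by code point, so the order may differ from a locale-aware `sort`.

```python
import os
import tempfile


def sort_uniq_in_place(path):
    """Rewrite `path` with its unique lines in sorted order (all in memory)."""
    with open(path, "r") as f:
        # Strip the newline so a final line without one still compares equal
        # to its duplicates; the set comprehension drops the duplicates.
        lines = sorted({line.rstrip("\n") for line in f})

    # Write to a temporary file in the same directory, then swap it in,
    # roughly what `sponge` does at the end of the shell pipeline.
    directory = os.path.dirname(os.path.abspath(path))
    with tempfile.NamedTemporaryFile("w", dir=directory, delete=False) as tmp:
        tmp.writelines(line + "\n" for line in lines)
    os.replace(tmp.name, path)
```

Calling `sort_uniq_in_place("filename.txt")` would then play the role of `sort filename.txt | uniq | sponge filename.txt`.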

The shell sort command would also load the lines into memory, so using it would not get you any memory savings. If you have really large files, or you are adamant about not using additional memory, you can try some crazy tricks like the one shown here: http://neopythonic.blogspot.in/2008/10/sorting-million-32-bit-integers-in-2mb.html
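For files that do not fit in memory, one common approach is an external merge sort: sort fixed-size chunks, write each to a temporary file, then merge the sorted chunks while skipping repeated lines. The sketch below is not taken from the linked post; the function name `sort_uniq_external` and the `chunk_lines` parameter are hypothetical, and it assumes there is enough disk space next to the file for the temporary chunks.

```python
import heapq
import itertools
import os
import tempfile


def sort_uniq_external(path, chunk_lines=100_000):
    """Sort `path` and drop duplicate lines, holding about `chunk_lines` lines in memory."""
    directory = os.path.dirname(os.path.abspath(path))
    chunk_paths = []
    try:
        # Pass 1: split the input into sorted, de-duplicated chunk files.
        with open(path, "r") as src:
            while True:
                chunk = {line.rstrip("\n") for line in itertools.islice(src, chunk_lines)}
                if not chunk:
                    break
                with tempfile.NamedTemporaryFile("w", dir=directory, delete=False) as tmp:
                    tmp.writelines(line + "\n" for line in sorted(chunk))
                chunk_paths.append(tmp.name)

        # Pass 2: lazily merge the sorted chunks; duplicates are now adjacent,
        # so comparing with the previous line is enough (like `uniq`).
        files = [open(p, "r") for p in chunk_paths]
        try:
            streams = [(line.rstrip("\n") for line in f) for f in files]
            with tempfile.NamedTemporaryFile("w", dir=directory, delete=False) as out:
                previous = None
                for line in heapq.merge(*streams):
                    if line != previous:
                        out.write(line + "\n")
                        previous = line
        finally:
            for f in files:
                f.close()
        # Swap the merged result in place of the original file.
        os.replace(out.name, path)
    finally:
        # Remove the temporary chunk files whether or not the merge succeeded.
        for p in chunk_paths:
            os.remove(p)
```

`heapq.merge` reads its inputs lazily, so the merge pass keeps roughly one line per chunk in memory. A smaller `chunk_lines` lowers peak memory at the cost of more temporary files being open at once.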
