Problem description
I have a file with one column. How can I delete the repeated lines in the file?

On Unix/Linux, use the uniq command, as per David Locke's answer, or sort, as per William Pursell's comment.
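For instance, at a shell prompt (the filenames here are placeholders, not from the question):

```shell
# sort + uniq removes duplicates, but the output comes back sorted
sort input.txt | uniq > deduped.txt

# equivalent shorthand
sort -u input.txt > deduped.txt

# an awk one-liner that drops duplicates while preserving the original line order
awk '!seen[$0]++' input.txt > deduped_in_order.txt
```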
If you need a Python script:
lines_seen = set()  # holds lines already seen
outfile = open(outfilename, "w")
for line in open(infilename, "r"):
    if line not in lines_seen:  # not a duplicate
        outfile.write(line)
        lines_seen.add(line)
outfile.close()
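As a side note, the same logic can be written with a with block so both files are closed automatically, even if an error occurs; this is a sketch, and the function name and filenames are my own, not from the answer:

```python
def dedupe_file(infilename, outfilename):
    """Copy infilename to outfilename, dropping duplicate lines, keeping order."""
    lines_seen = set()  # holds lines already seen
    with open(infilename, "r") as infile, open(outfilename, "w") as outfile:
        for line in infile:
            if line not in lines_seen:  # not a duplicate
                outfile.write(line)
                lines_seen.add(line)
```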
Update: The sort/uniq combination will remove duplicates, but it returns a file with the lines sorted, which may or may not be what you want. The Python script above won't reorder lines; it just drops duplicates. Of course, to get the script above to sort as well, just leave out the outfile.write(line) and instead, immediately after the loop, do outfile.writelines(sorted(lines_seen)).
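The sorted variant described above can be sketched like this (again, the function name and filenames are assumptions for illustration):

```python
def dedupe_and_sort(infilename, outfilename):
    """Write the unique lines of infilename to outfilename in sorted order."""
    lines_seen = set()
    with open(infilename, "r") as infile:
        for line in infile:
            lines_seen.add(line)  # collect unique lines; original order is lost
    with open(outfilename, "w") as outfile:
        outfile.writelines(sorted(lines_seen))
```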