Problem description
Is there a way to delete duplicate lines in a file in Unix?
I can do it with the sort -u and uniq commands, but I want to use sed or awk. Is that possible?
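For context, the standard alternatives each have a drawback on unsorted input (file.txt here is a hypothetical example):

$ sort -u file.txt    # removes duplicates, but reorders the file
$ uniq file.txt       # only collapses adjacent duplicates, so non-adjacent ones survive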
Recommended answer
awk '!seen[$0]++' file.txt
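To illustrate with a hypothetical file.txt, the command keeps the first occurrence of every line and preserves the original order:

$ cat file.txt
apple
banana
apple
cherry
banana
$ awk '!seen[$0]++' file.txt
apple
banana
cherry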
seen is an associative array that Awk indexes by each input line ($0). If a line isn't in the array yet, seen[$0] evaluates to false. The ! is the logical NOT operator and inverts that false to true, and Awk prints every line for which the pattern expression evaluates to true. The ++ then increments seen[$0], so seen[$0] == 1 after the first time a line is found, seen[$0] == 2 the second time, and so on.
Awk evaluates everything except 0 and "" (the empty string) as true. So once a line has been recorded in seen, !seen[$0] evaluates to false and the duplicate line is not written to the output.
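If the compressed idiom is hard to read, here is a minimal expanded sketch with the same behavior, just spelled out as two pattern-action rules:

awk '
  # print only lines that have not been seen before
  seen[$0] == 0 { print }
  # record every line so later duplicates are skipped
  { seen[$0]++ }
' file.txt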