Problem Description
Is there a way to delete duplicate lines in a file in Unix? I can do it with the sort -u and uniq commands, but I want to use sed or awk instead. Is that possible?
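For context, a short illustration (with made-up sample input) of why the sort -u / uniq approach may not be what you want: sort -u reorders the file, and uniq alone only removes adjacent duplicates.

```shell
#!/bin/sh
printf 'banana\napple\nbanana\n' | sort -u
# prints the unique lines, but sorted:
#   apple
#   banana

printf 'banana\napple\nbanana\n' | uniq
# only collapses *adjacent* duplicates, so all three lines survive:
#   banana
#   apple
#   banana
```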
Recommended Answer
awk '!seen[$0]++' file.txt
seen is an associative array that AWK will pass every line of the file to. If a line isn't in the array then seen[$0] will evaluate to false. The ! is the logical NOT operator and will invert the false to true. AWK will print the lines where the expression evaluates to true.
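A quick demonstration of the one-liner, using hypothetical input (the file contents here are made up for illustration):

```shell
#!/bin/sh
# Sample input: "apple" and "banana" each appear twice.
printf 'apple\nbanana\napple\ncherry\nbanana\n' |
awk '!seen[$0]++'
# Output keeps only the first occurrence of each line,
# in the original order:
#   apple
#   banana
#   cherry
```

Note that, unlike sort -u, the original line order is preserved.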
The ++ increments seen so that seen[$0] == 1 after the first time a line is found, then seen[$0] == 2, and so on. AWK evaluates everything but 0 and "" (the empty string) to true. If a duplicate line is placed in seen then !seen[$0] will evaluate to false and the line will not be written to the output.
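The same logic can be spelled out in a longer, more explicit form. This expanded version is a sketch (not from the original answer), but it behaves identically to the one-liner:

```shell
#!/bin/sh
printf 'a\nb\na\nb\nc\n' |
awk '{
    if (seen[$0] == 0) {   # first occurrence: the count is still 0
        print $0           # so print the line
    }
    seen[$0]++             # then record that the line has been seen
}'
# Prints:
#   a
#   b
#   c
```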