Problem description
I am looking for something similar to the bash command comm that I can use to select the entries unique to each of my two files and the entries common to both. comm worked great when I had just one column per file, e.g.:
comm -23 FILE1.txt FILE2.txt > Entries_only_in_file1.txt
But now I have multiple columns of information that I wish to keep. I want to use column 2 as the key for deciding which rows are unique to one file and which are common to both. If the entry in column 2 appears in both files, I also want to record the information in columns 3, 4, and 5 from each file (if possible; this is not as important). Here is an example of input and output.
FILE1.txt
NM_023928 AACS 2 2 1
NM_182662 AADAT 2 2 1
NM_153698 AAED1 1 5 3
NM_001271 AAGAB 2 2 1
FILE2.txt
NM_153698 AAED1 2 5 3
NM_001271 AAGAB 2 2 1
NM_001605 AARS 3 40 37
NM_212533 ABCA2 3 4 2
Desired output:
COMMON.txt
NM_153698 AAED1 1 5 3 2 5 3
NM_001271 AAGAB 2 2 1 2 2 1
UNIQUE_TO_1.txt
NM_023928 AACS 2 2 1
NM_182662 AADAT 2 2 1
UNIQUE_TO_2.txt
NM_001605 AARS 3 40 37
NM_212533 ABCA2 3 4 2
I know there have been similar questions before, but I can't quite find what I'm looking for. Any ideas are greatly appreciated, thank you.
Recommended answer
join has the following options which are useful for your task:
- -j FIELD: join on field FIELD in both files.
- -o FORMAT: specify the output format, as a comma-separated list of FILENUM.FIELD.
- -v FILENUM: print only the unpairable lines from file FILENUM.
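One prerequisite is worth spelling out: join expects both inputs to be sorted on the join field. The sample files in the question already are, but real data may not be. The following is a minimal sketch under that assumption, using deliberately shuffled copies of the sample data; the directory name join_sort_demo and the *.sorted.txt file names are hypothetical, and -1 2 -2 2 is the POSIX spelling of joining on field 2 of both files.

```shell
# Scratch directory for the demo (hypothetical name).
d=join_sort_demo
mkdir -p "$d"

# Sample rows from the question, deliberately out of order.
cat > "$d/FILE1.txt" <<'EOF'
NM_001271 AAGAB 2 2 1
NM_023928 AACS 2 2 1
NM_153698 AAED1 1 5 3
NM_182662 AADAT 2 2 1
EOF
cat > "$d/FILE2.txt" <<'EOF'
NM_212533 ABCA2 3 4 2
NM_153698 AAED1 2 5 3
NM_001605 AARS 3 40 37
NM_001271 AAGAB 2 2 1
EOF

# Use the same byte-wise collation for sort and join.
export LC_ALL=C

# Sort each file on column 2 only, the field join will match on.
sort -k2,2 "$d/FILE1.txt" > "$d/FILE1.sorted.txt"
sort -k2,2 "$d/FILE2.txt" > "$d/FILE2.sorted.txt"

# Join on field 2 of both files and pick the output columns explicitly.
join -1 2 -2 2 -o 1.1,1.2,1.3,1.4,1.5,2.3,2.4,2.5 \
  "$d/FILE1.sorted.txt" "$d/FILE2.sorted.txt" > "$d/COMMON.txt"
```

Setting LC_ALL=C for both commands avoids the case where sort and join disagree about ordering because of locale-specific collation.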
Common to both files:
$ join -j2 -o 1.1,1.2,1.3,1.4,1.5,2.3,2.4,2.5 FILE1.txt FILE2.txt
NM_153698 AAED1 1 5 3 2 5 3
NM_001271 AAGAB 2 2 1 2 2 1
Unique to FILE1:
$ join -j2 -v1 FILE1.txt FILE2.txt
AACS NM_023928 2 2 1
AADAT NM_182662 2 2 1
Unique to FILE2:
$ join -j2 -v2 FILE1.txt FILE2.txt
AARS NM_001605 3 40 37
ABCA2 NM_212533 3 4 2
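Note that without -o, join prints the join field first, so the two "unique" listings above have columns 1 and 2 swapped relative to the desired output. As a rough sketch, assuming the inputs are already sorted on column 2 as in the question, -o can be combined with -v to restore the original column order and write all three requested files. The directory name join_split_demo is hypothetical; the output file names are the ones from the question.

```shell
# Scratch directory for the demo (hypothetical name).
d=join_split_demo
mkdir -p "$d"

# Sample inputs from the question, already sorted on column 2.
cat > "$d/FILE1.txt" <<'EOF'
NM_023928 AACS 2 2 1
NM_182662 AADAT 2 2 1
NM_153698 AAED1 1 5 3
NM_001271 AAGAB 2 2 1
EOF
cat > "$d/FILE2.txt" <<'EOF'
NM_153698 AAED1 2 5 3
NM_001271 AAGAB 2 2 1
NM_001605 AARS 3 40 37
NM_212533 ABCA2 3 4 2
EOF

# Byte-wise collation, matching the order the files are sorted in.
export LC_ALL=C

# Rows whose column 2 appears in both files, with columns 3-5 from each.
join -1 2 -2 2 -o 1.1,1.2,1.3,1.4,1.5,2.3,2.4,2.5 \
  "$d/FILE1.txt" "$d/FILE2.txt" > "$d/COMMON.txt"

# Rows found only in FILE1; -o keeps the original column order.
join -1 2 -2 2 -v 1 -o 1.1,1.2,1.3,1.4,1.5 \
  "$d/FILE1.txt" "$d/FILE2.txt" > "$d/UNIQUE_TO_1.txt"

# Rows found only in FILE2, again in the original column order.
join -1 2 -2 2 -v 2 -o 2.1,2.2,2.3,2.4,2.5 \
  "$d/FILE1.txt" "$d/FILE2.txt" > "$d/UNIQUE_TO_2.txt"
```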