File_a
1 MIR6859-1 2340 DDX11L1 3222
2 MIR6859-1 4860 WASH7P 7074
3 WASH7P 326 MIR1302-2 670
4 FAM138A 15 MIR1302-2 5730
8 LOC729737 7270 OR4F5 64205
9 LOC729737 3070 OR4F5 68405
10 LOC729737 88330 LOC100132287 94996
11 LOC100132287 86996 LOC729737 96330
12 LOC100132287 80196 LOC729737 103130
13 LOC100132287 72396 LOC729737 110930
14 LOC100132287 61196 LOC729737 122130
15 LOC100132287 56596 LOC729737 126730
File_b
10 LOC7 883
15 TYUI 678
8 LOC123 764
40 QWER 456
8 LOC125 783
and the expected output is:
1 MIR6859-1 2340 DDX11L1 3222
2 MIR6859-1 4860 WASH7P 7074
3 WASH7P 326 MIR1302-2 670
4 FAM138A 15 MIR1302-2 5730
8 LOC729737 7270 OR4F5 64205 LOC123 764 LOC125 783
9 LOC729737 3070 OR4F5 68405
10 LOC729737 88330 LOC100132287 94996 LOC7 883
11 LOC100132287 86996 LOC729737 96330
12 LOC100132287 80196 LOC729737 103130
13 LOC100132287 72396 LOC729737 110930
14 LOC100132287 61196 LOC729737 122130
15 LOC100132287 56596 LOC729737 126730 TYUI 678
40 QWER 456
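So this is essentially a natural join on the first column of the two files, where every row from either file is kept and matching rows from File_b are appended to the File_a row.
After searching online I tried various commands: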
join -a1 file_a file_b
and
paste file_a file_b
but neither gave the desired output.
Best answer
awk solution:
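awk 'NR == FNR { a[$1] = ($1 in a ? a[$1] OFS : "") $2 OFS $3; next }  # file_b: collect columns 2 and 3 per key
$1 in a { $0 = $0 OFS a[$1]; delete a[$1] }  # file_a: append the collected columns, mark the key as used
1  # print every file_a line, modified or not
END { for (i in a) print i, a[i] }  # keys that occur only in file_b
' file_b file_a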
Output:
1 MIR6859-1 2340 DDX11L1 3222
2 MIR6859-1 4860 WASH7P 7074
3 WASH7P 326 MIR1302-2 670
4 FAM138A 15 MIR1302-2 5730
8 LOC729737 7270 OR4F5 64205 LOC123 764 LOC125 783
9 LOC729737 3070 OR4F5 68405
10 LOC729737 88330 LOC100132287 94996 LOC7 883
11 LOC100132287 86996 LOC729737 96330
12 LOC100132287 80196 LOC729737 103130
13 LOC100132287 72396 LOC729737 110930
14 LOC100132287 61196 LOC729737 122130
15 LOC100132287 56596 LOC729737 126730 TYUI 678
40 QWER 456