File_a

1 MIR6859-1 2340    DDX11L1 3222
2 MIR6859-1 4860    WASH7P  7074
3 WASH7P    326 MIR1302-2   670
4 FAM138A   15  MIR1302-2   5730
8 LOC729737 7270    OR4F5   64205
9 LOC729737 3070    OR4F5   68405
10 LOC729737    88330   LOC100132287    94996
11 LOC100132287 86996   LOC729737   96330
12 LOC100132287 80196   LOC729737   103130
13 LOC100132287 72396   LOC729737   110930
14 LOC100132287 61196   LOC729737   122130
15 LOC100132287 56596   LOC729737   126730

File_b
10 LOC7 883
15 TYUI 678
8 LOC123 764
40 QWER 456
8 LOC125 783

and the expected output is
1 MIR6859-1 2340    DDX11L1 3222
2 MIR6859-1 4860    WASH7P  7074
3 WASH7P    326 MIR1302-2   670
4 FAM138A   15  MIR1302-2   5730
8 LOC729737 7270    OR4F5   64205  LOC123 764  LOC125 783
9 LOC729737 3070    OR4F5   68405
10 LOC729737    88330   LOC100132287    94996 LOC7  883
11 LOC100132287 86996   LOC729737   96330
12 LOC100132287 80196   LOC729737   103130
13 LOC100132287 72396   LOC729737   110930
14 LOC100132287 61196   LOC729737   122130
15 LOC100132287 56596   LOC729737   126730 TYUI 678
40 QWER 456

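So basically this is a natural join on the first column of the two files, where lines whose key appears in only one of the files are kept as well.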

After searching online I tried various commands:
join -a1 file_a file_b


paste file_a file_b

but neither gave the desired output.
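For context, a minimal sketch of why `join` on its own falls short here. `join` expects both inputs to be sorted on the join field (File_b above is not), and it emits one output line per matching pair rather than collecting all matches onto a single line. The helper name `join_sorted` is hypothetical, and the `LC_ALL=C` setting is an assumption made so that `sort` and `join` agree on collation.

```shell
#!/bin/sh
# join_sorted FILE_A FILE_B  (hypothetical helper, for illustration only)
# Sorts both files on column 1, then performs a full outer join.
join_sorted() {
    tmp_a=$(mktemp)
    tmp_b=$(mktemp)
    # join needs lexically sorted input; -k1b,1 ignores leading blanks in the key
    LC_ALL=C sort -k1b,1 "$1" > "$tmp_a"
    LC_ALL=C sort -k1b,1 "$2" > "$tmp_b"
    # -a1 -a2 also prints unpairable lines from both files
    LC_ALL=C join -a1 -a2 "$tmp_a" "$tmp_b"
    rm -f "$tmp_a" "$tmp_b"
}
```

With a key that occurs twice in the second file, this prints two separate lines for that key instead of one merged line, and the result is ordered lexically (so `10` sorts before `2`), which is why it does not match the expected output.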

Best answer

An awk solution:

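awk 'NR == FNR { a[$1] = ($1 in a ? a[$1] OFS : "") $2 OFS $3; next }  # file_b: collect columns 2-3 per key
     $1 in a   { $0 = $0 OFS a[$1]; delete a[$1] }                      # file_a: append stored columns, drop the key
     1                                                                   # print every file_a line
     END       { for (i in a) print i, a[i] }' file_b file_a             # file_b keys with no match in file_a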

Output:
1 MIR6859-1 2340    DDX11L1 3222
2 MIR6859-1 4860    WASH7P  7074
3 WASH7P    326 MIR1302-2   670
4 FAM138A   15  MIR1302-2   5730
8 LOC729737 7270    OR4F5   64205 LOC123 764 LOC125 783
9 LOC729737 3070    OR4F5   68405
10 LOC729737    88330   LOC100132287    94996 LOC7 883
11 LOC100132287 86996   LOC729737   96330
12 LOC100132287 80196   LOC729737   103130
13 LOC100132287 72396   LOC729737   110930
14 LOC100132287 61196   LOC729737   122130
15 LOC100132287 56596   LOC729737   126730 TYUI 678
40 QWER 456
