This article shows how to join two files using awk.

Problem description

I have two tab-delimited files, shown below:

File A

chr1   123 aa b c d
chr1   234 a  b c d
chr1   345 aa b c d
chr1   456 a  b c d
....

File B

xxxx  abcd    chr1   123    aa    c    d    e
yyyy  defg    chr1   345    aa    e    f    g
...

I want to join the two files on three columns (the ones holding "chr1", "123" and "aa" in the example) and append the first two columns of file B to the matching lines of file A, so that the output looks like this:

chr1   123    aa    b    c    d    xxxx    abcd
chr1   234    a     b    c    d
chr1   345    aa    b    c    d    yyyy    defg
chr1   456    a    b    c    d

Could anyone help me do this in awk, ideally as a one-liner?

Recommended answer

Here is one approach using awk:

$ awk 'NR==FNR{a[$3,$4]=$1OFS$2;next}{$6=a[$1,$2];print}' OFS='	' fileb filea
chr1    123     a    b    c     xxxx    abcd
chr1    234     a    b    c
chr1    345     a    b    c     yyyy    defg
chr1    456     a    b    c

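Explanation:

NR==FNR             # True only while reading the first file (fileb): total record number equals per-file record number
a[$3,$4]=$1OFS$2    # Store fileb's fields 1 and 2, joined by OFS, in an array keyed on fields 3 and 4
next                # Skip to the next record so the second block never runs for fileb
$6=a[$1,$2]         # For filea, assign the looked-up value to field 6 (this rebuilds the record using OFS)
                    # If filea already has six columns, this overwrites the last one; $(NF+1) appends instead
print               # Print the current line, now including the matching entry from fileb

OFS='	'            # Separate output fields with a single TAB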
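
The `NR==FNR` test is easiest to understand by printing both counters while awk reads two files. The following is a minimal sketch, not part of the original answer; the temporary directory and the tiny two- and three-line input files are made up for illustration.

```shell
#!/bin/sh
# Hypothetical scratch files, used only to show how NR and FNR behave.
tmpdir=$(mktemp -d)
printf 'b1\nb2\n'     > "$tmpdir/fileb"
printf 'a1\na2\na3\n' > "$tmpdir/filea"

# NR counts records across all input files; FNR restarts at 1 for each file.
# The two are therefore equal only while the first file is being read.
trace=$(awk '{ print $0, NR, FNR, (NR == FNR ? "first" : "second") }' \
    "$tmpdir/fileb" "$tmpdir/filea")

printf '%s\n' "$trace"
rm -rf "$tmpdir"
```

Each output line shows the record, NR, FNR, and which file the test attributes it to. Note that the idiom relies on the first file being non-empty: with an empty first file, `NR==FNR` would also hold while reading the second one.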

For the three-field problem, you simply add the extra field to the key:

$ awk 'NR==FNR{a[$3,$4,$5]=$1OFS$2;next}{$6=a[$1,$2,$3];print}' OFS='	' fileb filea
chr1    123    aa     b      c     xxxx     abcd
chr1    234    a      b      c
chr1    345    aa     b      c     yyyy     defg
chr1    456    a      b      c
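
To try the three-field join against the sample data from the question, a self-contained sketch along these lines may help. It is an illustration under stated assumptions rather than the original answer's command: the temporary file names are made up, file A is assumed to have six tab-separated columns as in the question, and the looked-up columns are appended with `print $0, ...` behind an `in` test instead of being assigned to `$6`.

```shell
#!/bin/sh
# Hypothetical scratch files holding the question's sample rows (tab-separated).
tmpdir=$(mktemp -d)
{
    printf 'chr1\t123\taa\tb\tc\td\n'
    printf 'chr1\t234\ta\tb\tc\td\n'
    printf 'chr1\t345\taa\tb\tc\td\n'
    printf 'chr1\t456\ta\tb\tc\td\n'
} > "$tmpdir/filea"
{
    printf 'xxxx\tabcd\tchr1\t123\taa\tc\td\te\n'
    printf 'yyyy\tdefg\tchr1\t345\taa\te\tf\tg\n'
} > "$tmpdir/fileb"

joined=$(awk -F'\t' -v OFS='\t' '
    # First file (fileb): remember columns 1 and 2 under the three-column key.
    NR == FNR { a[$3,$4,$5] = $1 OFS $2; next }
    # Second file (filea): on a key match, append the remembered columns.
    ($1,$2,$3) in a { print $0, a[$1,$2,$3]; next }
    # No match: print the line unchanged, without a trailing tab.
    { print }
' "$tmpdir/fileb" "$tmpdir/filea")

printf '%s\n' "$joined"
rm -rf "$tmpdir"
```

Testing membership with `in` keeps the existing sixth column intact and avoids creating empty array entries for keys that are absent from fileb, at the cost of a slightly longer program than the one-liner above.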


07-21 01:24