This article shows how to join two files on common columns using awk.
Problem description
I have two tab-delimited files like the ones shown below:
File A
chr1 123 aa b c d
chr1 234 a b c d
chr1 345 aa b c d
chr1 456 a b c d
....
File B
xxxx abcd chr1 123 aa c d e
yyyy defg chr1 345 aa e f g
...
I want to join the two files on three columns (for example "chr1", "123" and "aa"), that is, columns 1-3 of file A against columns 3-5 of file B, and append the first two columns of file B to the matching lines of file A, so that the output looks like this:
Output:
chr1 123 aa b c d xxxx abcd
chr1 234 a b c d
chr1 345 aa b c d yyyy defg
chr1 456 a b c d
Can anyone help me do this in awk, ideally as a one-liner?
Recommended answer
Here is one approach using awk:
$ awk 'NR==FNR{a[$3,$4]=$1OFS$2;next}{$6=a[$1,$2];print}' OFS='\t' fileb filea
chr1 123 a b c xxxx abcd
chr1 234 a b c
chr1 345 a b c yyyy defg
chr1 456 a b c
Explanation:
NR==FNR            # True only while reading the first file (fileb): the overall record number equals the per-file record number
a[$3,$4]=$1OFS$2   # Store fields 1 and 2 (joined by OFS) in an array keyed on fields 3 and 4
next               # Skip to the next input line (don't run the following block)
$6=a[$1,$2]        # For filea, assign the looked-up value to field 6 (this rebuilds the record using OFS)
print              # Print the current line, now including the matching entry from fileb in $6
OFS='\t'           # Separate fields with a single TAB on output
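To see why `NR==FNR` singles out the first file, the following minimal sketch runs awk over two small files and prints each line together with NR, FNR and whether the test holds. The file names f1.txt and f2.txt and the temporary directory are hypothetical, chosen only to keep the example self-contained.

```shell
# Sketch: how NR and FNR behave when awk reads two files in one run.
# f1.txt and f2.txt are hypothetical sample files created in a temp dir.
tmp=$(mktemp -d)
printf 'x\ny\n' > "$tmp/f1.txt"
printf 'p\nq\nr\n' > "$tmp/f2.txt"

# NR counts records across all input files; FNR restarts at 1 in each file,
# so NR==FNR is true only while the first file is being read.
out=$(awk '{ print $0, NR, FNR, (NR == FNR ? "first" : "other") }' "$tmp/f1.txt" "$tmp/f2.txt")
printf '%s\n' "$out"
rm -rf "$tmp"
```

Only the two lines of the first file are tagged "first". One caveat: if the first file is empty, NR and FNR stay equal throughout the second file too, so the lookup block would swallow file A and print nothing.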
For the three-field version of the problem, you simply add the extra field to the key:
$ awk 'NR==FNR{a[$3,$4,$5]=$1OFS$2;next}{$6=a[$1,$2,$3];print}' OFS='\t' fileb filea
chr1 123 aa b c xxxx abcd
chr1 234 a b c
chr1 345 aa b c yyyy defg
chr1 456 a b c
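Note that `$6=...` in the three-field command overwrites the sixth field of file A (the d column) rather than appending to it, which is why its output drops d while the question's desired output keeps it; unmatched lines also lose d and end with an empty trailing field. A minimal sketch that appends instead is shown below. It assumes strictly tab-delimited input, writes the question's sample data to hypothetical files in a temporary directory, and tests key membership with `in` so that unmatched lines are printed unchanged.

```shell
# Sketch: append columns 1-2 of file B to file A, joining A's columns 1-3
# against B's columns 3-5, without overwriting any of A's fields.
# Sample data is copied from the question; the temp-dir paths are hypothetical.
tmp=$(mktemp -d)
printf 'chr1\t123\taa\tb\tc\td\nchr1\t234\ta\tb\tc\td\nchr1\t345\taa\tb\tc\td\nchr1\t456\ta\tb\tc\td\n' > "$tmp/filea"
printf 'xxxx\tabcd\tchr1\t123\taa\tc\td\te\nyyyy\tdefg\tchr1\t345\taa\te\tf\tg\n' > "$tmp/fileb"

out=$(awk -v FS='\t' -v OFS='\t' '
  NR == FNR { a[$3, $4, $5] = $1 OFS $2; next }         # first file (fileb): build the lookup table
  ($1, $2, $3) in a { print $0, a[$1, $2, $3]; next }    # key found: append the two file B columns
  { print }                                              # no match: print the filea line unchanged
' "$tmp/fileb" "$tmp/filea")
printf '%s\n' "$out"
rm -rf "$tmp"
```

Testing with `($1, $2, $3) in a` instead of reading `a[$1,$2,$3]` directly also avoids creating empty array entries for keys that have no match, and it leaves unmatched lines without a trailing separator.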