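I want to look up the patterns from the first column of File1 in File2, and then print the second column of File1 next to the matching File2 line: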

File1 (two tab-separated columns):

APBW    lung
APCA    non virulent
ABKM    lung
APBX    lung
KK020   -
APBZ    non virulent
AOSU    lung
APBY    non virulent
APBV    joint; lung; CNS
CP001321    virulent
APBT    virulent
APBU    non-virulent
APCB    moderadamente virulenta (nose)
CP005384    -


File2 (two tab-separated columns):

HS372_00243 gi|219690483|gb|CP001321.1|
HS372_00436 gi|529264994|gb|APBX01000055.1|
HS372_00445 gi|529256455|gb|APBT01000061.1|
HS372_00544 gi|529259149|gb|APBV01000035.1|
HS372_00545 gi|529259149|gb|APBV01000035.1|
HS372_00546 gi|529259149|gb|APBV01000035.1|


Desired output (three tab-separated columns):

HS372_00243 gi|219690483|gb|CP001321.1| virulent
HS372_00436 gi|529264994|gb|APBX01000055.1| lung
HS372_00445 gi|529256455|gb|APBT01000061.1| virulent
HS372_00544 gi|529259149|gb|APBV01000035.1| joint; lung; CNS
HS372_00545 gi|529259149|gb|APBV01000035.1| joint; lung; CNS
HS372_00546 gi|529259149|gb|APBV01000035.1| joint; lung; CNS


Tentative bash code (it does not work), but I am open to other languages:

while read vl; do grep "$vl" File2 ; done < File1


I also tried awk (it does not work, apparently because it looks for an exact match, while the strings in File2 are surrounded by other text):

awk 'BEGIN { FS = OFS = "\t" } FNR==NR{a[$1]=$0;next}($1 in a){print a[$1],$2,$3}' File1 File2


Thanks, Bernardo

Best answer

It sounds like this is what you want:

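awk '
BEGIN { FS=OFS="\t" }
# First file: store column 2 of File1 under its column 1 key.
NR==FNR { map[$1] = $2; next }
# Second file: append the value of every key that matches the line.
{
    for (key in map)
        if ($0 ~ key)
            $0 = $0 OFS map[key]
    print
}
' File1 File2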
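
Since the question is open to other languages, here is a rough Python sketch of the same idea, not a definitive implementation. The function names `load_map` and `annotate` are made up for illustration. Unlike awk's `~`, which treats each key as a regular expression, this version uses a literal substring test, and it always tests against the original line rather than the line with annotations already appended.

```python
def load_map(file1_lines):
    """Map each File1 pattern (column 1) to its annotation (column 2)."""
    mapping = {}
    for line in file1_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip malformed rows and empty keys (an empty key would match every line).
        if len(fields) >= 2 and fields[0]:
            mapping[fields[0]] = fields[1]
    return mapping


def annotate(file2_lines, mapping):
    """Append the annotation of every pattern found anywhere in a File2 line."""
    result = []
    for line in file2_lines:
        line = line.rstrip("\n")
        # Literal substring test against the original line.
        hits = [value for key, value in mapping.items() if key in line]
        result.append("\t".join([line] + hits))
    return result


if __name__ == "__main__":
    # Sample rows copied from File1 and File2 in the question.
    file1 = ["APBX\tlung\n", "KK020\t-\n", "CP001321\tvirulent\n"]
    file2 = [
        "HS372_00243\tgi|219690483|gb|CP001321.1|\n",
        "HS372_00436\tgi|529264994|gb|APBX01000055.1|\n",
    ]
    for row in annotate(file2, load_map(file1)):
        print(row)
```

As with the awk script, File2 lines that contain none of the patterns are printed unchanged. To run it on the real files, pass open file objects instead of the sample lists, for example `annotate(open("File2"), load_map(open("File1")))`.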

Regarding "python - find pattern and print next column (two files)", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/21702673/
