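I want to look up the patterns from column 1 of File1 in File2, then print the second column of File1 next to each matching File2 line:
File1 (two tab-separated columns):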
APBW lung
APCA non virulent
ABKM lung
APBX lung
KK020 -
APBZ non virulent
AOSU lung
APBY non virulent
APBV joint; lung; CNS
CP001321 virulent
APBT virulent
APBU non-virulent
APCB moderadamente virulenta (nose)
CP005384 -
File2 (two tab-separated columns):
HS372_00243 gi|219690483|gb|CP001321.1|
HS372_00436 gi|529264994|gb|APBX01000055.1|
HS372_00445 gi|529256455|gb|APBT01000061.1|
HS372_00544 gi|529259149|gb|APBV01000035.1|
HS372_00545 gi|529259149|gb|APBV01000035.1|
HS372_00546 gi|529259149|gb|APBV01000035.1|
Desired output (three tab-separated columns):
HS372_00243 gi|219690483|gb|CP001321.1| virulent
HS372_00436 gi|529264994|gb|APBX01000055.1| lung
HS372_00445 gi|529256455|gb|APBT01000061.1| virulent
HS372_00544 gi|529259149|gb|APBV01000035.1| joint; lung; CNS
HS372_00545 gi|529259149|gb|APBV01000035.1| joint; lung; CNS
HS372_00546 gi|529259149|gb|APBV01000035.1| joint; lung; CNS
Tentative bash code (it doesn't work), but I'm open to other languages:
while read vl; do grep "$vl" File2 ; done < File1
I also tried awk (it doesn't work, because it seems to look for exact matches, while the strings in File2 are surrounded by other text):
awk 'BEGIN { FS = OFS = "\t" } FNR==NR{a[$1]=$0;next}($1 in a){print a[$1],$2,$3}' File1 File2
Thanks, Bernardo
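The difference between an exact lookup and a substring match, which is what trips up the awk attempt above, can be illustrated with a small Python sketch. The dictionary name `phenotype` and the two sample rows are chosen for illustration only; the values are taken from File1 and File2 in the question.

```python
# Keys as they appear in column 1 of File1 (two sample rows from the question).
phenotype = {"CP001321": "virulent", "APBX": "lung"}

# Column 2 of one File2 row: the key is embedded in a longer identifier.
field = "gi|219690483|gb|CP001321.1|"

# Exact lookup (the analogue of `$1 in a` in awk) finds nothing,
# because no File1 key is equal to the whole field.
exact = phenotype.get(field)

# A substring test finds the key inside the longer string.
matches = [label for key, label in phenotype.items() if key in field]

print(exact)
print(matches)
```

Note also that the awk attempt tests `$1` of File2 (the `HS372_...` column), which never contains a File1 key at all.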
Best answer
Sounds like this is what you want:
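awk '
BEGIN { FS = OFS = "\t" }
# First file: map each key (column 1) to its label (column 2).
NR == FNR { map[$1] = $2; next }
# Second file: append the label of every key found anywhere in the line.
{
    for (key in map)
        if ($0 ~ key)
            $0 = $0 OFS map[key]
    print
}
' File1 File2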
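Since the question is open to other languages, here is a rough Python sketch of the same idea. The function name `annotate` and the in-memory sample lines are hypothetical. Unlike the awk version, which treats each key as a regular expression via `~`, this sketch uses a plain substring test and always checks keys against the original line rather than the line with labels already appended.

```python
def annotate(file1_lines, file2_lines):
    """Append File1 labels to each File2 line that contains a File1 key."""
    # Build the key -> label map from the two tab-separated columns of File1.
    mapping = {}
    for line in file1_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        key, _, label = line.partition("\t")
        mapping[key] = label

    annotated = []
    for line in file2_lines:
        row = line.rstrip("\n")
        # Collect the label of every key that occurs anywhere in the row.
        labels = [label for key, label in mapping.items() if key in row]
        # Rows without a match are kept unchanged, as in the awk version.
        annotated.append("\t".join([row] + labels))
    return annotated


# Sample rows copied from the question (tabs written explicitly).
file1 = ["CP001321\tvirulent\n", "APBX\tlung\n", "APBV\tjoint; lung; CNS\n"]
file2 = [
    "HS372_00243\tgi|219690483|gb|CP001321.1|\n",
    "HS372_00544\tgi|529259149|gb|APBV01000035.1|\n",
]

result = annotate(file1, file2)
for row in result:
    print(row)
```

With real files, the same function should work on open file objects, for example `annotate(open("File1"), open("File2"))`, because iterating a text file yields its lines.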
Regarding "python - Find pattern and print next column (two files)", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/21702673/