I have a dataset in the following format:

Identified_____ID#2357_____ReadSequence:1238
Unknown_____0_____ReadSequence:0979
Unknown_____0_____ReadSequence:5476
Identified_____ID#567899_____ReadSequence:4376

Using awk, how can I extract the ReadSequences, but only from the lines marked Identified (based on the first column entry)?

Best answer

$ awk -F"_____" '$1=="Identified" {print $3}' test.in
ReadSequence:1238
ReadSequence:4376

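If you only need the read sequence ID, gsub is your friend. Here it strips everything up to and including the colon from the third field: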
$ awk -F"_____" '$1=="Identified" {gsub(/^.*:/,"",$3); print $3}' test.in
1238
4376
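
As an alternative to gsub, the field separator itself can be a regular expression, so the colon can act as a delimiter too. The following is only a sketch under that assumption: it recreates the sample data from the question in test.in, and the variable name ids is introduced here purely for illustration. With both "_____" and ":" as separators, the numeric ID lands in the fourth field.

```shell
# Recreate the sample input from the question (test.in is the file name used in the answer).
cat > test.in <<'EOF'
Identified_____ID#2357_____ReadSequence:1238
Unknown_____0_____ReadSequence:0979
Unknown_____0_____ReadSequence:5476
Identified_____ID#567899_____ReadSequence:4376
EOF

# A separator longer than one character is treated as a regex,
# so each line splits on either "_____" or ":" and the ID is field 4.
ids=$(awk -F'_____|:' '$1 == "Identified" { print $4 }' test.in)
printf '%s\n' "$ids"
```

This variant leaves $3 untouched, which may matter if the rest of the line is needed later in the same awk program. It does assume that no other field contains a colon.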

Regarding "linux - extracting lines from a multi-column file", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/38476272/
