I have a dataset in the following format:
Identified_____ID#2357_____ReadSequence:1238
Unknown_____0_____ReadSequence:0979
Unknown_____0_____ReadSequence:5476
Identified_____ID#567899_____ReadSequence:4376
Using awk, how can I extract the ReadSequences, but only from the rows marked Identified (based on the first-column entry)?

Best answer:
$ awk -F"_____" '$1=="Identified" {print $3}' test.in
ReadSequence:1238
ReadSequence:4376
If you only need the read sequence ID, gsub is your friend:
$ awk -F"_____" '$1=="Identified" {gsub(/^.*:/,"",$3); print $3}' test.in
1238
4376
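As a possible alternative to gsub, the field separator itself can be a regular expression. The sketch below assumes the colon appears only before the numeric ID, as in the sample data; the helper name extract_ids is hypothetical and is there only to make the example reusable. It recreates test.in from the sample rows in the question.

```shell
# Recreate the sample data from the question (file name test.in, as used in the answer).
cat > test.in <<'EOF'
Identified_____ID#2357_____ReadSequence:1238
Unknown_____0_____ReadSequence:0979
Unknown_____0_____ReadSequence:5476
Identified_____ID#567899_____ReadSequence:4376
EOF

# extract_ids is a hypothetical helper name, not part of the original answer.
# -F'_____|:' splits on either the five-underscore delimiter or a colon,
# so the fields are: $1=status, $2=ID, $3="ReadSequence", $4=numeric part.
extract_ids() {
  awk -F'_____|:' '$1 == "Identified" { print $4 }' "$1"
}

extract_ids test.in
```

Because the field is printed without being modified, it is output as a string, so an ID with a leading zero would keep that zero. If a colon could also appear in the ID column, the gsub approach above is probably the safer choice.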
Regarding "linux - extracting rows from a multi-column file", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/38476272/