Problem description
In bash (4.3.46(1)) I have some multi-line, so-called FASTA records. Each record starts with one line containing >name, followed by lines of DNA sequence ([AGCTNacgtn]). Here are three records:
>chr1
AGCTACTTTT
AGGGNGGTNN
>chr2
TTGNACACCC
TGGGGGAGTA
>chr3
TGACGTGGGT
TCGGGTTTTT
How do I use bash grep to get the second record? In other languages one might use:
>chr2\n([AGCTNagctn]*\n)*
In Bash I tried to use the ideas from here (among other SO posts). This did not work:
grep -zo '>chr2[AGCTNacgtn]+' file
The result should be:
>chr2
TTGNACACCC
TGGGGGAGTA
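A minimal reproduction of why the attempt above prints nothing, assuming GNU grep. There are two separate problems. In grep's default basic regex syntax, + is an ordinary character, so the pattern asks for a literal + after the bracket expression. Even with -E, where + means "one or more", the bracket expression [AGCTNacgtn] cannot match the newline that directly follows >chr2, so there is still no match. The variable names bre_status and ere_status are only for illustration.

```bash
#!/usr/bin/env bash
# Recreate the three-record sample file from the question.
printf '>chr1\nAGCTACTTTT\nAGGGNGGTNN\n>chr2\nTTGNACACCC\nTGGGGGAGTA\n>chr3\nTGACGTGGGT\nTCGGGTTTTT\n' > file

# Basic regex (grep's default): '+' is literal, so this looks for ">chr2",
# one sequence letter, then an actual '+' character. Nothing matches.
bre_status=0
grep -zo '>chr2[AGCTNacgtn]+' file || bre_status=$?

# Extended regex: '+' now means "one or more", but the character right
# after ">chr2" is a newline, which the bracket expression does not include.
ere_status=0
grep -zoE '>chr2[AGCTNacgtn]+' file || ere_status=$?

# grep exits with status 1 when it found no match.
echo "BRE exit status: $bre_status, ERE exit status: $ere_status"
```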
Solution
On my system this was the solution (almost the same as Cyrus' answer below, but without piping into a second grep .):
grep -Pzo '>chr2\n[AGCTNacgtn\n]+' file
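A self-contained sketch of the -Pzo approach for the chr2 record, assuming GNU grep built with PCRE support (required for -P). With -z, grep treats the whole file as one NUL-terminated "line", so a pattern can span newlines. With -P, \n inside the pattern and inside the bracket expression means a newline. Because -z also ends each printed match with a NUL byte, this sketch strips it with tr -d '\0'. That substitution is my own choice and is not part of either post.

```bash
#!/usr/bin/env bash
# Assumption: GNU grep with -P (PCRE) support is installed.
printf '>chr1\nAGCTACTTTT\nAGGGNGGTNN\n>chr2\nTTGNACACCC\nTGGGGGAGTA\n>chr3\nTGACGTGGGT\nTCGGGTTTTT\n' > file

# -z: read input as NUL-terminated records (here, the whole file at once)
# -P: Perl-compatible regex, so \n matches a newline
# -o: print only the matched text, not the whole "line"
# The match stops at the '>' of the next header, since '>' is not in the class.
# tr -d '\0' removes the NUL byte grep -z writes after the match.
record=$(grep -Pzo '>chr2\n[AGCTNacgtn\n]+' file | tr -d '\0')

printf '%s\n' "$record"
```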
Recommended answer
With GNU grep:
grep -Pzo '>chr2\n[AGCTNacgtn\n]+' file | grep .
Output:
>chr2
TTGNACACCC
TGGGGGAGTA
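For comparison, here is a sketch that uses awk instead of grep. This is a different technique from the one the answer uses. It needs only POSIX awk, not GNU grep or PCRE. It also compares the whole header line and does not depend on the sequence alphabet. The grep pattern's [AGCTNacgtn] class would cut a record short at any other letter, such as an IUPAC code like R or Y. The variable name wanted is hypothetical.

```bash
#!/usr/bin/env bash
printf '>chr1\nAGCTACTTTT\nAGGGNGGTNN\n>chr2\nTTGNACACCC\nTGGGGGAGTA\n>chr3\nTGACGTGGGT\nTCGGGTTTTT\n' > file

# Hypothetical: the header name of the record to extract.
wanted=chr2

# On every header line, set p to 1 if it is exactly ">chr2", otherwise 0.
# The bare pattern "p" prints each line while p is true: the wanted
# header plus its sequence lines, up to (not including) the next header.
record=$(awk -v id="$wanted" '/^>/ { p = ($0 == ">" id) } p' file)

printf '%s\n' "$record"
```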