This question is a little confusing to describe, so I will just give an example.
Say I have the following:
$ grep -P "locus_tag\tM715_1000193188" Genome.tbl -B1 -A8
193188 193066 gene
locus_tag M715_1000193188
193188 193066 mRNA
product hypothetical protein
protein_id gnl|CorradiLab|M715_1000193188
transcript_id gnl|CorradiLab|M715_mrna1000193188
193188 193066 CDS
product hypothetical protein
protein_id gnl|CorradiLab|M715_1000193188
transcript_id gnl|CorradiLab|M715_mrna1000193188
I want to add a "#" to the 8 lines that follow "locus_tag M715_1000193188", so that the modified file looks like this:
193188 193066 gene
locus_tag M715_1000193188
#193188 193066 mRNA
# product hypothetical protein
# protein_id gnl|CorradiLab|M715_1000193188
# transcript_id gnl|CorradiLab|M715_mrna1000193188
#193188 193066 CDS
# product hypothetical protein
# protein_id gnl|CorradiLab|M715_1000193188
# transcript_id gnl|CorradiLab|M715_mrna1000193188
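Essentially, I have a file with ~3000 different locus tags, and for 300 of them I need to comment out the mRNA and CDS features, i.e. the 8 lines that follow the locus_tag line.
Is there any way to do this with sed? The file also contains other kinds of information that need to be kept.
Thanks,
Adrian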
Best answer
If you can use awk, this should do it:
awk 'f&&f-- {$0="#"$0} /locus_tag/ {f=8} 1' file
193188 193066 gene
locus_tag M715_1000193188
#193188 193066 mRNA
# product hypothetical protein
# protein_id gnl|CorradiLab|M715_1000193188
# transcript_id gnl|CorradiLab|M715_mrna1000193188
#193188 193066 CDS
# product hypothetical protein
# protein_id gnl|CorradiLab|M715_1000193188
# transcript_id gnl|CorradiLab|M715_mrna1000193188
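
Note that the one-liner above fires on every line containing locus_tag, while the question only needs about 300 of the ~3000 tags commented out. A minimal sketch of one way to restrict it, assuming the wanted tags are stored one per line in a separate file (the file name tags.txt and the function name comment_listed_tags are hypothetical, not from the original post):

```shell
# Hypothetical helper: comment out the 8 lines after each listed locus_tag.
#   $1 = file with one locus tag per line (assumed name: tags.txt)
#   $2 = the feature table (e.g. Genome.tbl)
# Assumes the tag list is non-empty; NR == FNR misfires on an empty first file.
comment_listed_tags() {
    awk '
        NR == FNR { want[$1]; next }     # first file: remember the wanted tags
        n > 0     { $0 = "#" $0; n-- }   # inside a marked block: prefix with "#"
        $1 == "locus_tag" && ($2 in want) { n = 8 }  # counting starts on the next line
        { print }                        # print every line, modified or not
    ' "$1" "$2"
}
```

It could then be called as `comment_listed_tags tags.txt Genome.tbl > Genome.commented.tbl`. Comparing `$2` against the list exactly, rather than using a regex match, avoids accidentally hitting a tag that merely contains another tag as a prefix.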