awk - Add "#" to the start of the 8 lines that follow a line matching STRING

The question is a bit confusing when stated in words, so I'll just give an example.

Say I have the following:

$ grep -P "locus_tag\tM715_1000193188" Genome.tbl -B1 -A8
193188  193066  gene
            locus_tag   M715_1000193188
193188  193066  mRNA
            product hypothetical protein
            protein_id  gnl|CorradiLab|M715_1000193188
            transcript_id   gnl|CorradiLab|M715_mrna1000193188
193188  193066  CDS
        product hypothetical protein
        protein_id  gnl|CorradiLab|M715_1000193188
        transcript_id   gnl|CorradiLab|M715_mrna1000193188

I want to add a "#" to the start of each of the 8 lines after "locus_tag M715_1000193188", so that the modified file looks like this:

193188  193066  gene
            locus_tag   M715_1000193188
#193188 193066  mRNA
#           product hypothetical protein
#           protein_id  gnl|CorradiLab|M715_1000193188
#           transcript_id   gnl|CorradiLab|M715_mrna1000193188
#193188 193066  CDS
#       product hypothetical protein
#       protein_id  gnl|CorradiLab|M715_1000193188
#       transcript_id   gnl|CorradiLab|M715_mrna1000193188

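Essentially, I have a file with ~3000 different locus tags, and for 300 of them I need to comment out the mRNA and CDS features, i.e. the 8 lines that follow the locus_tag line.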

Is there any way to do this with sed? The file also contains other kinds of information that must be preserved.

Thanks,
Adrian

Best answer

If awk is an option, this should do it:

awk 'f&&f-- {$0="#"$0} /locus_tag/ {f=8} 1' file
193188  193066  gene
            locus_tag   M715_1000193188
#193188  193066  mRNA
#            product hypothetical protein
#            protein_id  gnl|CorradiLab|M715_1000193188
#            transcript_id   gnl|CorradiLab|M715_mrna1000193188
#193188  193066  CDS
#        product hypothetical protein
#        protein_id  gnl|CorradiLab|M715_1000193188
#        transcript_id   gnl|CorradiLab|M715_mrna1000193188
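
The one-liner works with a countdown: /locus_tag/ sets f to 8, and on each following line f&&f-- is true while f is nonzero, so that line gets a "#" prefix and f drops by one. As written it fires on every locus_tag line. Since only 300 of the ~3000 tags need this, here is a minimal sketch of one way to restrict it, assuming the wanted tags are stored one per line in a separate file. The function name comment_after_tags and the file names in the usage line are hypothetical, not from the question.

```shell
# Hypothetical helper: comment_after_tags TAGS_FILE TBL_FILE
# Assumes TAGS_FILE is non-empty and holds one locus tag per line.
comment_after_tags() {
    awk '
        NR == FNR { want[$1]; next }      # first file: remember each tag to comment out
        f && f--  { $0 = "#" $0 }         # countdown active: prefix this line with "#"
        $1 == "locus_tag" && ($2 in want) { f = 8 }   # listed tag: start an 8-line countdown
        1                                 # print every line, modified or not
    ' "$1" "$2"
}
```

It could be called as comment_after_tags tags.txt Genome.tbl > Genome.commented.tbl. Lines after unlisted tags pass through unchanged, so the rest of the file is preserved.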