grep: exclude a pattern and also exclude the 2 lines before each match

Problem description

I have a file and would like to use grep to exclude lines matching a pattern. I would also like to remove the 2 lines preceding every excluded match. How do I do this?

What I have tried:

cat file.txt
Sequence: MG719312_IGHV1-8*03_Homosapiens_F_V-REGION_127..422_296nt_1_____296+0=296___     from: 1   to: 296
  Start     End  Strand Pattern                 Mismatch Sequence
    217     225       + pattern:AA[CT]NNN[AT]CN        . aacacctcc
Sequence: M99648_IGHV2-26*01_Homosapiens_F_V-REGION_164..464_301nt_1_____301+0=301___     from: 1   to: 301
  Start     End  Strand Pattern                 Mismatch Sequence
    176     184       + pattern:AA[CT]NNN[AT]CN        . aatcctaca

# With grep -v I can remove the line that matches the pattern

grep -v "[acgt]\{3\}cc[acgt][acgt]\{3\}" file.txt
Sequence: MG719312_IGHV1-8*03_Homosapiens_F_V-REGION_127..422_296nt_1_____296+0=296___     from: 1   to: 296
  Start     End  Strand Pattern                 Mismatch Sequence
    217     225       + pattern:AA[CT]NNN[AT]CN        . aacacctcc
Sequence: M99648_IGHV2-26*01_Homosapiens_F_V-REGION_164..464_301nt_1_____301+0=301___     from: 1   to: 301
  Start     End  Strand Pattern                 Mismatch Sequence

# But adding -B 2 does not also remove the 2 preceding lines; the output is unchanged

grep -B 2 -v "[acgt]\{3\}cc[acgt][acgt]\{3\}" file.txt
Sequence: MG719312_IGHV1-8*03_Homosapiens_F_V-REGION_127..422_296nt_1_____296+0=296___     from: 1   to: 296
  Start     End  Strand Pattern                 Mismatch Sequence
    217     225       + pattern:AA[CT]NNN[AT]CN        . aacacctcc
Sequence: M99648_IGHV2-26*01_Homosapiens_F_V-REGION_164..464_301nt_1_____301+0=301___     from: 1   to: 301
  Start     End  Strand Pattern                 Mismatch Sequence
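A minimal sketch to reproduce why -B 2 has no effect here, assuming a POSIX shell and GNU grep; the sample data is written to file.txt, and v.out and vb.out are hypothetical scratch file names. With -v, grep selects the non-matching lines, and -B 2 only prints up to two lines of context before each selected line. In this file every such context line is already selected, so the output does not change. (With -v, GNU grep's context lines are the matching ones, so in other layouts -B 2 could even bring an excluded line back.) Dropping -v shows the block that actually needs to go: the match plus its two preceding lines.

```shell
#!/bin/sh
# Recreate the sample file from the question (quoted EOF: no expansion).
cat > file.txt <<'EOF'
Sequence: MG719312_IGHV1-8*03_Homosapiens_F_V-REGION_127..422_296nt_1_____296+0=296___     from: 1   to: 296
  Start     End  Strand Pattern                 Mismatch Sequence
    217     225       + pattern:AA[CT]NNN[AT]CN        . aacacctcc
Sequence: M99648_IGHV2-26*01_Homosapiens_F_V-REGION_164..464_301nt_1_____301+0=301___     from: 1   to: 301
  Start     End  Strand Pattern                 Mismatch Sequence
    176     184       + pattern:AA[CT]NNN[AT]CN        . aatcctaca
EOF

pat='[acgt]\{3\}cc[acgt][acgt]\{3\}'

# -v selects the NON-matching lines; -B 2 only adds context before those,
# and here that context is already part of the selection.
grep -v "$pat" file.txt > v.out
grep -v -B 2 "$pat" file.txt > vb.out
cmp -s v.out vb.out && echo "grep -v and grep -v -B 2 give identical output"

# Without -v, -B 2 selects exactly the lines to drop:
# the matching line plus the two lines before it.
grep -B 2 "$pat" file.txt
```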

Any ideas how to remove the 2 preceding lines as well for every match?

Recommended answer

Tested with GNU sed; syntax and features may vary with other implementations.

sed -E 'N;N; /[acgt]{3}cc[acgt][acgt]{3}/d' ip.txt

  • -E enables extended regular expressions (ERE); some sed versions require -r instead of -E
  • N;N appends the next two lines to the pattern space, so each cycle works on a group of three lines
  • /[acgt]{3}cc[acgt][acgt]{3}/d deletes the whole three-line group if the regex matches; otherwise the three lines are printed at the end of the cycle
    • note that this tries to match the regex anywhere in the three lines; also, [acgt][acgt]{3} can be simplified to [acgt]{4}
    • /\n.*\n.*[acgt]{3}cc[acgt][acgt]{3}/d restricts the match to the 3rd line only, because the pattern space contains exactly two newlines and the match must come after the second one
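A sketch for trying the answer's command on the question's data, assuming GNU sed and writing the sample to file.txt (any.out and third.out are hypothetical output file names). It runs the original command and the stricter variant from the last bullet, with [acgt][acgt]{3} shortened to [acgt]{4}; on this input both should keep only the first three-line record.

```shell
#!/bin/sh
# Recreate the sample file from the question (quoted EOF: no expansion).
cat > file.txt <<'EOF'
Sequence: MG719312_IGHV1-8*03_Homosapiens_F_V-REGION_127..422_296nt_1_____296+0=296___     from: 1   to: 296
  Start     End  Strand Pattern                 Mismatch Sequence
    217     225       + pattern:AA[CT]NNN[AT]CN        . aacacctcc
Sequence: M99648_IGHV2-26*01_Homosapiens_F_V-REGION_164..464_301nt_1_____301+0=301___     from: 1   to: 301
  Start     End  Strand Pattern                 Mismatch Sequence
    176     184       + pattern:AA[CT]NNN[AT]CN        . aatcctaca
EOF

# The answer's command: N;N builds a three-line pattern space, and d drops
# the whole group if the regex matches anywhere in those three lines.
sed -E 'N;N; /[acgt]{3}cc[acgt][acgt]{3}/d' file.txt > any.out

# Stricter variant: \n.*\n forces the match onto the 3rd line of the group,
# and [acgt][acgt]{3} is written as the equivalent [acgt]{4}.
sed -E 'N;N; /\n.*\n.*[acgt]{3}cc[acgt]{4}/d' file.txt > third.out

# Only the MG719312 record (the first three lines) remains.
cat third.out
```

Both commands rely on every record being exactly three lines long, since N;N groups lines in fixed triples from the top of the file. If N runs out of input, GNU sed prints the leftover lines before exiting, while some other sed implementations exit without printing them.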