  • A multiline string, string (already read from a file, file)
  • Two patterns, pattern1 and pattern2, each of which matches a substring of exactly one line in string. These lines will be called line1 and line2.


The patterns are regex-patterns, but I can change their format if that makes it easier.


I am looking for a way to get all the lines from line1 through line2 in Python (we can safely assume that line1 comes before line2).


Of course this could be done in a for loop with a flag set by pattern1 and a break when pattern2 matches. I am looking for a more compact solution, though. This is a trivial one-liner in awk:

awk '/pattern1/,/pattern2/' file



aaa aa a
bbb bb b
ccc cc c
ddd dd d
eee ee e
fff ff f

pattern1: b bb

pattern2: d dd


bbb bb b
ccc cc c
ddd dd d


In awk, the /start/,/end/ range pattern prints every line from the one where /start/ is found up to and including the one where /end/ is found. It is a useful construct and has been copied by Perl, sed, Ruby and others.


To implement a range operator in Python, write a class that remembers between calls whether the start pattern has matched and the end pattern has not yet been seen. We can use a regex (as awk does), or the class can be trivially modified to use anything that returns a True or False status for a line of data.


Given your example file, you can do:

import re

class FlipFlop:
    ''' Class to imitate the behavior of the /start/, /end/ flip-flop in awk '''
    def __init__(self, start_pattern, end_pattern):
        self.patterns = start_pattern, end_pattern
        self.state = False    # True while inside a start..end range
    def __call__(self, st):
        ms = [e.search(st) for e in self.patterns]
        if all(ms):
            # both patterns match this line: keep it and leave the range
            self.state = False
            return True
        rtr = self.state      # were we inside the range before this line?
        if ms[self.state]:    # test start when outside, end when inside
            self.state = not self.state
        return self.state or rtr

with open('/tmp/file') as f:
    ff = FlipFlop(re.compile('b bb'), re.compile('d dd'))
    print(''.join(line for line in f if ff(line)), end='')


bbb bb b
ccc cc c
ddd dd d


That retains a line-by-line file read with the flexibility of the /start/,/end/ regex range seen in other languages. Of course, you can use the same approach for a multiline string (assumed to be named s):

''.join(line+"\n" if ff(line) else "" for line in s.splitlines())
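The remark above, that the class can be adapted to anything returning True or False, might look like the following sketch. The name PredicateFlipFlop and the two lambda predicates are made up for illustration and are not part of the original answer; the sample string mirrors the example file.

```python
class PredicateFlipFlop:
    """Flip-flop driven by two callables instead of compiled regexes (hypothetical variant)."""
    def __init__(self, is_start, is_end):
        self.is_start = is_start
        self.is_end = is_end
        self.active = False   # True while inside a start..end range

    def __call__(self, line):
        if not self.active:
            if not self.is_start(line):
                return False
            self.active = True
        # Like awk, test the end condition on the start line as well,
        # so a line satisfying both forms a one-line range.
        if self.is_end(line):
            self.active = False
        return True

s = "aaa aa a\nbbb bb b\nccc cc c\nddd dd d\neee ee e\nfff ff f\n"
ff = PredicateFlipFlop(lambda l: "b bb" in l, lambda l: l.startswith("ddd"))
result = "".join(line + "\n" for line in s.splitlines() if ff(line))
print(result, end="")
```

Any callable that takes a line and returns a truthy or falsy value can be passed in, for example a bound search method of a compiled regex.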


Idiomatically, in awk, you can get the same result as a flipflop using a flag:

$ awk '/b bb/{flag=1} flag{print $0} /d dd/{flag=0}' file


You can replicate that in Python as well (with more words):

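flag = False
with open('file') as f:
    for line in f:
        if re.search(r'b bb', line):
            flag = True            # start pattern seen: begin printing
        if flag:
            print(line, end='')
        if re.search(r'd dd', line):
            flag = False           # end pattern seen: stop after this line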


The same loop also works on an in-memory string: iterate over s.splitlines() instead of the file.
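As a minimal sketch of the in-memory variant, the flag loop could be wrapped in a small generator. The function name lines_between is hypothetical, and the sample string s is assumed to hold the contents of the example file.

```python
import re

def lines_between(text, start_pattern, end_pattern):
    """Yield each line from a start match through the following end match, inclusive."""
    flag = False
    for line in text.splitlines():
        if re.search(start_pattern, line):
            flag = True
        if flag:
            yield line
        if re.search(end_pattern, line):
            flag = False

s = "aaa aa a\nbbb bb b\nccc cc c\nddd dd d\neee ee e\nfff ff f\n"
print("\n".join(lines_between(s, r"b bb", r"d dd")))
```

Because it is a generator, the caller decides whether to join the lines, collect them in a list, or stop early.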


Or, you can use a multi-line regex:

with open('/tmp/file') as f:
    print(''.join(re.findall(r'^.*b bb[\s\S]*d dd.*$', f.read(), re.M)))


But that requires reading the entire file into memory. Since you state the string has been read into memory, that is probably easiest in this case:

''.join(re.findall(r'^.*b bb[\s\S]*d dd.*$', s, re.M))
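One caveat worth illustrating: [\s\S]* is greedy, so if the end pattern could occur on more than one line, the match runs to the last such line. The question rules this out, but under that assumption a lazy quantifier ([\s\S]*?) stops at the first end line instead. In the sketch below, the extra "ddd dd d again" line is added purely for illustration and is not part of the original example.

```python
import re

s = (
    "aaa aa a\n"
    "bbb bb b\n"
    "ccc cc c\n"
    "ddd dd d\n"
    "eee ee e\n"
    "ddd dd d again\n"   # extra end-pattern line, added for illustration
)

# Greedy: extends to the LAST line containing 'd dd'
greedy = re.search(r'^.*b bb[\s\S]*d dd.*$', s, re.M)
# Lazy: stops at the FIRST line containing 'd dd' after the start line
lazy = re.search(r'^.*b bb[\s\S]*?d dd.*$', s, re.M)

greedy_block = greedy.group(0) if greedy else ""
lazy_block = lazy.group(0) if lazy else ""
print(lazy_block)
```

With exactly one matching line per pattern, as stated in the question, both forms return the same block.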

