Problem description
I have a big text file that contains a lot of text, but I would like to extract only the text between two defined markers, e.g.
/begin MEASUREMENT XYZ
UBYTE
_CNV_A_R_LINEAR_____71_CM
1
100.
-40.
160.
FORMAT "%3.0"
SYMBOL_LINK "XYZ" 0
/begin IF_DATA EVTRKMNBXERTBK
DEFAULT_RASTERS 3 3
/end IF_DATA
/end MEASUREMENT
That is, I want to extract the text between /begin MEASUREMENT and /end MEASUREMENT.
My code is:
import re
path = r"d:\xyz.txt"
file = open(path, 'r')
lines = file.read()
pattern = re.compile(r'begin MEASUREMENT[\s][\w+](.*?)end MEASUREMENT')
print re.findall(pattern, lines)
Recommended answer
Use (?s), which makes the dot match every character, including newlines, so the multi-line input is treated as if it were a single line.
pattern = re.compile(r'(?s)begin MEASUREMENT[\s](.*?)end MEASUREMENT')
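To see what (?s) changes without needing a file on disk, here is a minimal sketch that runs the pattern on a shortened inline copy of the sample from the question. The variable names and the shortened sample are assumptions made for illustration; re.DOTALL is the standard flag that corresponds to the inline (?s).

```python
import re

# Shortened inline copy of the sample from the question (illustrative only).
sample = """/begin MEASUREMENT XYZ
UBYTE
FORMAT "%3.0"
/begin IF_DATA EVTRKMNBXERTBK
DEFAULT_RASTERS 3 3
/end IF_DATA
/end MEASUREMENT
"""

# Without (?s), "." stops at newlines, so a block spanning lines never matches.
without_flag = re.findall(r'begin MEASUREMENT[\s](.*?)end MEASUREMENT', sample)

# With (?s), "." also matches newlines; passing re.DOTALL has the same effect.
with_flag = re.findall(r'(?s)begin MEASUREMENT[\s](.*?)end MEASUREMENT', sample)
with_dotall = re.findall(r'begin MEASUREMENT[\s](.*?)end MEASUREMENT', sample, re.DOTALL)

print(without_flag)
print(with_flag == with_dotall)
```

Note that the captured text still ends with the "/" that precedes "end MEASUREMENT", because the pattern does not consume the leading slash of the closing marker.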
So try this:
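import re

path = "py.txt"
# Read the whole file at once so the pattern can span multiple lines.
with open(path, 'r') as f:
    text = f.read()
# (?s) makes "." match newlines as well.
pattern = re.compile(r'(?s)begin MEASUREMENT[\s](.*?)end MEASUREMENT')
result = pattern.findall(text)
if result:
    print(result[0])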
Edit:
t = "XYZ"
pattern = re.compile(r'(?s)begin MEASUREMENT\s+((%s).*?)end MEASUREMENT'%t)
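The name-specific pattern above can be wrapped in a small helper. The following is only a sketch: the function name extract_measurement and the two-block sample are hypothetical, and the re.escape call and the \b word boundary are additions not present in the original answer (they guard against regex metacharacters in the name and against "XYZ" also matching a longer name such as "XYZ2").

```python
import re

def extract_measurement(text, name):
    """Return the body of the first MEASUREMENT block named `name`, or None."""
    # re.escape and \b are assumptions added here, not part of the original answer.
    pattern = re.compile(
        r'(?s)begin MEASUREMENT\s+(%s\b.*?)end MEASUREMENT' % re.escape(name)
    )
    match = pattern.search(text)
    return match.group(1) if match else None

# Hypothetical input with two blocks, to show that only the named one is returned.
sample = """/begin MEASUREMENT ABC
SWORD
/end MEASUREMENT
/begin MEASUREMENT XYZ
UBYTE
/end MEASUREMENT
"""

block = extract_measurement(sample, "XYZ")
print(block)
```

Returning None when nothing matches avoids the IndexError that indexing an empty findall result would raise.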