I need to search through text like this:
lines = """package p_dio_bfm is
procedure setBFMCmd (
variable pin : in tBFMCmd
);
end p_dio_bfm; -- end package;
package body p_dio_bfm is
procedure setBFMCmd (
variable pin : in tBFMCmd
) is
begin
bfm_cmd := pin;
end setBFMCmd;
end p_dio_bfm;"""
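I need to extract the package name, i.e. p_dio_bfm, and the package declaration, i.e. the part between "package p_dio_bfm is" and the first "end p_dio_bfm;".
The problem is that the package declaration may end with either "end p_dio_bfm;" or "end package;", so I tried the following "or" regex:
- works for packages ending with "end package;"
- does not work for packages ending with "end pck_name;"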
pattern = re.compile("package\s+(\w+)\s+is(.*)end\s+(package|\1)\s*;")
match = pattern.search(lines)
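The problem is the (package|\1) part of the regex, where I want to match either the word "package" or the package name captured earlier.
Update: I've added a complete code example, which I hope makes this clearer: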
import re
lines1 = """package p_dio_bfm is
procedure setBFMCmd (
variable pin : in tBFMCmd
);
end p_dio_bfm;
package body p_dio_bfm is
procedure setBFMCmd (
variable pin : in tBFMCmd
) is
begin
bfm_cmd := pin;
end setBFMCmd;
end p_dio_bfm;"""
lines2 = """package p_dio_bfm is
procedure setBFMCmd (
variable pin : in tBFMCmd
);
end package;
package body p_dio_bfm is
procedure setBFMCmd (
variable pin : in tBFMCmd
) is
begin
bfm_cmd := pin;
end setBFMCmd;
end package;"""
lines1 = lines1.replace('\n', ' ')
print lines1
pattern = re.compile("package\s+(\w+)\s+is(.*)end\s+(package|\1)\s*;")
match = pattern.search(lines1)
print match
lines2 = lines2.replace('\n', ' ')
print lines2
match = pattern.search(lines2)
print match
In both cases, I would like a single regex to return this part:
"""procedure setBFMCmd (
variable pin : in tBFMCmd
);"""
without the characters I removed.
Best answer
How about:
>>> for row in re.findall(
... r'package(?:\s.*?)(?P<needle>[^\s]+)\s+is\s+(.*?)end\s+(?:package|(?P=needle));',
... lines,
... re.S
... ):
... print '{{{', row[1], '}}}'
...
{{{ procedure setBFMCmd (
variable pin : in tBFMCmd
);
}}}
{{{ procedure setBFMCmd (
variable pin : in tBFMCmd
) is
begin
bfm_cmd := pin;
end setBFMCmd;
}}}
I took the liberty of not filtering exactly the way Mihai Hunu asked: the second block (the package body) is included as well.
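If only the package declaration is wanted, a variant along these lines might do. It is a minimal sketch, not a VHDL parser. Two things go wrong in the pattern from the question: in a plain (non-raw) string literal, "\1" is the character chr(1) rather than a backreference, and the greedy (.*) runs on to the last possible "end". The sketch below uses a raw string, a lazy (.*?), and a negative lookahead that skips "package body". The names PACKAGE_DECL and package_declarations are made up for illustration, and other endings such as "end package p_dio_bfm;" are not handled.

```python
import re

# Raw string: \1 reaches the regex engine as a backreference to group 1.
# (?!body\b) skips "package body ..."; (.*?) stops at the first matching "end".
PACKAGE_DECL = re.compile(
    r"package\s+(?!body\b)(\w+)\s+is\s+(.*?)end\s+(?:package|\1)\s*;",
    re.DOTALL | re.IGNORECASE,  # DOTALL lets . cross newlines; VHDL is case-insensitive
)


def package_declarations(source):
    """Return a list of (package_name, declaration_text) tuples.

    Surrounding whitespace of the declaration text is stripped; newlines
    inside it are kept, so the input does not need to be flattened first.
    """
    return [
        (m.group(1), m.group(2).strip())
        for m in PACKAGE_DECL.finditer(source)
    ]


# The two inputs from the question, differing only in how the blocks end.
LINES1 = """package p_dio_bfm is
procedure setBFMCmd (
variable pin : in tBFMCmd
);
end p_dio_bfm;
package body p_dio_bfm is
procedure setBFMCmd (
variable pin : in tBFMCmd
) is
begin
bfm_cmd := pin;
end setBFMCmd;
end p_dio_bfm;"""

LINES2 = LINES1.replace("end p_dio_bfm;", "end package;")

if __name__ == "__main__":
    for text in (LINES1, LINES2):
        for name, declaration in package_declarations(text):
            print(name)
            print("{{{", declaration, "}}}")
```

Because the backreference only has to match after "end", inner constructs such as "end setBFMCmd;" are stepped over by the lazy group rather than ending the match early.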