I need to search text like this:

lines = """package p_dio_bfm is
   procedure setBFMCmd (
      variable  pin : in tBFMCmd
      );
end p_dio_bfm; -- end package;

package body p_dio_bfm is
   procedure setBFMCmd (
      variable  pin : in tBFMCmd
      ) is
   begin
      bfm_cmd := pin;
   end setBFMCmd;
end p_dio_bfm;"""

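I need to extract the package name, i.e. p_dio_bfm, and the package declaration, i.e. the part between "package p_dio_bfm is" and the first "end p_dio_bfm;".
The problem is that the package declaration may end with either "end p_dio_bfm;" or "end package;", so I tried the following "or" regular expression:
- it works for packages that end with "end package;"
- it does not work for packages that end with "end pck_name;"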
pattern = re.compile("package\s+(\w+)\s+is(.*)end\s+(package|\1)\s*;")
match = pattern.search(lines)

The problem is the (package|\1) part of the regex, where I want to match either the word "package" or the package name captured earlier.
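One likely cause, sketched here on the assumption that the pattern is written as a normal (non-raw) string literal exactly as above: Python turns `\1` into the control character `\x01` before `re` ever sees it, so the alternation contains no backreference at all. The `sample` string and the names `not_raw` and `raw` are made up for illustration, and the sketch uses Python 3 syntax.

```python
import re

# In a normal literal "\1" is an octal escape for chr(1).
# In a raw literal it stays backslash + "1", which re reads as a backreference.
assert "\1" == "\x01"
assert r"\1" == "\\1"

sample = "package p_dio_bfm is procedure x; end p_dio_bfm;"

# Same pattern twice; only the raw one really contains a backreference.
not_raw = re.compile("package\\s+(\\w+)\\s+is(.*)end\\s+(package|\1)\\s*;")
raw = re.compile(r"package\s+(\w+)\s+is(.*)end\s+(package|\1)\s*;")

print(not_raw.search(sample))       # None: only "end package;" can match
print(raw.search(sample).group(1))  # p_dio_bfm
```

Even with the raw string, the greedy `(.*)` would still run on to the last matching `end ...;` in a longer input, so a non-greedy `(.*?)` is probably also needed.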
Update: here is a complete code example, which I hope clarifies it:
import re
lines1 = """package p_dio_bfm is
   procedure setBFMCmd (
      variable  pin : in tBFMCmd
      );
end p_dio_bfm;

package body p_dio_bfm is
   procedure setBFMCmd (
      variable  pin : in tBFMCmd
      ) is
   begin
      bfm_cmd := pin;
   end setBFMCmd;
end p_dio_bfm;"""

lines2 = """package p_dio_bfm is
   procedure setBFMCmd (
      variable  pin : in tBFMCmd
      );
end package;

package body p_dio_bfm is
   procedure setBFMCmd (
      variable  pin : in tBFMCmd
      ) is
   begin
      bfm_cmd := pin;
   end setBFMCmd;
end package;"""

lines1 = lines1.replace('\n', ' ')
print lines1

pattern = re.compile("package\s+(\w+)\s+is(.*)end\s+(package|\1)\s*;")
match = pattern.search(lines1)

print match

lines2 = lines2.replace('\n', ' ')
print lines2

match = pattern.search(lines2)

print match

In both cases, I want a single regex to return this part:
"""procedure setBFMCmd (
          variable  pin : in tBFMCmd
          );"""

without the newline characters, which I removed.

Best answer

How about:

>>> for row in re.findall(
...   r'package(?:\s.*?)(?P<needle>[^\s]+)\s+is\s+(.*?)end\s+(?:package|(?P=needle));',
...   lines,
...   re.S
... ):
...   print '{{{', row[1], '}}}'
...
{{{ procedure setBFMCmd (
      variable  pin : in tBFMCmd
      );
}}}
{{{ procedure setBFMCmd (
      variable  pin : in tBFMCmd
      ) is
   begin
      bfm_cmd := pin;
   end setBFMCmd;
}}}

I took the liberty of not filtering exactly the way Mihai Hunu asked: the result also includes the second block (the package body).
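If only the package declaration is wanted, one option is to keep the named backreference but use `re.search`, which stops at the first match. The following is a minimal Python 3 sketch, not the answerer's code: `PACKAGE_DECL` and `extract_declaration` are hypothetical names, and it assumes the package name is a plain `\w+` identifier and that the declaration ends with either `end <name>;` or `end package;`.

```python
import re

# Raw string, so (?P=name) and \s reach the regex engine untouched.
# "package body <name> is" cannot match here, because "body" is not followed by "is".
PACKAGE_DECL = re.compile(
    r"package\s+(?P<name>\w+)\s+is\s+(?P<body>.*?)\s*"
    r"\bend\s+(?:package|(?P=name))\s*;",
    re.DOTALL,
)

def extract_declaration(source):
    """Return (name, body) of the first package declaration, or None."""
    m = PACKAGE_DECL.search(source)
    return (m.group("name"), m.group("body")) if m else None

with_name = """package p_dio_bfm is
   procedure setBFMCmd (
      variable  pin : in tBFMCmd
      );
end p_dio_bfm;

package body p_dio_bfm is
   procedure setBFMCmd (
      variable  pin : in tBFMCmd
      ) is
   begin
      bfm_cmd := pin;
   end setBFMCmd;
end p_dio_bfm;"""

# Same text, but both parts end with "end package;" instead of the name.
with_keyword = with_name.replace("end p_dio_bfm;", "end package;")

for text in (with_name, with_keyword):
    name, body = extract_declaration(text)
    print(name)
    print(body)
```

Because the body group is non-greedy, it stops at the first `end` that is followed by the package name or the word `package`, so there is no need to replace the newlines beforehand.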
