I have long lists of words and regular expression patterns in a .txt file, which I read in like this:

with open(fileName, "r") as f1:
    pattern_list = f1.read().split('\n')

For illustration, the first seven look like this:
print(pattern_list[:7])
# ['abandon*', 'abuse*', 'abusi*', 'aching', 'advers*', 'afraid', 'aggress*']

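I want to know whenever a word from an input string matches any of the words/patterns in pattern_list. The code below sort of works, but I see two problems:
  • First, it seems pretty inefficient to re.compile() every item in pattern_list every time I inspect a new string_input. But when I tried to store the re.compile(raw_str) objects in a list (so I could reuse the already compiled regexes for something more like if w in regex_compile_list:), it didn't work right.
  • Second, it sometimes doesn't work as I expect. Notice how
      • abuse* matched with abusive
      • abusi* matched with abused and abuse
      • ache* matched with aching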

    What am I doing wrong, and how can I be more efficient? Thanks in advance for your patience with a noob, and thanks for any insights!
    import re

    string_input = "People who have been abandoned or abused will often be afraid of adversarial, abusive, or aggressive behavior. They are aching to abandon the abuse and aggression."
    for raw_str in pattern_list:
        # Each entry is compiled as a regular expression, not as a wildcard
        pat = re.compile(raw_str)
        for w in string_input.split():
            if pat.match(w):
                print("matched:", raw_str, "with:", w)
    #matched: abandon* with: abandoned
    #matched: abandon* with: abandon
    #matched: abuse* with: abused
    #matched: abuse* with: abusive,
    #matched: abuse* with: abuse
    #matched: abusi* with: abused
    #matched: abusi* with: abusive,
    #matched: abusi* with: abuse
    #matched: ache* with: aching
    #matched: aching with: aching
    #matched: advers* with: adversarial,
    #matched: afraid with: afraid
    #matched: aggress* with: aggressive
    #matched: aggress* with: aggression.
    

    Best answer

    For matching shell-style wildcards, you could (ab)use the fnmatch module.
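
The unexpected matches in the question come from the two syntaxes giving * different meanings. A minimal sketch of the difference, using only the standard re and fnmatch modules (fnmatchcase is chosen here so the result does not depend on the operating system):

```python
import fnmatch
import re

# In a regular expression, `*` means "zero or more of the preceding character",
# so 'abuse*' is 'abus' followed by any number of 'e'. re.match() only anchors
# at the start of the string, so any word beginning with 'abus' matches.
regex_hit = re.match('abuse*', 'abusive') is not None

# In a shell-style wildcard, `*` means "any run of characters", and the whole
# word has to fit the pattern.
glob_hit = fnmatch.fnmatchcase('abusive', 'abuse*')
glob_hit_abused = fnmatch.fnmatchcase('abused', 'abuse*')

print(regex_hit, glob_hit, glob_hit_abused)  # True False True
```

Read as a wildcard, abuse* accepts abused but rejects abusive, which is presumably what the pattern list intends.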

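    Because fnmatch is primarily designed for filename comparison, the test will be case-sensitive or case-insensitive depending on your operating system. So you have to normalize both the text and the patterns (here, I use lower() for that purpose).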

    >>> import fnmatch
    
    >>> pattern_list = ['abandon*', 'abuse*', 'abusi*', 'aching', 'advers*', 'afraid', 'aggress*']
    >>> string_input = "People who have been abandoned or abused will often be afraid of adversarial, abusive, or aggressive behavior. They are aching to abandon the abuse and aggression."
    
    
    >>> for pattern in pattern_list:
    ...     # lower() normalizes case so the result is the same on every OS
    ...     l = fnmatch.filter(string_input.lower().split(), pattern.lower())
    ...     if l:
    ...         print(pattern, "match", l)
    

    Producing:
    abandon* match ['abandoned', 'abandon']
    abuse* match ['abused', 'abuse']
    abusi* match ['abusive,']
    aching match ['aching']
    advers* match ['adversarial,']
    afraid match ['afraid']
    aggress* match ['aggressive', 'aggression.']
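
To address the efficiency concern in the question, the wildcard patterns can be compiled once and reused for every input string. The following is a sketch rather than part of the original answer: it assumes fnmatch.translate() to turn each wildcard into a regular expression, and the helper name find_matches and the punctuation stripping are illustrative choices.

```python
import fnmatch
import re
import string

pattern_list = ['abandon*', 'abuse*', 'abusi*', 'aching', 'advers*', 'afraid', 'aggress*']

# Compile each wildcard once; re.IGNORECASE stands in for the lower() normalization.
compiled = [(p, re.compile(fnmatch.translate(p), re.IGNORECASE)) for p in pattern_list]

def find_matches(text):
    """Return (pattern, word) pairs for every word matching a wildcard (hypothetical helper)."""
    # Strip surrounding punctuation so 'abusive,' is tested as 'abusive'.
    words = [w.strip(string.punctuation) for w in text.split()]
    return [(p, w) for p, rx in compiled for w in words if rx.match(w)]

print(find_matches("They are aching to abandon the abuse and aggression."))
# [('abandon*', 'abandon'), ('abuse*', 'abuse'), ('aching', 'aching'), ('aggress*', 'aggression')]
```

The compiled list can be built once at startup and passed any number of input strings, so no pattern is recompiled per input.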
    

    Source: the Stack Overflow question "Python: check if any word in a list of words matches any pattern in a list of regular expression patterns", https://stackoverflow.com/questions/17068486/
