I'm trying to build a regex like this:

[match-word] ... [exclude-specific-word] ... [match-word]

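This seems to work with a negative lookahead, but I run into a problem in a situation like this:
[match-word] ... [exclude-specific-word] ... [match-word] ... [excluded word appears again]

I want the sentence above to match, but the negative lookahead between the first and second match words "spills over", so the second word never matches.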

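Let's look at a practical example.

I want to match every sentence that contains the word "i" and the word "pie", but not the word "hate" between those two words.
I have these three sentences: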
i sure like eating pie, but i love donuts <- Want to match this
i sure like eating pie, but i hate donuts <- Want to match this
i sure hate eating pie, but i like donuts <- Don't want to match this

I have this regex:
^i(?!.*hate).*pie          - have removed the word boundaries for clarity, original is: ^i\b(?!.*\bhate\b).*\bpie\b

It matches the first sentence but not the second, because the negative lookahead scans the whole string.
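To make the failure concrete, here is a minimal Python sketch of the behaviour described above. It uses Python's `re` rather than JRegex (an assumption made only for illustration), and the variable names are made up.

```python
import re

# The pattern from the question: the lookahead sits right after "i", and its
# ".*" can reach "hate" anywhere later in the line, even after "pie".
unbounded = re.compile(r"^i(?!.*hate).*pie")

sentences = [
    "i sure like eating pie, but i love donuts",
    "i sure like eating pie, but i hate donuts",
    "i sure hate eating pie, but i like donuts",
]

results = [bool(unbounded.match(s)) for s in sentences]
for sentence, matched in zip(sentences, results):
    print(matched, sentence)
```

Only the first sentence matches; the second is rejected because of the "hate" that appears after "pie".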

Is there a way to limit the negative lookahead so that it is satisfied if it encounters "pie" before it encounters "hate"?

Note: in my implementation, other terms may follow this regex (it is built dynamically from a grammar search engine), for example:
^i(?!.*hate).*pie.*donuts

I'm currently using JRegex, but could switch to JDK Regex if necessary.

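Update: I forgot to mention something in my original question:

The "negative construct" may also appear further along in the sentence, and I still want the sentence to match even if the "negative" construct shows up again later.

To clarify, take a look at these sentences: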
i sure like eating pie, but i love donuts <- Want to match this
i sure like eating pie, but i hate donuts <- Want to match this
i sure hate eating pie, but i like donuts <- Don't want to match this
i sure like eating pie, but i like donuts and i hate making pie <- Do want to match this

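rob's answer works perfectly for this extra restriction, so I'm accepting that one.

Best answer

At every character between the start word and the stop word, you have to make sure that neither the negative word nor the stop word begins at that position. Like this (I've added some whitespace for readability):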

^i ( (?!hate|pie) . )* pie
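To see what each piece does, the sketch below writes the same pattern with Python's `re.VERBOSE` flag and a comment per token, then runs it on the four sentences from the question's update. The variable names are illustrative, and the group is made non-capturing, which does not change what is matched.

```python
import re

tempered = re.compile(r"""
    ^i              # start word, anchored at the beginning of the line
    (?:             # repeat for each character before the stop word:
      (?!hate|pie)  #   neither "hate" nor "pie" may begin at this position
      .             #   if so, consume exactly one character
    )*
    pie             # stop word
""", re.VERBOSE)

sentences = [
    "i sure like eating pie, but i love donuts",
    "i sure like eating pie, but i hate donuts",
    "i sure hate eating pie, but i like donuts",
    "i sure like eating pie, but i like donuts and i hate making pie",
]

# The matched text, or None when the sentence is rejected
matched = [m.group(0) if m else None for m in map(tempered.match, sentences)]
for sentence, text in zip(sentences, matched):
    print(repr(text), "<-", sentence)
```

Because the lookahead is re-checked one character at a time and the loop stops at the first "pie", a "hate" that appears after that "pie" is never examined.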

Here is a Python program to test things.
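import re

# (sentence, whether it is expected to match)
test = [ ('i sure like eating pie, but i love donuts', True),
         ('i sure like eating pie, but i hate donuts', True),
         ('i sure hate eating pie, but i like donuts', False) ]

# re.X makes the regex engine ignore the whitespace inside the pattern
rx = re.compile(r"^i ((?!hate|pie).)* pie", re.X)

for t,v in test:
    m = rx.match(t)
    # "pass" means the match result agrees with the expectation
    print(t, "pass" if bool(m) == v else "fail")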
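The question also mentions word boundaries and further terms after "pie" (such as `.*donuts`). One possible way to fold both into the same idea is sketched below. `build_pattern` and its parameters are hypothetical names, not part of JRegex or the JDK, and this is only a sketch under the assumption that later terms are chained with a plain `.*` as in the question.

```python
import re

def build_pattern(start, stop, excluded, following=()):
    """Hypothetical helper: start ... stop, with `excluded` not in between."""
    s, e, x = (re.escape(w) for w in (start, stop, excluded))
    # Tempered loop: no consumed character may begin the excluded word
    # or the stop word (both taken as whole words).
    pattern = rf"^{s}\b(?:(?!\b(?:{x}|{e})\b).)*\b{e}\b"
    # Later terms are appended with an unrestricted ".*", as in the question.
    for term in following:
        pattern += rf".*\b{re.escape(term)}\b"
    return re.compile(pattern)

rx = build_pattern("i", "pie", "hate", following=["donuts"])

cases = [
    "i sure like eating pie, but i love donuts",
    "i sure like eating pie, but i hate donuts",
    "i sure hate eating pie, but i like donuts",
    "i sure like eating pie, but i like donuts and i hate making pie",
]

outcomes = [bool(rx.match(s)) for s in cases]
for sentence, ok in zip(cases, outcomes):
    print(ok, sentence)
```

Lookahead and `\b` are standard constructs, so the generated pattern string should carry over to a Java regex engine, though that has not been checked here against JRegex.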

On the topic of regex: negative lookahead between two matches, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/9843338/
