I am trying to build a "simple" regular expression (in Java) to match sentences like the following:
I want to cook something
I want to cook something with chicken and cheese
I want to cook something with chicken but without onions
I want to cook something without onions but with chicken and cheese
I want to cook something with candy but without nuts within 30 minutes
Ideally, it should also match:
I want to cook something with candy and without nuts within 30 minutes
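In these examples I want to capture the "included" ingredients, the "excluded" ingredients, and the maximum "duration" of the cooking process. As you can see, each of these 3 capture groups is optional in the pattern, and each one starts with a specific keyword (with, (but )?without, within). In addition, the ingredients can consist of several words, so in the second and third examples "chicken and cheese" should be matched by the named capture group "include".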
Ideally, I would like to write a pattern similar to this:
I want to cook something ((with (?<include>.+))|((but )?without (?<exclude>.+))|(within (?<duration>.+) minutes))*
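Obviously this does not work, because the wildcards can also match the keywords: once the first keyword has matched, everything after it (including the other keywords) is consumed by the greedy wildcard of the corresponding named capture group.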
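To make the problem concrete, here is a minimal Java sketch that compiles the pattern above unchanged and shows what the groups capture. The class name GreedyWildcardDemo and the helper capture are made up for this illustration.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyWildcardDemo {
    // The pattern from the question, copied verbatim.
    static final Pattern NAIVE = Pattern.compile(
        "I want to cook something ((with (?<include>.+))"
        + "|((but )?without (?<exclude>.+))"
        + "|(within (?<duration>.+) minutes))*");

    // Hypothetical helper: returns {include, exclude, duration},
    // or null if the whole sentence does not match.
    static String[] capture(String sentence) {
        Matcher m = NAIVE.matcher(sentence);
        if (!m.matches()) {
            return null;
        }
        return new String[] {
            m.group("include"), m.group("exclude"), m.group("duration")
        };
    }

    public static void main(String[] args) {
        // The greedy .+ in "include" runs to the end of the sentence,
        // so the "without" branch never gets a chance to match.
        System.out.println(Arrays.toString(
            capture("I want to cook something with chicken but without onions")));
    }
}
```

The sentence matches, but the "include" group holds the whole remainder of the sentence and the "exclude" group never participates.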
I tried using a lookahead, for example:
something ((with (?<IncludedIngredients>.*(?=but)))|(but )?without (?<ExcludedIngredients>.+))+
This regex matches
something with chicken but without onions
but it does not match
something with chicken
Is there a simple solution using regular expressions?
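The failure can be reproduced in Java with the sketch below. It also tries one possible tweak that is an assumption and not part of the original attempt: making the wildcard lazy and letting the lookahead accept the end of the input as well. The class name LookaheadDemo and the helper included are made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadDemo {
    // The attempt from the question: the include group must be followed by "but".
    static final Pattern STRICT = Pattern.compile(
        "something ((with (?<IncludedIngredients>.*(?=but)))"
        + "|(but )?without (?<ExcludedIngredients>.+))+");

    // Assumed tweak (not from the question): lazy wildcard, and the
    // lookahead is also satisfied at the end of the input.
    static final Pattern RELAXED = Pattern.compile(
        "something ((with (?<IncludedIngredients>.*?(?=but|$)))"
        + "|(but )?without (?<ExcludedIngredients>.+))+");

    // Hypothetical helper: the include group, or null if the input does not match.
    static String included(Pattern p, String s) {
        Matcher m = p.matcher(s);
        return m.matches() ? m.group("IncludedIngredients") : null;
    }

    public static void main(String[] args) {
        System.out.println(included(STRICT, "something with chicken"));
        System.out.println(included(RELAXED, "something with chicken"));
        System.out.println(included(RELAXED, "something with chicken but without onions"));
    }
}
```

This tweak only covers "with" followed by "but without". It would also stop at any ingredient that merely contains the letters "but" (such as "butter"), and it does nothing for "within", so it is not a general solution.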
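P.S. By a "simple" solution I mean one where I do not have to spell out every possible combination of these keywords in the sentence, nor order them by the number of keywords used in each combination.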
Best answer
It can probably be boiled down to the following construct:
(?m)^I[ ]want[ ]to[ ]cook[ ]something(?=[ ]|$)(?<Order>(?:(?<with>\b(?:but[ ])?with[ ](?:(?!(?:\b(?:but[ ])?with(?:in|out)?\b)).)*)|(?<without>\b(?:but[ ])?without[ ](?:(?!(?:\b(?:but[ ])?with(?:in|out)?\b)).)*)|(?<time>\bwithin[ ](?<duration>.+)[ ]minutes[ ]?)|(?<unknown>(?:(?!(?:\b(?:but[ ])?with(?:in|out)?\b)).)+))*)$
https://regex101.com/r/RHfGnb/1
Expanded
(?m)
^ I [ ] want [ ] to [ ] cook [ ] something
(?= [ ] | $ )
(?<Order> # (1 start)
(?:
(?<with> # (2 start)
\b
(?: but [ ] )?
with [ ]
(?:
(?!
(?:
\b
(?: but [ ] )?
with
(?: in | out )?
\b
)
)
.
)*
) # (2 end)
| (?<without> # (3 start)
\b
(?: but [ ] )?
without [ ]
(?:
(?!
(?:
\b
(?: but [ ] )?
with
(?: in | out )?
\b
)
)
.
)*
) # (3 end)
| (?<time> # (4 start)
\b within [ ]
(?<duration> .+ ) # (5)
[ ] minutes [ ]?
) # (4 end)
| (?<unknown> # (6 start)
(?:
(?!
(?:
\b
(?: but [ ] )?
with
(?: in | out )?
\b
)
)
.
)+
) # (6 end)
)*
) # (1 end)
$
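As a rough sketch of how this construct might be used from Java, the code below compiles the one-line regex and reads the named groups. The repeated stop-word lookahead is factored into a string constant purely for readability. The class name CookingRequestParser and the helpers parse and stripKeyword are assumptions for this illustration, not part of the answer. Note that the "with" and "without" groups capture the keyword itself plus surrounding spaces, so the helper strips those afterwards.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CookingRequestParser {
    // Negative lookahead from the answer: fails wherever a keyword
    // (with / without / within, optionally preceded by "but ") begins.
    private static final String STOP =
        "(?!(?:\\b(?:but[ ])?with(?:in|out)?\\b))";

    // The answer's regex, split across lines for readability only.
    private static final Pattern REQUEST = Pattern.compile(
        "(?m)^I[ ]want[ ]to[ ]cook[ ]something(?=[ ]|$)"
        + "(?<Order>(?:"
        + "(?<with>\\b(?:but[ ])?with[ ](?:" + STOP + ".)*)"
        + "|(?<without>\\b(?:but[ ])?without[ ](?:" + STOP + ".)*)"
        + "|(?<time>\\bwithin[ ](?<duration>.+)[ ]minutes[ ]?)"
        + "|(?<unknown>(?:" + STOP + ".)+)"
        + ")*)$");

    // Hypothetical helper: returns {include, exclude, duration}.
    // An entry is null when that clause is absent; the result is
    // null when the sentence does not match at all.
    static String[] parse(String sentence) {
        Matcher m = REQUEST.matcher(sentence);
        if (!m.find()) {
            return null;
        }
        return new String[] {
            stripKeyword(m.group("with")),
            stripKeyword(m.group("without")),
            m.group("duration")
        };
    }

    // Removes the leading "(but )with(out) " keyword and surrounding spaces.
    static String stripKeyword(String clause) {
        if (clause == null) {
            return null;
        }
        return clause.replaceFirst("^(?:but )?with(?:out)? ", "").trim();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse(
            "I want to cook something with candy but without nuts within 30 minutes")));
        System.out.println(Arrays.toString(parse(
            "I want to cook something without onions but with chicken and cheese")));
    }
}
```

Because each named group sits inside a repeated alternation, a group only keeps its most recent match, so a sentence with two "with" clauses would report only the last one. Connecting words that are not keywords, such as the "and" in "with candy and without nuts", stay in the captured text and would need separate cleanup.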
Regarding java - a regex wildcard that is greedy only up to a stop word, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58846410/