I am trying to build a "simple" regular expression (in Java) that matches sentences such as:

I want to cook something
I want to cook something with chicken and cheese
I want to cook something with chicken but without onions
I want to cook something without onions but with chicken and cheese
I want to cook something with candy but without nuts within 30 minutes

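Ideally, it should also match: I want to cook something with candy and without nuts within 30 minutes
In these examples, I want to capture the "included" ingredients, the "excluded" ingredients, and the maximum "duration" of the cooking process. As you can see, each of these 3 capture groups is optional in the pattern, and each one starts with a specific keyword (with, (but )?without, within). In addition, the ingredients can consist of several words, so in the second/third example "chicken and cheese" should be matched by the capture group named "include".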

Ideally, I would like to write a pattern similar to this:
I want to cook something ((with (?<include>.+))|((but )?without (?<exclude>.+))|(within (?<duration>.+) minutes))*

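Obviously, this does not work, because the wildcards can also match the keywords: after the first keyword is matched, everything else (including the other keywords) is consumed by the greedy wildcard of the corresponding named capture group.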
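The greedy behaviour described above can be reproduced with a few lines of Java. This is only a minimal sketch: the class name GreedyDemo and the helper capture are made up for illustration, and the pattern is the one from the question, copied unchanged.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the pattern string comes from the question.
class GreedyDemo {
    static final Pattern NAIVE = Pattern.compile(
        "I want to cook something ((with (?<include>.+))"
        + "|((but )?without (?<exclude>.+))"
        + "|(within (?<duration>.+) minutes))*");

    // Returns the text captured by the named group, or null if the
    // sentence does not match or the group did not participate.
    static String capture(String sentence, String groupName) {
        Matcher m = NAIVE.matcher(sentence);
        return m.matches() ? m.group(groupName) : null;
    }

    public static void main(String[] args) {
        String s = "I want to cook something with chicken but without onions";
        // The greedy .+ after "with " runs to the end of the sentence,
        // so "include" swallows the "but without" clause as well.
        System.out.println(capture(s, "include")); // chicken but without onions
        System.out.println(capture(s, "exclude")); // null
    }
}
```

The sentence still matches as a whole, but the "exclude" group never gets a chance to participate.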

I tried using a lookahead, for example:
something ((with (?<IncludedIngredients>.*(?=but)))|(but )?without (?<ExcludedIngredients>.+))+

This regex recognizes something with chicken but without onions, but does not match something with chicken.
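The reason appears to be that the lookahead (?=but) is mandatory: the "with" branch can only succeed when the word "but" follows somewhere later in the input. A small sketch that checks both sentences (the class name LookaheadAttempt and the helper matches are hypothetical; the pattern is the one from the question):

```java
import java.util.regex.Pattern;

// Hypothetical demo class; only the pattern string comes from the question.
class LookaheadAttempt {
    static final Pattern ATTEMPT = Pattern.compile(
        "something ((with (?<IncludedIngredients>.*(?=but)))"
        + "|(but )?without (?<ExcludedIngredients>.+))+");

    // True if the whole sentence is matched by the attempted pattern.
    static boolean matches(String sentence) {
        return ATTEMPT.matcher(sentence).matches();
    }

    public static void main(String[] args) {
        // "but" follows the included ingredients, so the lookahead succeeds.
        System.out.println(matches("something with chicken but without onions")); // true
        // No "but" anywhere after "with ", so the "with" branch cannot match.
        System.out.println(matches("something with chicken")); // false
    }
}
```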

Is there a simple solution for this in regex?

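P.S. A "simple" solution means that I do not have to spell out every possible combination of these keywords in a sentence, nor order them by the number of keywords used in each combination.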

Best answer

It can probably be boiled down to the following construct.
(?m)^I[ ]want[ ]to[ ]cook[ ]something(?=[ ]|$)(?<Order>(?:(?<with>\b(?:but[ ])?with[ ](?:(?!(?:\b(?:but[ ])?with(?:in|out)?\b)).)*)|(?<without>\b(?:but[ ])?without[ ](?:(?!(?:\b(?:but[ ])?with(?:in|out)?\b)).)*)|(?<time>\bwithin[ ](?<duration>.+)[ ]minutes[ ]?)|(?<unknown>(?:(?!(?:\b(?:but[ ])?with(?:in|out)?\b)).)+))*)$
https://regex101.com/r/RHfGnb/1

Expanded

 (?m)
 ^ I [ ] want [ ] to [ ] cook [ ] something
 (?= [ ] | $ )
 (?<Order>                      # (1 start)
      (?:
           (?<with>                      # (2 start)
                \b
                (?: but [ ] )?
                with [ ]
                (?:
                     (?!
                          (?:
                               \b
                               (?: but [ ] )?
                               with
                               (?: in | out )?
                               \b
                          )
                     )
                     .
                )*
           )                             # (2 end)
        |  (?<without>                   # (3 start)
                \b
                (?: but [ ] )?
                without [ ]
                (?:
                     (?!
                          (?:
                               \b
                               (?: but [ ] )?
                               with
                               (?: in | out )?
                               \b
                          )
                     )
                     .
                )*
           )                             # (3 end)
        |  (?<time>                      # (4 start)
                \b within [ ]
                (?<duration> .+ )             # (5)
                [ ] minutes [ ]?
           )                             # (4 end)
        |  (?<unknown>                   # (6 start)
                (?:
                     (?!
                          (?:
                               \b
                               (?: but [ ] )?
                               with
                               (?: in | out )?
                               \b
                          )
                     )
                     .
                )+
           )                             # (6 end)
      )*
 )                             # (1 end)
 $
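
As a rough illustration of how the construct above might be used from Java, the sketch below compiles the one-line form of the pattern and reads the named groups. The class name OrderParser and the method parse are assumptions made for this example and are not part of the answer. Note that, as written, the "with" and "without" groups capture the whole clause, keyword included, so the sketch only trims surrounding whitespace.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper; the pattern string is the answer's regex, unchanged.
class OrderParser {
    static final Pattern ORDER = Pattern.compile(
        "(?m)^I[ ]want[ ]to[ ]cook[ ]something(?=[ ]|$)"
        + "(?<Order>(?:"
        + "(?<with>\\b(?:but[ ])?with[ ](?:(?!(?:\\b(?:but[ ])?with(?:in|out)?\\b)).)*)"
        + "|(?<without>\\b(?:but[ ])?without[ ](?:(?!(?:\\b(?:but[ ])?with(?:in|out)?\\b)).)*)"
        + "|(?<time>\\bwithin[ ](?<duration>.+)[ ]minutes[ ]?)"
        + "|(?<unknown>(?:(?!(?:\\b(?:but[ ])?with(?:in|out)?\\b)).)+)"
        + ")*)$");

    // Returns the clauses that were found (whitespace-trimmed),
    // or null if the sentence does not match at all.
    static Map<String, String> parse(String sentence) {
        Matcher m = ORDER.matcher(sentence);
        if (!m.find()) {
            return null;
        }
        Map<String, String> clauses = new LinkedHashMap<>();
        for (String name : new String[] {"with", "without", "duration"}) {
            String value = m.group(name); // null if the group did not participate
            if (value != null) {
                clauses.put(name, value.trim());
            }
        }
        return clauses;
    }

    public static void main(String[] args) {
        System.out.println(parse(
            "I want to cook something with candy but without nuts within 30 minutes"));
        // {with=with candy, without=but without nuts, duration=30}
        System.out.println(parse("I want to cook something with chicken"));
        // {with=with chicken}
    }
}
```

Each keyword clause stops at the next keyword because the dot is only consumed while the negative lookahead for (but )?with(in|out)? fails. Stripping the leading keyword from the "with" and "without" captures would be an extra step that this sketch leaves out.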

About java - a wildcard in a regex that is greedy only until a stop word: we found a similar question on Stack Overflow: https://stackoverflow.com/questions/58846410/
