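I am trying to write a spoiler identification system so that any spoiler in a string is replaced with a specified spoiler character.

I want to match a string enclosed in square brackets so that the content inside the brackets is capture group 1 and the whole string, including the surrounding brackets, is the match.

I am currently using \[(.*?]*)\], a slightly modified version of an expression from another answer, because I also want nested square brackets to be part of capture group 1.

The problem with this expression is that, although it works and matches the following: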


"Jim ate a [sandwich]" matches "[sandwich]", with "sandwich" as group 1
"Jim ate a [sandwich with [pickles and onions]]" matches "[sandwich with [pickles and onions]]", with "sandwich with [pickles and onions]" as group 1
"[[[[]" matches "[[[[]", with "[[[" as group 1
"[]]]]" matches "[]]]]", with "]]]" as group 1


it does not work as intended when I try to match the following:


"Jim ate a [sandwich with [pickles] and [onions]]" produces two matches:


"[sandwich with [pickles]", with "sandwich with [pickles" as group 1
"[onions]]", with "onions]" as group 1
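
The two matches listed above can be reproduced with a short program. This is only a minimal sketch: the class name LazyBracketDemo and the helper findAll are made up for illustration. The lazy .*? stops at the first ] that lets the rest of the pattern succeed, which is why the match ends after [pickles] instead of at the outer closing bracket.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyBracketDemo {
    // Hypothetical helper: returns "match -> group 1" for every hit
    // of the pattern from the question.
    static List<String> findAll(String input) {
        Pattern p = Pattern.compile("\\[(.*?]*)\\]");
        Matcher m = p.matcher(input);
        List<String> hits = new ArrayList<>();
        while (m.find()) {
            hits.add(m.group() + " -> " + m.group(1));
        }
        return hits;
    }

    public static void main(String[] args) {
        String input = "Jim ate a [sandwich with [pickles] and [onions]]";
        for (String hit : findAll(input)) {
            System.out.println(hit);
        }
        // prints:
        // [sandwich with [pickles] -> sandwich with [pickles
        // [onions]] -> onions]
    }
}
```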



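Which expression should I use to match "[sandwich with [pickles] and [onions]]", with "sandwich with [pickles] and [onions]" as group 1?

Edit:

Since it seems this cannot be done with a regular expression in Java, is there an alternative solution?

Edit 2:

I would also like to be able to split the string at each match found, so an alternative to regex would be harder to implement, given how convenient String.split(regex) is. Here is an example: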


"Jim ate a [sandwich] with [pickles] and [dried [onions]]" should match all of:


"[sandwich]", with "sandwich" as group 1
"[pickles]", with "pickles" as group 1
"[dried [onions]]", with "dried [onions]" as group 1



Splitting the sentence should then give:

Jim ate a
with
and

Best answer

A more direct solution

This solution omits empty and whitespace-only substrings:

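// Requires: import java.util.ArrayList; import java.util.List;
public static List<String> getStrsBetweenBalancedSubstrings(String s, char markStart, char markEnd) {
    List<String> subTreeList = new ArrayList<String>();
    int level = 0;             // current nesting depth
    int lastCloseBracket = 0;  // index just past the most recent matched closing marker
    for (int i = 0; i < s.length(); i++) {
        char c = s.charAt(i);
        if (c == markStart) {
            level++;
            // A top-level span starts here: keep the text since the previous span if it is not blank.
            if (level == 1 && i != 0 && i != lastCloseBracket &&
                    !s.substring(lastCloseBracket, i).trim().isEmpty()) {
                subTreeList.add(s.substring(lastCloseBracket, i).trim());
            }
        } else if (c == markEnd) {
            // A closing marker at level 0 is unmatched and stays part of the surrounding text.
            if (level > 0) {
                level--;
                lastCloseBracket = i + 1;
            }
        }
    }
    // Keep any non-blank text that follows the last span.
    if (lastCloseBracket != s.length() && !s.substring(lastCloseBracket).trim().isEmpty()) {
        subTreeList.add(s.substring(lastCloseBracket).trim());
    }
    return subTreeList;
}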


Then use it like this:

String input = "Jim ate a [sandwich][ooh] with [pickles] and [dried [onions]] and ] [an[other] match] and more here";
List<String> between_balanced =  getStrsBetweenBalancedSubstrings(input, '[', ']');
System.out.println("Result: " + between_balanced);
// => Result: [Jim ate a, with, and, and ], and more here]


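Original answer (more complex, shows a way to extract the nested brackets)

You can also extract all the substrings enclosed in balanced brackets and then split on them: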

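// Requires: import java.util.ArrayList; import java.util.Arrays; import java.util.List;
//           import java.util.regex.Pattern;
String input = "Jim ate a [sandwich] with [pickles] and [dried [onions]] and ] [an[other] match]";
// Collect every top-level balanced [...] span, markers included.
List<String> balanced = getBalancedSubstrings(input, '[', ']', true);
System.out.println("Balanced ones: " + balanced);
// Build one alternation of the quoted spans, each with optional surrounding whitespace.
List<String> rx_split = new ArrayList<String>();
for (String item : balanced) {
    rx_split.add("\\s*" + Pattern.quote(item) + "\\s*");
}
String rx = String.join("|", rx_split);
System.out.println("In-betweens: " + Arrays.toString(input.split(rx)));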


This function finds all substrings with balanced [ and ]:

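// Requires: import java.util.ArrayList; import java.util.List;
public static List<String> getBalancedSubstrings(String s, char markStart,
                                                 char markEnd, boolean includeMarkers) {
    List<String> subTreeList = new ArrayList<String>();
    int level = 0;             // current nesting depth
    int lastOpenBracket = -1;  // start index of the current top-level span
    for (int i = 0; i < s.length(); i++) {
        char c = s.charAt(i);
        if (c == markStart) {
            level++;
            if (level == 1) {
                // Remember where the top-level span begins, with or without its marker.
                lastOpenBracket = (includeMarkers ? i : i + 1);
            }
        } else if (c == markEnd) {
            if (level == 1) {
                // The top-level span closes here, so emit it.
                subTreeList.add(s.substring(lastOpenBracket, (includeMarkers ? i + 1 : i)));
            }
            // Unmatched closing markers (level == 0) are ignored.
            if (level > 0) level--;
        }
    }
    return subTreeList;
}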


IDEONE demo

Output of the code:

Balanced ones: [[sandwich], [pickles], [dried [onions]], [an[other] match]]
In-betweens: [Jim ate a, with, and, and ]]
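
If the goal is the replacement described in the question rather than splitting, the same depth counter can be reused. What follows is only a sketch under stated assumptions: the names SpoilerMasker and maskSpoilers are hypothetical, and so is the choice to overwrite every character of a top-level span (markers included) with the mask character. Unmatched markers are left untouched, mirroring how the functions above ignore them.

```java
class SpoilerMasker {
    // Hypothetical helper: replaces each top-level balanced open...close span
    // with the mask character, one mask per original character.
    static String maskSpoilers(String s, char open, char close, char mask) {
        StringBuilder out = new StringBuilder();
        int level = 0;   // current nesting depth
        int start = -1;  // index of the marker that opened the current top-level span
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == open) {
                if (level == 0) start = i;
                level++;
            } else if (c == close && level > 0) {
                level--;
                if (level == 0) {
                    // The span start..i is balanced: emit one mask per character.
                    for (int k = start; k <= i; k++) out.append(mask);
                    start = -1;
                }
            } else if (level == 0) {
                // Text outside any span, including unmatched closing markers.
                out.append(c);
            }
        }
        // A span that was opened but never closed is copied back unchanged.
        if (level > 0) out.append(s.substring(start));
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "Jim ate a [sandwich with [pickles] and [onions]]";
        System.out.println(maskSpoilers(input, '[', ']', '*'));
    }
}
```

Building the result in a single pass avoids constructing a regex from the extracted spans, at the cost of not having String.split available.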


Credits: getBalancedSubstrings is based on peter.murray.rust's answer to the post How to split this “Tree-like” string in Java regex?
