Problem description
I'm trying to get ALL the substrings in the input string that match the given pattern.
For example:
Given string: aaxxbbaxb
Pattern: a[a-z]{0,3}b
(What I actually want to express is: all substrings that start with a and end with b, with up to 3 letters in between.)
Exact results that I want (with their indexes):
aaxxb: index 0~4
axxb: index 1~4
axxbb: index 1~5
axb: index 6~8
But when I run it through the Pattern and Matcher classes using Pattern.compile() and Matcher.find(), it only gives me:
aaxxb: index 0~4
axb: index 6~8
This is the code I used:
Pattern pattern = Pattern.compile("a[a-z]{0,3}b", Pattern.CASE_INSENSITIVE);
Matcher match = pattern.matcher("aaxxbbaxb");
while (match.find()) {
    System.out.println(match.group());
}
How can I retrieve every single substring that matches the pattern?
Of course, it doesn't have to use the Pattern and Matcher classes, as long as it's efficient :)
Recommended answer
(See: All overlapping substrings matching a Java regex)
Here is the full solution that I came up with. It can handle zero-width patterns, boundaries, etc. in the original regular expression. It looks through all substrings of the text string and checks whether the regular expression matches only at the specific position by padding the pattern with the appropriate number of wildcards at the beginning and end. It seems to work for the cases I tried -- although I haven't done extensive testing. It is most certainly less efficient than it could be.
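import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static void allMatches(String text, String regex)
{
    // Try every non-empty substring [i, j) of the text.
    for (int i = 0; i < text.length(); ++i) {
        for (int j = i + 1; j <= text.length(); ++j) {
            // Exactly i characters must precede the match and exactly
            // (text.length() - j) characters must follow it.
            String positionSpecificPattern = "((?<=^.{"+i+"})("+regex+")(?=.{"+(text.length() - j)+"}$))";
            Matcher m = Pattern.compile(positionSpecificPattern).matcher(text);
            if (m.find())
            {
                System.out.println("Match found: \"" + (m.group()) + "\" at position [" + i + ", " + j + ")");
            }
        }
    }
}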
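One way to avoid recompiling a padded pattern for every pair of positions might be to compile the regex once and restrict a single Matcher to each candidate range with Matcher.region(), then test it with Matcher.matches(). The sketch below is an assumption-laden variant, not the approach from the answer above: the class name OverlappingMatches and the "substring: index start~end" output format are made up for illustration. It still makes one match attempt per substring, so it remains quadratic in the text length.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OverlappingMatches {

    // Hypothetical helper (not from the original answer): returns every
    // non-empty substring of text that the regex matches in full,
    // formatted as "substring: index start~end" with an inclusive end index.
    public static List<String> allMatches(String text, String regex) {
        List<String> results = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text); // compiled once
        // Let lookarounds and \b see the text outside the region, and keep
        // ^ and $ tied to the real ends of the input rather than the region.
        m.useTransparentBounds(true);
        m.useAnchoringBounds(false);
        for (int i = 0; i < text.length(); ++i) {
            for (int j = i + 1; j <= text.length(); ++j) {
                m.region(i, j); // resets the matcher; the bounds settings are kept
                if (m.matches()) { // the whole region [i, j) must match
                    results.add(text.substring(i, j) + ": index " + i + "~" + (j - 1));
                }
            }
        }
        return results;
    }

    public static void main(String[] args) {
        for (String s : allMatches("aaxxbbaxb", "a[a-z]{0,3}b")) {
            System.out.println(s);
        }
    }
}
```

For the string and pattern from the question, this should list aaxxb (0~4), axxb (1~4), axxbb (1~5) and axb (6~8). The transparent, non-anchoring bounds are intended to keep zero-width constructs in the regex behaving as they would against the full text.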