I have a regular expression like this:
(?<one-1>cat)|(?<two-2>dog)|(?<three-3>mouse)|(?<four-4>fish)
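When I try to use this pattern in a .NET application, it fails because the group names contain a "-".
So, as a workaround, I tried to use two regular expressions. The first one: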
(?<A>cat)|(?<Be>dog)|(?<C>mouse)|(?<D>fish)
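would match the original cases, using group names that I can control.
Then I planned to use the name of the group that matched to look up the real name in a regular expression like this: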
(?<A>one-1)|(?<Be>two-2)|(?<C>three-3)|(?<D>four-4)
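I would do this by finding the string that matches this pattern and checking whether the group names are equal.
I know this seems a bit convoluted. Thanks for any help.
Best answer
Something along these lines?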
string[,] patterns = {
    { "one-1", "cat" },
    { "two-2", "dog" },
    { "three-3", "mouse" },
    { "four-4", "fish" },
};
var regex = buildRegex(patterns);
string[] tests = { "foo", "dog", "bar", "fish" };
foreach (var t in tests) {
    var m = regex.Match(t);
    Console.WriteLine("{0}: {1}", t, reportMatch(regex, m));
}
Output
foo: no match
dog: two-2 = dog
bar: no match
fish: four-4 = fish
First we build up a Regex instance by escaping the group names and combining them with the patterns. Any non-word character, and the underscore itself, is replaced with the sequence _nnn_, where nnn is its UTF-32 value in decimal.
private static Regex buildRegex(string[,] inputs)
{
    string regex = "";
    for (int i = 0; i <= inputs.GetUpperBound(0); i++) {
        // inputs[i,0] is the desired group name, inputs[i,1] is the pattern.
        var part = String.Format(
            "(?<{0}>{1})",
            Regex.Replace(inputs[i,0], @"([\W_])", new MatchEvaluator(escape)),
            inputs[i,1]);
        // Join the named groups into one alternation.
        regex += (regex.Length != 0 ? "|" : "") + part;
    }
    return new Regex(regex);
}
private static string escape(Match m)
{
    // For example, "-" (U+002D) becomes "_45_".
    return "_" + Char.ConvertToUtf32(m.Groups[1].Value, 0) + "_";
}
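To make the escaping step concrete, here is a minimal, self-contained sketch of what the combined pattern looks like for the four inputs above. The class and method names (GroupPatternDemo, EscapeName, BuildPattern) are made up for this illustration and are not part of the answer's code; the escaping rule is the same one described above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class GroupPatternDemo
{
    // Replace each non-word character, and '_' itself, with _nnn_,
    // where nnn is the character's code point in decimal.
    public static string EscapeName(string name)
    {
        return Regex.Replace(name, @"[\W_]",
            m => "_" + char.ConvertToUtf32(m.Value, 0) + "_");
    }

    // Combine (name, pattern) rows into one alternation of named groups.
    public static string BuildPattern(string[,] inputs)
    {
        var parts = new List<string>();
        for (int i = 0; i <= inputs.GetUpperBound(0); i++)
            parts.Add("(?<" + EscapeName(inputs[i, 0]) + ">" + inputs[i, 1] + ")");
        return string.Join("|", parts);
    }

    public static void Main()
    {
        string[,] patterns = {
            { "one-1", "cat" },
            { "two-2", "dog" },
            { "three-3", "mouse" },
            { "four-4", "fish" },
        };
        // Prints:
        // (?<one_45_1>cat)|(?<two_45_2>dog)|(?<three_45_3>mouse)|(?<four_45_4>fish)
        Console.WriteLine(BuildPattern(patterns));
    }
}
```

Escaping the underscore as well (it becomes _95_) is what keeps the encoding reversible: an underscore in an escaped name can only come from an _nnn_ sequence.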
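For a match, the .NET library does not give us an easy way to get the name of the group that matched, so we have to go the other way: for each group name, we check whether that group matched and, if so, let the caller know its name and the captured substring.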
private static string reportMatch(Regex regex, Match m)
{
    if (!m.Success)
        return "no match";
    foreach (var name in regex.GetGroupNames()) {
        // Group "0" is the whole match; Success is true only for the
        // group that actually took part in the match.
        if (name != "0" && m.Groups[name].Success)
            return String.Format(
                "{0} = {1}",
                Regex.Replace(name, @"_(\d+)_",
                    new MatchEvaluator(unescape)),
                m.Groups[name].Value);
    }
    return null;
}
private static string unescape(Match m)
{
    // Turn "_45_" back into "-", and so on.
    return Char.ConvertFromUtf32(int.Parse(m.Groups[1].Value));
}
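A possible alternative, closer to the lookup idea in the question, is to skip the escaping and keep a table from generated group names back to the original labels. The sketch below is not the approach taken in the answer above; the names LabeledAlternation, Build, and Report, and the generated group names g0, g1, and so on, are assumptions made for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LabeledAlternation
{
    // Build one alternation with generated group names (g0, g1, ...)
    // and fill 'labels' with the map from group name to original label.
    public static Regex Build(string[,] inputs, Dictionary<string, string> labels)
    {
        var parts = new List<string>();
        for (int i = 0; i <= inputs.GetUpperBound(0); i++)
        {
            string groupName = "g" + i;
            labels[groupName] = inputs[i, 0];
            parts.Add("(?<" + groupName + ">" + inputs[i, 1] + ")");
        }
        return new Regex(string.Join("|", parts));
    }

    // Report the label of whichever generated group took part in the match.
    public static string Report(Regex regex, Dictionary<string, string> labels, string input)
    {
        Match m = regex.Match(input);
        if (!m.Success)
            return "no match";
        foreach (var pair in labels)
        {
            Group g = m.Groups[pair.Key];
            if (g.Success)
                return pair.Value + " = " + g.Value;
        }
        return "no match";
    }

    public static void Main()
    {
        string[,] patterns = {
            { "one-1", "cat" }, { "two-2", "dog" },
            { "three-3", "mouse" }, { "four-4", "fish" },
        };
        var labels = new Dictionary<string, string>();
        Regex regex = Build(patterns, labels);
        foreach (var t in new[] { "foo", "dog", "bar", "fish" })
            Console.WriteLine("{0}: {1}", t, Report(regex, labels, t));
    }
}
```

The trade-off is that the labels live outside the Regex object, so the dictionary has to travel with it, whereas the escaping approach keeps everything recoverable from the group names alone.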