I created the following regular expression:
^xy_yx_blaa_(\d+)([\s\S]*?)(^[A-D]$|QM)+[\s\S]*?(?:SW|Analyzing)
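My problem is that when I run this pattern on regex101, it finds 199 matches (which is what I want), but when I use it in my C# program, it finds only 55.
After further investigation, I found that the C# program only matches entries that contain "QM", whereas regex101 matches entries that contain any of A, B, C, D, or QM.
This is my current code: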
TextExtractor extractor = new TextExtractor(path);
string text = extractor.ExtractText();
MatchCollection matches = Regex.Matches(text, pattern, RegexOptions.Multiline);
Thanks in advance.
Here is a sample of the input string:
xy_yx_blaa_184
is the act of composing and sending electronic messages, typically
consisting of alphabetic and numeric characters, between two or more
users of mobile phones, tablets, desktops/laptops, or other devices.
Text messages may be sent over a cellular network, or may also be sent
via an Internet connection.
Derived
QM
SW
xy_yx_blaa_199
is the act of composing and sending electronic messages, typically
consisting of alphabetic and numeric characters, between two or more
users of mobile phones, tablets, desktops/laptops, or other devices.
Text messages may be sent over a cellular network, or may also be sent
via an Internet connection.
Derived
A
SW
In the sample text above, C# captures only the first entry (the one containing QM), but regex101 captures both.
Best answer
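When using RegexOptions.Multiline (or its inline equivalent, (?m)), you should add an optional \r? pattern before any $, because the file may have Windows CRLF line endings and the $ anchor only matches before \n, the LF character. If the extracted text does use CRLF, ^[A-D]$ can never match, since a \r sits between the letter and the \n, while the QM alternative has no $ after it and still matches. That would explain why only the QM entries are found.
Also, [\s\S] is more of a hack. To match any character, use . together with RegexOptions.Singleline.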
var pattern = @"^xy_yx_blaa_(\d+)(.*?)(^[A-D]\r?$|QM)+.*?(?:SW|Analyzing)";
var results = Regex.Matches(text, pattern, RegexOptions.Multiline | RegexOptions.Singleline)
.Cast<Match>()
.Select(m => m.Value)
.ToList();
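The effect of CRLF line endings on the $ anchor can be reproduced in isolation. The following is a minimal sketch rather than the asker's actual code: the class name CrlfAnchorDemo, the helper Count, and the short sample string are made up for illustration, and it assumes the extracted text uses \r\n line endings.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CrlfAnchorDemo
{
    // Hypothetical helper: counts the matches of a pattern in Multiline mode.
    public static int Count(string text, string pattern) =>
        Regex.Matches(text, pattern, RegexOptions.Multiline).Count;

    public static void Main()
    {
        // Made-up sample with Windows CRLF line endings.
        string crlf = "Derived\r\nA\r\nSW\r\n";
        // The same sample with LF-only line endings.
        string lf = crlf.Replace("\r\n", "\n");

        // In Multiline mode, $ matches only before \n (or at the end of the string),
        // so the \r that follows "A" blocks the match.
        Console.WriteLine(Count(crlf, @"^[A-D]$"));    // 0

        // An optional \r before $ handles both CRLF and LF input.
        Console.WriteLine(Count(crlf, @"^[A-D]\r?$")); // 1
        Console.WriteLine(Count(lf, @"^[A-D]\r?$"));   // 1

        // With LF-only input the original pattern already works.
        Console.WriteLine(Count(lf, @"^[A-D]$"));      // 1
    }
}
```

An alternative under the same assumption would be to normalize the text first, for example with text.Replace("\r\n", "\n"), and keep the original $ anchors.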
Here are a regex demo and a C# demo.
Regarding "c# - regex results differ between C# and regex101", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/50773144/