Problem description
I want to parse the 2 digits in the middle of a date in dd/mm/yy format, but also allow single digits for the day and month.
This is what I came up with:
(?<=^[\d]{1,2}\/)[\d]{1,2}
I want a 1- or 2-digit number, [\d]{1,2}, preceded by a 1- or 2-digit number and a slash, ^[\d]{1,2}\/.
This doesn't work for many combinations; I have tested 10/10/10, 11/12/13, etc.
But to my surprise, (?<=^\d\d\/)[\d]{1,2} worked.
But [\d]{1,2} should also match wherever \d\d does, or am I wrong?
On lookbehind support
Major regex flavors support lookbehind to varying degrees; some impose certain restrictions, and some don't support it at all.
- JavaScript: not supported before ECMAScript 2018; since ES2018, no restriction
- Python (re module): fixed length only
- Java: finite length only
- .NET: no restriction
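To see the Python restriction from the list above in practice, the sketch below simply tries to compile a few lookbehinds with the built-in re module and reports which ones are accepted. The helper name lookbehind_compiles is hypothetical, and the results reflect CPython's re module only; other engines (including the third-party regex package) apply different rules.

```python
import re


def lookbehind_compiles(pattern: str) -> bool:
    """Return True if Python's built-in re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


candidates = [
    r"(?<=^\d/)\d{1,2}",         # fixed width: 2 characters
    r"(?<=^\d\d/)\d{1,2}",       # fixed width: 3 characters
    r"(?<=ab|cd)x",              # alternation, both branches 2 characters wide
    r"(?<=^\d/|^\d\d/)\d{1,2}",  # alternation with branch widths 2 and 3
    r"(?<=^\d{1,2}/)\d{1,2}",    # bounded but variable width
]

for pattern in candidates:
    print(lookbehind_compiles(pattern), pattern)
```

This should print True for the first three patterns and False for the last two: re allows alternation inside a lookbehind only when every branch has the same width, which is why the Python workarounds place the alternation outside the lookbehinds.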
Python
In Python, where only fixed-length lookbehind is supported, your original pattern raises an error because \d{1,2} obviously does not have a fixed length. You can "fix" this by alternating between two different fixed-length lookbehinds, for example:
(?<=^\d\/)\d{1,2}|(?<=^\d\d\/)\d{1,2}
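A quick check of this top-level alternation against the dates from the question, plus two made-up single-digit inputs, assuming Python's built-in re module:

```python
import re

# Two fixed-width lookbehinds joined by a top-level alternation:
# the first branch handles a 1-digit day, the second a 2-digit day.
pattern = re.compile(r"(?<=^\d/)\d{1,2}|(?<=^\d\d/)\d{1,2}")

for date in ["10/10/10", "11/12/13", "1/2/03", "7/12/99"]:
    match = pattern.search(date)
    print(date, "->", match.group() if match else None)
```

Each input should yield its middle field (10, 12, 2 and 12). Because each lookbehind on its own has a fixed width, re compiles the pattern without complaint.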
Alternatively, you can put both lookbehinds as alternatives of a non-capturing group:
(?:(?<=^\d\/)|(?<=^\d\d\/))\d{1,2}
(Note that you can simply use \d without the brackets.)
That said, it's probably much simpler to use a capturing group instead:
^\d{1,2}\/(\d{1,2})
Note that findall returns what group 1 captures if the pattern has exactly one group. Capturing groups are more widely supported than lookbehind, and often lead to a more readable pattern (as in this case).
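The capturing-group approach can be wrapped in a small helper. extract_month and DATE_PREFIX are hypothetical names chosen for this sketch, and the helper only checks the leading day/month prefix; it does not validate the month range or the year.

```python
import re
from typing import Optional

# Hypothetical helper: a 1- or 2-digit day, a slash, then capture the month.
DATE_PREFIX = re.compile(r"^\d{1,2}/(\d{1,2})")


def extract_month(date: str) -> Optional[int]:
    """Return the middle field as an int, or None if the prefix doesn't match."""
    match = DATE_PREFIX.match(date)
    return int(match.group(1)) if match else None


print(extract_month("12/34/56"))   # 34 (no range check on the month)
print(extract_month("1/2/03"))     # 2
print(extract_month("123/45/67"))  # None: the day has three digits
```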
This snippet illustrates all of the above points:
import re

p = re.compile(r'(?:(?<=^\d\/)|(?<=^\d\d\/))\d{1,2}')
print(p.findall("12/34/56"))  # ['34']
print(p.findall("1/23/45"))   # ['23']

p = re.compile(r'^\d{1,2}\/(\d{1,2})')
print(p.findall("12/34/56"))  # ['34']
print(p.findall("1/23/45"))   # ['23']

# A variable-width lookbehind is rejected when the pattern is compiled
try:
    p = re.compile(r'(?<=^\d{1,2}\/)\d{1,2}')
except re.error as e:
    print(e)  # look-behind requires fixed-width pattern
Java
Java supports only finite-length lookbehind, so you can use \d{1,2} as in the original pattern. The following snippet demonstrates this:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String text =
    "12/34/56 date\n" +
    "1/23/45 another date\n";
Pattern p = Pattern.compile("(?m)(?<=^\\d{1,2}/)\\d{1,2}");
Matcher m = p.matcher(text);
while (m.find()) {
    System.out.println(m.group());
} // prints 34, then 23
Note that (?m) is the embedded form of Pattern.MULTILINE, so that ^ matches at the start of every line. Note also that since \ is the escape character in Java string literals, you must write "\\" to get a single backslash in the pattern.
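For comparison, here is a sketch of the same multi-line extraction with Python's re module. re.MULTILINE serves the same purpose as Java's (?m), but because Python's lookbehind must be fixed-width, the day is handled by two lookbehinds instead of \d{1,2}:

```python
import re

text = (
    "12/34/56 date\n"
    "1/23/45 another date\n"
)

# re.MULTILINE makes ^ match after every newline, like Java's (?m).
# Each lookbehind has a fixed width (2 or 3 characters), which re requires.
pattern = re.compile(r"(?:(?<=^\d/)|(?<=^\d\d/))\d{1,2}", re.MULTILINE)
print(pattern.findall(text))  # ['34', '23']
```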
C#
.NET, and therefore C#, allows arbitrary patterns inside a lookbehind. The following snippet shows how you can use + repetition inside a lookbehind:
using System;
using System.Text.RegularExpressions;

var text = @"
1/23/45
12/34/56
123/45/67
1234/56/78
";
Regex r = new Regex(@"(?m)(?<=^\d+/)\d{1,2}");
foreach (Match m in r.Matches(text)) {
    Console.WriteLine(m);
} // prints 23, 34, 45, 56 (one per line)
Note that, unlike Java, C# has @-quoted (verbatim) string literals, so you don't have to escape \.
For completeness, here's how you'd use the capturing group option in C#:
Regex r = new Regex(@"(?m)^\d+/(\d{1,2})");
foreach (Match m in r.Matches(text)) {
Console.WriteLine("Matched [" + m + "]; month = " + m.Groups[1]);
}
Given the previous text, this prints:
Matched [1/23]; month = 23
Matched [12/34]; month = 34
Matched [123/45]; month = 45
Matched [1234/56]; month = 56
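Python's re has no equivalent of the unbounded (?<=^\d+/) lookbehind, but the capturing-group version carries over directly. The sketch below assumes the same sample text as the C# snippet and should print the same four lines as above:

```python
import re

text = """
1/23/45
12/34/56
123/45/67
1234/56/78
"""

# Capture the month instead of using a variable-width lookbehind;
# \d+ accepts a day part of any length, as in the C# example.
pattern = re.compile(r"^\d+/(\d{1,2})", re.MULTILINE)
for m in pattern.finditer(text):
    print(f"Matched [{m.group(0)}]; month = {m.group(1)}")
```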