Question
I am tired of always having to guess whether I should escape special characters such as ()[]{}| when using the many implementations of regexps.
The rules differ between, for example, Python, sed, grep, awk, Perl, rename, Apache, find and so on. Is there a rule set that tells me when I should, and when I should not, escape special characters? Does it depend on the regexp type, such as PCRE, POSIX or extended regexps?
Recommended answer
Which characters you must escape, and which you must not, does indeed depend on the regex flavor you're working with.
For PCRE, and most other so-called Perl-compatible flavors, escape these outside character classes:
.^$*+?()[{\|
And these inside character classes:
^-]\
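As a rough illustration of the two lists above, here is a minimal Python sketch. It assumes Python's `re` module behaves like a Perl-compatible flavor for these characters, which holds for this example but is not a claim about PCRE itself. The helper names `escape_outside_class` and `escape_inside_class` are made up for the example.

```python
import re

# Metacharacters copied from the two lists above (Perl-compatible flavors).
OUTSIDE_CLASS = set(".^$*+?()[{\\|")
INSIDE_CLASS = set("^-]\\")


def escape_outside_class(text: str) -> str:
    """Backslash-escape characters that are special outside a character class."""
    return "".join("\\" + ch if ch in OUTSIDE_CLASS else ch for ch in text)


def escape_inside_class(chars: str) -> str:
    """Backslash-escape characters that are special inside a character class."""
    return "".join("\\" + ch if ch in INSIDE_CLASS else ch for ch in chars)


# A literal string that contains every metacharacter from the first list.
literal = "a.b*(c)[d]{e}|f\\g^h$i+j?"
pattern = escape_outside_class(literal)
assert re.fullmatch(pattern, literal)

# A character class that matches exactly the four class metacharacters.
char_class = "[" + escape_inside_class("^-]\\") + "]"
assert all(re.fullmatch(char_class, ch) for ch in "^-]\\")
```

In real Python code the built-in `re.escape` is usually the simpler choice; the hand-written version only makes the lists explicit.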
For POSIX extended regexes (ERE), escape these outside character classes (same as PCRE):
.^$*+?()[{\|
Escaping any other character is an error with POSIX ERE.
Inside character classes, the backslash is a literal character in POSIX regular expressions. You cannot use it to escape anything. You have to use "clever placement" if you want to include character class metacharacters as literals. Put the ^ anywhere except at the start, the ] at the start, and the - at the start or the end of the character class to match these literally, e.g.:
[]^-]
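The placement rules above can be sketched as a small Python helper that builds such a bracket expression from a set of literal characters. The function name `bracket_expression` is hypothetical. Two choices go beyond what the answer states: a lone ^ is returned as \^ outside a class, because [^] cannot express it, and a literal [ is placed late so it is never followed by ., : or =.

```python
import re


def bracket_expression(chars: str) -> str:
    """Build a POSIX bracket expression matching exactly the given literals."""
    wanted = set(chars)
    if not wanted:
        raise ValueError("need at least one character")
    if wanted == {"^"}:
        # "[^]" would be read as the start of a negated class, so a lone
        # caret is escaped outside a bracket expression instead.
        return "\\^"

    ordinary = sorted(wanted - set("]^-["))
    body = ""
    if "]" in wanted:
        body += "]"            # ] is literal only at the very start
    body += "".join(ordinary)  # a backslash here stays literal in POSIX
    if "[" in wanted:
        body += "["            # kept away from '.', ':' and '='
    if "^" in wanted:
        body += "^"            # ^ is literal anywhere except the start
    if "-" in wanted:
        body += "-"            # - is literal at the end
    if body.startswith("^"):
        # Only ^ and - were requested: lead with - so ^ is not first.
        body = "-^"
    return "[" + body + "]"


expr = bracket_expression("]^-")
assert expr == "[]^-]"

# Python's re module accepts the same placement for this expression,
# so it can serve as a quick sanity check.
assert all(re.fullmatch(expr, ch) for ch in "]^-")
assert not re.fullmatch(expr, "a")
```

The sanity check deliberately avoids a backslash: Python's `re` treats it as an escape inside a class, unlike POSIX tools.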
In POSIX basic regular expressions (BRE), these are the metacharacters that you need to escape to suppress their meaning:
.^$*[\
Escaping parentheses and curly brackets in BREs gives them the special meaning their unescaped versions have in EREs. Some implementations (e.g. GNU) also give special meaning to other characters when escaped, such as \? and \+. Escaping a character other than .^$*(){} is normally an error with BREs.
Inside character classes, BREs follow the same rule as EREs.
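To make the BRE and ERE lists concrete, here is a minimal Python sketch that escapes a literal string for use outside character classes in either flavor. The function name `escape_posix` and the flavor labels are assumptions for the example. It only produces the pattern text; it does not run it through a POSIX tool.

```python
# Metacharacters copied from the BRE and ERE lists above.
SPECIAL = {
    "BRE": frozenset(".^$*[\\"),
    "ERE": frozenset(".^$*+?()[{\\|"),
}


def escape_posix(text: str, flavor: str) -> str:
    """Escape `text` so it matches literally outside a bracket expression."""
    try:
        special = SPECIAL[flavor]
    except KeyError:
        raise ValueError(f"unknown flavor: {flavor!r}") from None
    # Escape only the listed characters; escaping anything else is
    # normally an error in POSIX regexes, or changes the meaning in BRE.
    return "".join("\\" + ch if ch in special else ch for ch in text)


# Parentheses and + are ordinary characters in BRE but special in ERE.
assert escape_posix("f(x)+1.0", "BRE") == "f(x)+1\\.0"
assert escape_posix("f(x)+1.0", "ERE") == "f\\(x\\)\\+1\\.0"
```

The resulting strings could then be passed to, say, grep for BRE or grep -E for ERE, with shell quoting handled separately.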
If all this makes your head spin, grab a copy of RegexBuddy. On the Create tab, click Insert Token, and then Literal. RegexBuddy will add escapes as needed.