Question
I want to match a regular expression special character, \^$.?*|+()[{. I tried:
x <- "a[b"
grepl("[", x)
## Error: invalid regular expression '[', reason 'Missing ']''
(Equivalently, stringr::str_detect(x, "[") or stringi::stri_detect_regex(x, "[").)
Doubling the value to escape it doesn't work:
grepl("[[", x)
## Error: invalid regular expression '[[', reason 'Missing ']''
Nor does using a backslash:
grepl("\[", x)
## Error: '\[' is an unrecognized escape in character string starting ""\["
How do I match a special character?
Some special cases of this have been asked in older questions that are well written enough that it would be cheeky to close them as duplicates of this one:
Escaped Periods In R Regular Expressions
How to escape a question mark in R?
escaping pipe ("|") in a regex
Recommended answer
Escape with a double backslash
R treats backslashes as escape characters in string literals (and so do regular expressions). Hence the need for two backslashes when supplying a character argument for a pattern: the first one isn't actually a character, but rather it makes the second one into a character. You can see how strings are processed using cat.
y <- "double quote: \", tab: \t, newline: \n, unicode point: \u20AC"
print(y)
## [1] "double quote: \", tab: \t, newline: \n, unicode point: €"
cat(y)
## double quote: ", tab: , newline:
## , unicode point: €
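To see the two layers of escaping separately, one can compare what is typed in the source with the characters R actually stores. This is a small sketch using base R's nchar() and writeLines(); the variable names pattern and backslash_regex are purely illustrative.

```r
# The R literal "\\[" is stored as two characters: a backslash and "[".
pattern <- "\\["
nchar(pattern)          # 2
writeLines(pattern)     # prints \[ , which is what the regex engine receives
grepl(pattern, "a[b")   # TRUE

# Matching one literal backslash needs the regex \\ , i.e. the R literal "\\\\".
backslash_regex <- "\\\\"
nchar(backslash_regex)      # 2
writeLines(backslash_regex) # prints \\
```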
To use special characters in a regular expression, the simplest method is usually to escape them with a backslash, but, as noted above, the backslash itself needs to be escaped.
grepl("\\[", "a[b")
## [1] TRUE
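When the string to match comes from user input, escaping each character by hand is impractical. The following is a minimal sketch of a helper that backslash-escapes every metacharacter; escape_regex() is a hypothetical name, not a base R function. The matches use perl = TRUE, where a backslash before any non-alphanumeric character is always treated as that literal character.

```r
# escape_regex() is a hypothetical helper, not part of base R.
# It prefixes every regex metacharacter in x with a backslash so the
# result can be used as a pattern that matches x literally.
escape_regex <- function(x) {
  # In the bracket expression "]" comes first so it is taken literally,
  # and the doubled backslash covers "\" itself.
  # In the replacement, "\\\\" yields one backslash and "\\1" the matched char.
  gsub("([][{}()+*^$|\\\\?.])", "\\\\\\1", x, perl = TRUE)
}

escaped <- escape_regex("a[b].c")
writeLines(escaped)                         # a\[b\]\.c
grepl(escaped, "a[b].c", perl = TRUE)       # TRUE
grepl(escaped, "a[b]xc", perl = TRUE)       # FALSE: the dot is now literal
```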
To match backslashes, you need to double escape, resulting in four backslashes.
grepl("\\\\", c("a\\b", "a\nb"))
## [1] TRUE FALSE
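If you are on R 4.0.0 or later (an assumption about your R version), raw string literals remove R's own layer of escaping, so only the regex-level backslash remains. A small sketch:

```r
# Raw string literals r"(...)" (R >= 4.0.0) skip R's string escape processing.
x <- c("a\\b", "a\nb")
grepl(r"(\\)", x)           # TRUE FALSE, same as grepl("\\\\", x)
grepl(r"(\[)", "a[b")       # TRUE, same as grepl("\\[", "a[b")
identical(r"(\\)", "\\\\")  # TRUE: both are the two-character string \\
```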
The rebus package contains constants for each of the special characters to save you mistyping backslashes.
library(rebus)
OPEN_BRACKET
## [1] "\\["
BACKSLASH
## [1] "\\\\"
For more examples see:
?SpecialCharacters
Your problem can be solved this way:
library(rebus)
grepl(OPEN_BRACKET, "a[b")
Form a character class
grepl("[?]", "a?b")
## [1] TRUE
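The same trick works for the other metacharacters. This sketch wraps each one in [ ] and checks that the pattern matches the character itself but not a plain string; it deliberately leaves out \, ^, [ and ], whose behaviour inside a class is different. The names specials and in_class are illustrative only.

```r
# Characters that lose their special meaning when placed alone inside [ ].
specials <- c(".", "?", "*", "+", "(", ")", "{", "}", "|", "$")
in_class <- vapply(specials, function(ch) {
  pattern <- paste0("[", ch, "]")
  # TRUE only if the class matches the literal character and not "ab"
  grepl(pattern, paste0("a", ch, "b")) && !grepl(pattern, "ab")
}, logical(1))
all(in_class)  # TRUE
```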
Two of the special characters have special meaning inside character classes: \ and ^.
Backslash still needs to be escaped even if it is inside a character class.
grepl("[\\\\]", c("a\\b", "a\nb"))
## [1] TRUE FALSE
Caret only needs to be escaped if it is directly after the opening square bracket.
grepl("[ ^]", "a^b") # matches spaces as well.
## [1] TRUE
grepl("[\\^]", "a^b")
## [1] TRUE
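The reason the position matters is that a caret directly after [ negates the class. A short sketch contrasting the two readings:

```r
# A caret right after "[" negates the class; anywhere else it is literal.
grepl("[^a]", "a")   # FALSE: the class means "any character except a"
grepl("[a^]", "^")   # TRUE: caret not in first position, so literal
grepl("[\\^]", "^")  # TRUE: escaped caret is literal even in first position
grepl("[^^]", "^")   # FALSE: negated class "any character except ^"
```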
rebus also lets you form a character class.
char_class("?")
## <regex> [?]
Use a pre-existing character class
If you want to match all punctuation, you can use the [:punct:] character class.
grepl("[[:punct:]]", c("//", "[", "(", "{", "?", "^", "$"))
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE
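A predefined class is handy when you don't want to enumerate the characters at all, for example to strip punctuation or to keep only strings made entirely of it. A small base R sketch with made-up input:

```r
# Remove every punctuation character in one pass, without listing them.
gsub("[[:punct:]]", "", "a.b?c(d)")  # "abcd"

# Keep only strings made entirely of punctuation.
x <- c("//", "[", "a", "?!", "1.5")
x[grepl("^[[:punct:]]+$", x)]        # "//" "[" "?!"
```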
stringi maps this to the Unicode General Category for punctuation, so its behaviour is slightly different: ^ and $ are classified as symbols rather than punctuation, so they do not match.
stri_detect_regex(c("//", "[", "(", "{", "?", "^", "$"), "[[:punct:]]")
## [1] TRUE TRUE TRUE TRUE TRUE FALSE FALSE
You can also use the cross-platform \p{...} syntax for accessing a Unicode General Category (UGC).
stri_detect_regex(c("//", "[", "(", "{", "?", "^", "$"), "\\p{P}")
## [1] TRUE TRUE TRUE TRUE TRUE FALSE FALSE
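If stringi isn't available, base R can use the same property syntax through PCRE with perl = TRUE, assuming R was built with Unicode property support (the usual case). The sketch also shows that ^ and $ fall under the symbol category instead.

```r
x <- c("//", "[", "(", "{", "?", "^", "$")
grepl("\\p{P}", x, perl = TRUE)  # TRUE TRUE TRUE TRUE TRUE FALSE FALSE
# ^ and $ are Unicode symbols (categories Sk and Sc), so \p{S} matches them.
grepl("\\p{S}", x, perl = TRUE)  # FALSE FALSE FALSE FALSE FALSE TRUE TRUE
```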
Use \Q...\E escapes
Placing characters between \\Q and \\E makes the regular expression engine treat them literally rather than as regular expression syntax.
grepl("\\Q.\\E", "a.b")
## [1] TRUE
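Matching against "a.b" doesn't really show the effect, since an unescaped "." matches a dot too. This sketch compares against "ab" instead and uses perl = TRUE, where \Q...\E is standard PCRE syntax. quote_literal() is a hypothetical helper for wrapping arbitrary input.

```r
grepl(".", "ab")                     # TRUE: "." is a wildcard
grepl("\\Q.\\E", "ab", perl = TRUE)  # FALSE: "." is literal inside \Q...\E

# quote_literal() is a hypothetical helper that wraps input in \Q...\E.
# Caveat: input that itself contains "\E" would end the quoted block early.
quote_literal <- function(s) paste0("\\Q", s, "\\E")
grepl(quote_literal("1+1=2?"), "is 1+1=2? yes", perl = TRUE)  # TRUE
```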
rebus lets you write literal blocks of regular expressions.
literal(".")
## <regex> \Q.\E
Don't use regular expressions
Regular expressions are not always the answer. If you want to match a fixed string then you can do, for example:
grepl("[", "a[b", fixed = TRUE)                  # base R
stringr::str_detect("a[b", stringr::fixed("["))  # stringr
stringi::stri_detect_fixed("a[b", "[")           # stringi
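The fixed = TRUE argument is not limited to grepl; other base R string functions accept it too, which avoids the escaping problem entirely when replacing or splitting on a literal string. A short sketch:

```r
gsub(".", "-", "a.b")                      # "---": every character matches "."
gsub(".", "-", "a.b", fixed = TRUE)        # "a-b"
strsplit("a|b|c", "|", fixed = TRUE)[[1]]  # "a" "b" "c"
```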