Question
I want to match a regular expression special character, \^$.?*|+()[{. I tried:
x <- "a[b"
grepl("[", x)
## Error: invalid regular expression '[', reason 'Missing ']''
(The same happens with stringr::str_detect(x, "[") or stringi::stri_detect_regex(x, "[").)
Doubling the value to escape it doesn't work:
grepl("[[", x)
## Error: invalid regular expression '[[', reason 'Missing ']''
Using a backslash doesn't work either:
grepl("\[", x)
## Error: '\[' is an unrecognized escape in character string starting ""\["
How do I match a special character?
Some special cases of this in questions that are old and well written enough for it to be cheeky to close as duplicates of this:
Escaped Periods In R Regular Expressions
How to escape a question mark in R?
escaping pipe ("|") in a regex
Recommended answer
Escape with a double backslash
R treats backslashes as escape values for character constants. (... and so do regular expressions. Hence the need for two backslashes when supplying a character argument for a pattern. The first one isn't actually a character, but rather it makes the second one into a character.) You can see how they are processed using cat.
y <- "double quote: \", tab: \t, newline: \n, unicode point: \u20AC"
print(y)
## [1] "double quote: \", tab: \t, newline: \n, unicode point: €"
cat(y)
## double quote: ", tab:    , newline: 
## , unicode point: €
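To see the two layers separately, you can check that the literal "\\[" typed into R produces a two-character string. That string is what the regex engine then receives. This is a minimal base R sketch; the variable name is illustrative.

```r
pattern <- "\\["     # R parses the literal: \\ becomes a single backslash
nchar(pattern)       # 2: one backslash followed by [
writeLines(pattern)  # prints what the regex engine sees: \[
```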
Further reading: Escaping a backslash with a backslash in R produces 2 backslashes in a string, not 1
To use special characters in a regular expression the simplest method is usually to escape them with a backslash, but as noted above, the backslash itself needs to be escaped.
grepl("\\[", "a[b")
## [1] TRUE
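The same escape works for the other metacharacters. This small base R comparison shows the difference between an unescaped and an escaped full stop and question mark. The test strings are made up for illustration.

```r
x <- c("a.b", "ab")
grepl(".", x)      # TRUE TRUE: unescaped "." matches any character
grepl("\\.", x)    # TRUE FALSE: "\\." matches only a literal full stop

y <- c("a?b", "ab")
grepl("a?b", y)    # TRUE TRUE: "?" makes the preceding "a" optional
grepl("a\\?b", y)  # TRUE FALSE: "\\?" matches a literal question mark
```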
To match backslashes, you need to double escape, resulting in four backslashes.
grepl("\\\\", c("a\\b", "a\nb"))
## [1] TRUE FALSE
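When the text to match comes from a variable, escaping by hand becomes error-prone. One option is a small helper that puts a backslash in front of every metacharacter. The sketch below is hypothetical: escape_regex is not a base R function. It also assumes the resulting pattern is used with perl = TRUE, because the character class inside it was written against PCRE's rules.

```r
# Hypothetical helper: prefix every regex metacharacter in x with a backslash.
# As PCRE sees it, the class is [.\\|()\[\]{}^$*+?]; the replacement
# "\\\\\\1" is a literal backslash followed by the matched character.
escape_regex <- function(x) {
  gsub("([.\\\\|()\\[\\]{}^$*+?])", "\\\\\\1", x, perl = TRUE)
}

escape_regex("a[b")                                # "a\\[b", i.e. a \ [ b
grepl(escape_regex("a[b"), "xa[bx", perl = TRUE)   # TRUE
grepl(escape_regex("1+1"), c("1+1", "11"), perl = TRUE)  # TRUE FALSE
```

Because gsub is vectorised, the helper also escapes a whole character vector at once.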
The rebus package contains constants for each of the special characters to save you mistyping slashes.
library(rebus)
OPEN_BRACKET
## [1] "\\["
BACKSLASH
## [1] "\\\\"
For more examples, see:
?SpecialCharacters
Your problem can be solved this way:
library(rebus)
grepl(OPEN_BRACKET, "a[b")
Form a character class
Wrapping a special character in square brackets makes it match literally:
grepl("[?]", "a?b")
## [1] TRUE
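To check this for the rest of the question's list, the sketch below wraps each of $ . ? * | + ( ) [ { in square brackets. It uses base R with perl = TRUE, an assumption made so that PCRE's bracket rules apply. \ and ^ are left out because they are the exceptions covered next.

```r
# Wrap each metacharacter in [ ] and check that it only matches itself
specials <- c(".", "?", "*", "+", "$", "(", ")", "|", "{", "[")
patterns <- paste0("[", specials, "]")   # "[.]", "[?]", ..., "[[]"
texts    <- paste0("a", specials, "b")   # "a.b", "a?b", ..., "a[b"

hits   <- mapply(grepl, patterns, texts, MoreArgs = list(perl = TRUE))
misses <- vapply(patterns, grepl, logical(1), x = "ab", perl = TRUE)
all(hits)     # TRUE: each class matches its own character
any(misses)   # FALSE: none of them acts as a metacharacter any more
```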
Two of the special characters have special meaning inside character classes: \ and ^.
Backslash still needs to be escaped even if it is inside a character class.
grepl("[\\\\]", c("a\\b", "a\nb"))
## [1] TRUE FALSE
Caret only needs to be escaped if it is directly after the opening square bracket.
grepl("[ ^]", "a^b") # matches spaces as well.
## [1] TRUE
grepl("[\\^]", "a^b")
## [1] TRUE
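Putting the two rules together, a single character class can test for any of the question's special characters. In this sketch (base R, perl = TRUE assumed; the variable names are illustrative), the backslash is escaped and the caret is kept away from the first position.

```r
# One class covering every special character from the question.
# As PCRE sees it: [\\^$.|?*+()[{]
#   \\ is an escaped backslash; ^ is literal because it is not first.
any_special <- "[\\\\^$.|?*+()[{]"
x <- c("plain", "a^b", "a\\b", "a{b", "a]b")
grepl(any_special, x, perl = TRUE)   # FALSE TRUE TRUE TRUE FALSE
```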
rebus also lets you form a character class.
char_class("?")
## <regex> [?]
Use a pre-existing character class
If you want to match all punctuation, you can use the [:punct:] character class.
grepl("[[:punct:]]", c("//", "[", "(", "{", "?", "^", "$"))
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE
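The same class is handy for removing punctuation rather than just detecting it. Here is a minimal base R example; the input string is made up.

```r
# Delete every character in the [:punct:] class
cleaned <- gsub("[[:punct:]]", "", "a[b]c?d^e$")
cleaned   # "abcde"
```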
stringi maps this to the Unicode General Category for punctuation, so its behaviour is slightly different.
library(stringi)
stri_detect_regex(c("//", "[", "(", "{", "?", "^", "$"), "[[:punct:]]")
## [1] TRUE TRUE TRUE TRUE TRUE FALSE FALSE
You can also use the cross-platform syntax for accessing a UGC.
stri_detect_regex(c("//", "[", "(", "{", "?", "^", "$"), "\\p{P}")
## [1] TRUE TRUE TRUE TRUE TRUE FALSE FALSE
Use \Q \E escapes
Placing characters between \Q and \E makes the regular expression engine treat them literally rather than as regular expressions.
grepl("\\Q.\\E", "a.b", perl = TRUE)
## [1] TRUE
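\Q ... \E is most useful when the literal part comes from a variable. This sketch builds the pattern with paste0 and uses perl = TRUE, since \Q ... \E is PCRE syntax. The needle string is an arbitrary example.

```r
# Treat a user-supplied string literally by wrapping it in \Q ... \E
needle  <- "1+1=2?"
pattern <- paste0("\\Q", needle, "\\E")   # the pattern \Q1+1=2?\E
grepl(pattern, c("is 1+1=2? yes", "11=2"), perl = TRUE)   # TRUE FALSE
```

One caveat: if the needle itself contains \E, the quoting ends early at that point.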
rebus lets you write literal blocks of regular expressions.
literal(".")
## <regex> \Q.\E
Don't use regular expressions
Regular expressions are not always the answer. If you want to match a fixed string, you can do, for example:
grepl("[", "a[b", fixed = TRUE)
stringr::str_detect("a[b", stringr::fixed("["))
stringi::stri_detect_fixed("a[b", "[")
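The fixed = TRUE argument is not limited to grepl. The base R substitution and splitting functions accept it too, which avoids a classic surprise with a literal full stop. A minimal sketch with made-up inputs:

```r
# fixed = TRUE also works in the splitting and substitution functions
parts_fixed <- strsplit("a.b.c", ".", fixed = TRUE)[[1]]   # "a" "b" "c"
parts_regex <- strsplit("a.b.c", ".")[[1]]   # only empty strings: "." matched every character
replaced    <- gsub(".", "-", "a.b", fixed = TRUE)         # "a-b"
```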