Problem description
I'm having some problems matching a pattern against a string of text in R.
I'm trying to get TRUE with grepl when the text is something like "lettersornumbersorspaces y lettersornumbersorspaces".
I'm using the following regex:
([:alnum:]|[:blank:])+[:blank:][yY][:blank:]([:alnum:]|[:blank:])+
When I use the regex as follows to obtain the "address", it works as expected.
regex <- "([:alnum:]|[:blank:])+[:blank:][yY][:blank:]([:alnum:]|[:blank:])+"
address <- str_extract(fulltext, regex)
I can see that address is the text I need. Now, if I want to use grepl to get TRUE as follows:
grepl("([:alnum:]|[:blank:])+[:blank:][yY][:blank:]([:alnum:]|[:blank:])+", address,ignore.case = TRUE)
FALSE is returned. How is this possible? I'm using the same regex and expect to get TRUE. I have tried modifying the grepl parameters, but none of them is related to this.
An example of the text is: "26 de Marzo y Pareyra de la Luz"
Thanks!!
Recommended answer
Although the ICU regex engine used by stringr supports bare POSIX character classes in the pattern, in the base R regex flavors (both PCRE (perl=TRUE) and TRE), POSIX character classes must be inside bracket expressions: [:alnum:] -> [[:alnum:]].
x <- c("AZaz09 y AZaz09", "ĄŻaz09 y AZŁł09", "26 de Marzo y Pareyra de la Luz")
grepl("[[:alnum:][:blank:]]+[[:blank:]][yY][[:blank:]][[:alnum:][:blank:]]+", x)
## => [1] TRUE TRUE TRUE
grepl("[[:alnum:][:blank:]]+[[:blank:]][yY][[:blank:]][[:alnum:][:blank:]]+", x, perl=TRUE)
## => [1] TRUE TRUE TRUE
When you use [:alnum:] alone, it is a plain bracket expression that matches a single character: one of :, a, l, n, u, or m.
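The difference can be checked directly in base R. The following is a minimal sketch, assuming the default (TRE) engine; the probe vector and variable names are illustrative and not part of the original answer.

```r
# Single-character probes: some are in the literal set ':', 'a', 'l', 'n', 'u', 'm',
# others are alphanumeric but outside that set.
probe <- c("a", ":", "m", "z", "7", " ")

# Bare class: treated as the bracket expression [:alnum] (a literal character list).
bare <- grepl("[:alnum:]", probe)

# POSIX class inside a bracket expression: matches any alphanumeric character.
proper <- grepl("[[:alnum:]]", probe)

print(data.frame(probe, bare, proper))

# The pattern from the question fails on the sample text because the space
# before "y" is not one of the literal characters ':', 'b', 'l', 'a', 'n', 'k'.
sample_text <- "26 de Marzo y Pareyra de la Luz"
question_regex <- "([:alnum:]|[:blank:])+[:blank:][yY][:blank:]([:alnum:]|[:blank:])+"
question_result <- grepl(question_regex, sample_text)
print(question_result)
```

With the bare form, "z" and "7" are not matched while ":" is, which is the opposite of what an alphanumeric class should do.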
Pattern details:
[[:alnum:][:blank:]]+ - 1+ alphanumeric or horizontal whitespace symbols
[[:blank:]] - 1 horizontal whitespace symbol
[yY] - either y or Y
[[:blank:]] - 1 horizontal whitespace symbol
[[:alnum:][:blank:]]+ - 1+ alphanumeric or horizontal whitespace symbols
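To make the breakdown concrete, the pattern can be assembled from its named pieces and used with base R only. This is a sketch: the variable names are hypothetical, and regmatches with regexpr is used here as an assumed base R stand-in for the str_extract call in the question.

```r
# Pieces of the pattern, in the order described above (names are illustrative).
alnum_or_blank <- "[[:alnum:][:blank:]]+"  # 1+ alphanumeric or horizontal whitespace
blank          <- "[[:blank:]]"            # exactly 1 horizontal whitespace
y_either_case  <- "[yY]"                   # y or Y

pattern <- paste0(alnum_or_blank, blank, y_either_case, blank, alnum_or_blank)

x <- "26 de Marzo y Pareyra de la Luz"

# Does the text contain "<text> y <text>"?
has_match <- grepl(pattern, x)

# Extract the first match without stringr.
address <- regmatches(x, regexpr(pattern, x))

# A string with no standalone "y" between blanks should not match.
no_match <- grepl(pattern, "26 de Marzo")

print(has_match)
print(address)
print(no_match)
```

Because every character of the sample text is alphanumeric or a blank, the extracted match is the whole string.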