Is the maximum length of a regular expression OR (|) 2555?
dat <- paste("DB", 1:10000, sep="")
pat <- dat[1:2555]
pat <- paste("^", pat, "$", sep = "")
pat <- paste(pat, collapse = "|")
system.time({
(g.ok <- grep(pattern = pat, x = dat))
})
When pat is built from 2556 elements (dat[1:2556]) instead of 2555, the grep call fails with an error.
> sessionInfo()
R version 3.0.3 (2014-03-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices
[4] utils datasets methods
[7] base
>
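To check where the boundary falls on a particular R build, a small probe along these lines could be used. This is only a sketch: `build_pat` and `compiles` are hypothetical helper names, not part of the original question, and the threshold it finds may differ between R versions and platforms.

```r
# Same data as in the question.
dat <- paste0("DB", 1:10000)

# Hypothetical helper: build an alternation of the first n anchored literals.
build_pat <- function(n) {
  paste(paste0("^", dat[seq_len(n)], "$"), collapse = "|")
}

# Hypothetical helper: TRUE if grep() accepts the pattern, FALSE if it errors.
compiles <- function(n) {
  tryCatch({
    suppressWarnings(grep(build_pat(n), dat))
    TRUE
  }, error = function(e) FALSE)
}

# Probe the two sizes discussed in the question.
small_ok <- compiles(100)
sapply(c(2555, 2556), compiles)
```

The probe only reports whether the pattern compiled and ran; it does not explain why a given size fails.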
Best answer
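strapplyc in the gsubfn package uses Tcl regular expressions, so it does not depend on R's own regex engine, and it can handle 2556 alternatives. Using dat from the question: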
pat <- dat[1:2556]
pat <- paste0("^", pat, "$")
pat <- paste(pat, collapse = "|")
library(gsubfn)
out <- Filter(nchar, strapplyc(dat, pat, simplify = c))
length(out)
## [1] 2556
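Two alternatives that stay in base R are sketched below. Neither is from the answer above, and both rest on assumptions. Because every alternative in the question is an anchored literal with no regex metacharacters, exact matching with `%in%` should select the same elements without building a regex at all. Where a regex is really needed, the alternation can be split into smaller chunks and the results combined. The `chunk_size` of 1000 is an arbitrary choice, picked only because it is well below the 2555 figure from the question.

```r
# Same data as in the question.
dat <- paste0("DB", 1:10000)
targets <- dat[1:2556]

# Option 1 (assumes the alternatives are plain literals): exact matching,
# no regular expression involved.
idx_exact <- which(dat %in% targets)

# Option 2: split the alternatives into chunks and grep each chunk separately.
chunk_size <- 1000  # arbitrary assumption, not a documented limit
chunks <- split(targets, ceiling(seq_along(targets) / chunk_size))

idx_chunked <- sort(unique(unlist(lapply(chunks, function(ch) {
  pat <- paste0("^", ch, "$", collapse = "|")
  grep(pat, dat)
}), use.names = FALSE)))

# Both approaches should select the same positions in dat.
identical(idx_exact, idx_chunked)
length(idx_exact)
```

For anchored literal patterns like these, `%in%` is the simpler choice. Chunking only matters when the alternatives contain real regex syntax.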
Source: the Stack Overflow question "R: is the maximum length of a regular expression OR (|) 2555? Error at a length of 2556": https://stackoverflow.com/questions/23134335/