Is the maximum length of a regular expression with OR (|) 2555?

dat <- paste("DB", 1:10000, sep="")

pat <- dat[1:2555]
pat <- paste("^", pat, "$", sep = "")
pat <- paste(pat, collapse = "|")

system.time({
  (g.ok <- grep(pattern = pat, x = dat))
})

When pat is built from 2556 elements instead of 2555, the call fails with an error.
> sessionInfo()
R version 3.0.3 (2014-03-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices
[4] utils     datasets  methods
[7] base
>

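Best answer

strapplyc in the gsubfn package uses Tcl regular expressions, which makes it independent of R's own regex engine, and it can handle 2556 alternatives. Using dat from the question: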

pat <- dat[1:2556]
pat <- paste0("^", pat, "$")
pat <- paste(pat, collapse = "|")

library(gsubfn)
out <- Filter(nchar, strapplyc(dat, pat, simplify = c))
length(out)
## [1] 2556
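
Because every alternative in the question is an anchored literal (^DB1$, ^DB2$, ...), the large alternation can also be avoided altogether. The following is a minimal sketch, not part of the original answer: it shows exact matching with %in%, and a hypothetical helper grep_chunked() that splits the alternatives into smaller patterns and merges the hits. The chunk size of 1000 is an arbitrary illustrative value below the limit reported in the question, and the helper assumes the targets contain no regex metacharacters (otherwise they would need escaping first).

```r
# Same data as in the question.
dat <- paste0("DB", 1:10000)
targets <- dat[1:2556]

# Option 1: exact matches need no regular expression at all.
idx_in <- which(dat %in% targets)

# Option 2 (hypothetical helper): keep grep(), but cap the number of
# alternatives per pattern. chunk_size is an assumed, illustrative value.
grep_chunked <- function(targets, x, chunk_size = 1000) {
  # Assign each target to a chunk of at most chunk_size elements.
  chunks <- split(targets, ceiling(seq_along(targets) / chunk_size))
  # Build one anchored alternation per chunk and collect matching indices.
  hits <- lapply(chunks, function(ch) {
    grep(paste0("^", ch, "$", collapse = "|"), x)
  })
  sort(unique(unlist(hits, use.names = FALSE)))
}

idx_chunked <- grep_chunked(targets, dat)
length(idx_in)
length(idx_chunked)
```

For pure exact matching, %in% is likely the simpler choice; the chunked variant only makes sense when the patterns really are regular expressions.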

Regarding "r - R: Is the maximum length of a regular expression with OR (|) 2555? Error when the OR (|) has length 2556", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/23134335/
