I am (presumably) running out of memory when using strsplit. Here is the code:

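# For each field, keep the part before the separator and store the
# original value in a new column named <field><suffix>.
split.fields <- function (frame, fields, split, suffix, ...) {
  for (field in fields) {
    # extra arguments in ... (e.g. fixed=TRUE) are passed on to strsplit
    v <- sapply(strsplit(frame[[field]], split, ...), "[", 1)
    frame[[paste0(field,suffix)]] <- frame[[field]]
    frame[[field]] <- v
  }
  frame
}
split.version <- function (frame, fields)
  split.fields(frame, fields, split="@", suffix="Ver", fixed=TRUE)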
> gc()
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 238165 12.8     467875   25   407500 21.8
Vcells 369492  2.9     905753    7   905631  7.0
> frame <- data.frame(browser = sample(c("Chrome@28","Chrome@27","Firefox@21","Firefox@22","IE@9","IE@8"), 1e7, replace=TRUE), stringsAsFactors=FALSE)
> str(frame)
'data.frame':   10000000 obs. of  1 variable:
 $ browser: chr  "IE@8" "Chrome@27" "Chrome@27" "Chrome@27" ...
> object.size(frame)
80000992 bytes
> gc()
           used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   240555 12.9     467875  25.0   407500  21.8
Vcells 10373979 79.2   34109873 260.3 40534688 309.3
> system.time(frame <- split.version(frame,"browser"))
   user  system elapsed
 73.700   0.872  74.831
> object.size(frame)
160001248 bytes
> str(frame)
'data.frame':   10000000 obs. of  2 variables:
 $ browser   : chr  "IE" "Chrome" "Chrome" "Chrome" ...
 $ browserVer: chr  "IE@8" "Chrome@27" "Chrome@27" "Chrome@27" ...
> gc()
           used  (Mb) gc trigger  (Mb)  max used   (Mb)
Ncells   264888  14.2   16652260 889.4  31376740 1675.7
Vcells 20459856 156.1   95461025 728.4 119226749  909.7

This all looks about right, except that the RSS of the R process is now 1.6G.

This seems to imply that the 1675.7 Mb in the max used column of the Ncells
row has not been returned to the OS.

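I don't much care that the OS is not getting the RAM back. What I do care
about is that R allocated 1.6G to process 80M of data (on my real data it
runs out of available physical RAM).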
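One plausible contributor, offered here as an assumption rather than something the post confirms, is the intermediate list that strsplit returns: one small character vector per row, each with its own object overhead. A minimal sketch on a smaller sample compares the estimated size of the input vector with that of the split result (`x`, `parts`, `size_vec` and `size_list` are illustrative names, and `object.size` only gives an estimate):

```r
# Illustrative sample, much smaller than the 1e7 rows in the question.
set.seed(1)
x <- sample(c("Chrome@28", "Firefox@21", "IE@8"), 1e4, replace = TRUE)

# strsplit returns a list with one length-2 character vector per input element.
parts <- strsplit(x, "@", fixed = TRUE)

# Estimated sizes in bytes; the list carries per-element overhead.
size_vec  <- as.numeric(object.size(x))
size_list <- as.numeric(object.size(parts))

# Ratio of the intermediate list to the original vector.
size_list / size_vec
```

The ratio printed at the end gives a rough idea of how much larger the temporary result is than the column it was built from.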

Is there a way to be more memory-efficient?

For example, would it help to convert the character vector to a factor and
operate on its levels?
R version 3.0.1 (2013-05-16) -- "Good Sport"
Platform: x86_64-pc-linux-gnu (64-bit)
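
The factor idea raised in the question could look roughly like the sketch below. It is not from the original post: `split_fields_factor` is a hypothetical name, and whether it actually lowers peak memory on the real data has not been measured here. The point is only that the string work is done once per distinct level instead of once per row.

```r
# Hypothetical variant of split.fields() that works on factor levels.
split_fields_factor <- function(frame, fields, split, suffix) {
  for (field in fields) {
    f   <- factor(frame[[field]])        # integer codes plus a few unique levels
    lev <- levels(f)
    # Position of the first separator in each level, -1 if it is absent.
    pos <- as.integer(regexpr(split, lev, fixed = TRUE))
    # Keep the whole level when there is no separator.
    short <- substr(lev, 1L, ifelse(pos > 0L, pos - 1L, nchar(lev)))
    frame[[paste0(field, suffix)]] <- frame[[field]]
    frame[[field]] <- short[as.integer(f)]   # map codes back to one string per row
  }
  frame
}

# Small usage example with made-up data.
demo <- data.frame(browser = c("IE@8", "Chrome@27", "Chrome@27", "Firefox@21"),
                   stringsAsFactors = FALSE)
demo <- split_fields_factor(demo, "browser", split = "@", suffix = "Ver")
```

With only six distinct values in the example data, the separator search runs over six strings, and the per-row step is a plain integer index.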

Best answer

How about using substr and regexpr:

x <- c("Chrome@28","Chrome@27","Firefox@21","IE@8")
substr(x,1,regexpr("@",x)-1)
[1] "Chrome"  "Chrome"  "Firefox" "IE"
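
To connect this with the question's function, here is a minimal sketch of the same substr/regexpr idea applied to data frame columns. The name `split_fields_substr` is hypothetical, and the handling of values without a separator (keep the whole string, as the strsplit version does) is an assumption the answer does not spell out.

```r
# Hypothetical rewrite of split.fields() using substr() and regexpr(),
# which avoids building a list with one element per row.
split_fields_substr <- function(frame, fields, split, suffix) {
  for (field in fields) {
    x <- frame[[field]]
    # Position of the first separator, -1 if the value has none.
    pos <- as.integer(regexpr(split, x, fixed = TRUE))
    # Without a separator, keep the whole string.
    end <- ifelse(pos > 0L, pos - 1L, nchar(x))
    frame[[paste0(field, suffix)]] <- x
    frame[[field]] <- substr(x, 1L, end)
  }
  frame
}

# Small usage example with made-up data, including a value without "@".
frame <- data.frame(browser = c("Chrome@28", "Firefox@21", "IE@8", "Opera"),
                    stringsAsFactors = FALSE)
frame <- split_fields_substr(frame, "browser", split = "@", suffix = "Ver")
```

Both regexpr and substr are vectorised and return plain vectors, so no per-row list is created along the way.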

Source: "R: running out of memory using `strsplit`" on Stack Overflow: https://stackoverflow.com/questions/17660202/
