I am trying to read a very large JSON file in R, and I am using the rjson library together with this recommended approach: json_data <- fromJSON(paste(readLines("myfile.json"), collapse=""))
The problem is that I get this error message:
Error in paste(readLines("myfile.json"), collapse = "") :
  cannot allocate memory (2383 Mb) in C function 'R_AllocStringBuffer'
Can anyone help me solve this?
Best answer
Well, let me just share my experience of reading JSON files. I tried to read JSON files of 52.8 MB, 19.7 MB, 1.3 GB, 93.9 MB and 158.5 MB; it took me more than 30 minutes and the R session eventually restarted itself. After that I tried applying parallel computing, hoping to at least see some progress, but that failed too.
https://github.com/hadley/plyr/issues/265
Then I tried adding the argument pagesize = 10000, and it ran far more efficiently than before. We only need to read the data once; after that we can save it in RData/Rda/Rds format via saveRDS and reload it quickly later.
> suppressPackageStartupMessages(library('BBmisc'))
> suppressAll(library('jsonlite'))
> suppressAll(library('plyr'))
> suppressAll(library('dplyr'))
> suppressAll(library('stringr'))
> suppressAll(library('doParallel'))
>
> registerDoParallel(cores=16)
>
> ## https://www.kaggle.com/c/yelp-recsys-2013/forums/t/4465/reading-json-files-with-r-how-to
> ## https://class.coursera.org/dsscapstone-005/forum/thread?thread_id=12
> fnames <- c('business','checkin','review','tip','user')
> jfile <- paste0(getwd(),'/yelp_dataset_challenge_academic_dataset/yelp_academic_dataset_',fnames,'.json')
> dat <- llply(as.list(jfile), function(x) stream_in(file(x),pagesize = 10000),.parallel=TRUE)
> dat
list()
> jfile
[1] "/home/ryoeng/Coursera-Data-Science-Capstone/yelp_dataset_challenge_academic_dataset/yelp_academic_dataset_business.json"
[2] "/home/ryoeng/Coursera-Data-Science-Capstone/yelp_dataset_challenge_academic_dataset/yelp_academic_dataset_checkin.json"
[3] "/home/ryoeng/Coursera-Data-Science-Capstone/yelp_dataset_challenge_academic_dataset/yelp_academic_dataset_review.json"
[4] "/home/ryoeng/Coursera-Data-Science-Capstone/yelp_dataset_challenge_academic_dataset/yelp_academic_dataset_tip.json"
[5] "/home/ryoeng/Coursera-Data-Science-Capstone/yelp_dataset_challenge_academic_dataset/yelp_academic_dataset_user.json"
> dat <- llply(as.list(jfile), function(x) stream_in(file(x),pagesize = 10000),.progress='=')
opening file input connection.
Imported 61184 records. Simplifying into dataframe...
closing file input connection.
opening file input connection.
Imported 45166 records. Simplifying into dataframe...
closing file input connection.
opening file input connection.
Found 470000 records...
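The read-once-then-cache workflow described above can be sketched as follows. This is a minimal, self-contained example assuming the jsonlite package is installed; the file names and pagesize value are placeholders, and a tiny NDJSON sample is written first so the snippet runs on its own rather than against the Yelp dataset:

```r
library(jsonlite)

## Write a tiny newline-delimited JSON sample so the example is self-contained.
tmp <- tempfile(fileext = ".json")
writeLines(c('{"id": 1, "stars": 4}', '{"id": 2, "stars": 5}'), tmp)

## stream_in() parses the file in batches; pagesize controls how many
## records are read per batch, which keeps memory usage bounded.
dat <- stream_in(file(tmp), pagesize = 10000, verbose = FALSE)

## Cache the parsed data frame so the slow JSON parse happens only once;
## later sessions can call readRDS() instead of re-parsing the JSON.
rds <- tempfile(fileext = ".rds")
saveRDS(dat, rds)
dat2 <- readRDS(rds)
```

On subsequent runs, loading the .rds file with readRDS() is typically much faster than streaming the raw JSON again.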
On the topic of reading huge JSON files in R, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/29688946/