Question
I am trying to run something on a very large dataset. Basically, I want to loop through all files in a folder and run the function fromJSON on each one, skipping any file that produces an error. I have built a function using tryCatch, but it only works when I use lapply, not parLapply.
Here is the code for my exception-handling function:
readJson <- function(file) {
  require(jsonlite)
  dat <- tryCatch(
    {
      fromJSON(file, flatten = TRUE)
    },
    error = function(cond) {
      message(cond)
      return(NA)
    },
    warning = function(cond) {
      message(cond)
      return(NULL)
    }
  )
  return(dat)
}
I then call parLapply on a character vector, files, which contains the full paths to the JSON files:
dat <- parLapply(cl, files, readJson)
This produces an error when it reaches a file that doesn't end properly, and the list 'dat' is never created. Skipping over the problematic file is exactly what the readJson function was supposed to do.
When I use regular lapply, however, it works perfectly fine: it prints the error messages but still creates the list, skipping over the erroneous files.
Any ideas on how I could use exception handling with parLapply so that it skips the problematic files and still generates the list?
Recommended answer
In your error handler function, cond is an error condition. Calling message(cond) signals this condition again; on the workers it is caught and transmitted to the master as an error. Either remove the message calls or replace them with something like message(conditionMessage(cond)). You won't see anything on the master either way, so removing them is probably best.
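A minimal sketch of the second option, for illustration only. It assumes jsonlite is installed on the workers and calls it as jsonlite::fromJSON instead of using require inside the function. The two temporary files, the cluster size of 2, and the final filtering step are hypothetical additions that are not part of the original question.

```r
library(parallel)

# Reader that logs only the text of the condition and never re-signals it.
readJson <- function(file) {
  tryCatch(
    jsonlite::fromJSON(file, flatten = TRUE),
    error = function(cond) {
      # conditionMessage() extracts the text, so message() emits an
      # ordinary message rather than signalling the error object again.
      message("Skipping ", file, ": ", conditionMessage(cond))
      NA
    },
    warning = function(cond) {
      message("Warning for ", file, ": ", conditionMessage(cond))
      NULL
    }
  )
}

# Hypothetical test data: one valid and one truncated JSON file.
good <- tempfile(fileext = ".json")
bad <- tempfile(fileext = ".json")
writeLines('{"a": 1, "b": [1, 2, 3]}', good)
writeLines('{"a": 1, "b": [1, 2', bad)
files <- c(good, bad)

cl <- makeCluster(2)  # cluster size chosen arbitrarily for the sketch
dat <- parLapply(cl, files, readJson)
stopCluster(cl)

# Drop the NA placeholders left by files that failed to parse.
ok <- !vapply(dat, function(x) identical(x, NA), logical(1))
dat <- dat[ok]
```

Messages emitted on the workers are not shown on the master by default, so the message calls here mainly matter when the same function is run with plain lapply.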