This article describes how to deal with R running out of memory when predicting nnet output in parallel with foreach; it should be a useful reference for anyone hitting the same problem.

Problem description

I have a (large) neural net being trained by the nnet package in R. I want to be able to simulate predictions from this neural net, and to do so in a parallelised fashion using something like foreach, which I've used before with success (all on a Windows machine).

My code is essentially of the form:

library(nnet)

data = data.frame(out=c(0, 0.1, 0.4, 0.6),
                  in1=c(1, 2, 3, 4),
                  in2=c(10, 4, 2, 6))

net = nnet(out ~ in1 + in2, data=data, size=5)

library(doParallel)
registerDoParallel(cores=detectCores()-2)

results = foreach(test=1:10, .combine=rbind, .packages=c("nnet")) %dopar% {
  result = predict(net, newdata = data.frame(in1=test, in2=5))
  return(result)
}

except with a much larger NN being fit and predicted from; it's around 300MB.
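As a quick way to confirm how much memory a fitted object like this actually takes, object.size() from base R's utils package reports the in-memory footprint of any object. A minimal sketch using a stand-in object (not the original poster's model):

```r
# object.size() reports the memory footprint of any R object;
# applied to a fitted model such as 'net', it would show the
# ~300MB figure mentioned above. Here a plain numeric vector
# stands in for the model object.
x <- numeric(1e6)                 # ~8 MB of doubles
sz <- object.size(x)
print(format(sz, units = "MB"))
```

The same call on the real net object is a useful sanity check before deciding how many parallel workers a machine can afford.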

The code above runs fine when using a traditional for loop, or when using %do%, but when using %dopar%, everything gets loaded into memory for each core being used - around 700MB each. If I run it for long enough, everything eventually explodes.

Having looked up similar problems, I still have no idea what is causing this. Omitting the 'predict' part has everything run smoothly.

How can I have each core look up the unchanging 'net' rather than having it loaded into memory? Or is it not possible?

Recommended answer

CPak's reply explains what's going on; you're effectively running multiple copies (= workers) of the main script in separate R sessions. Since you're on Windows, calling

registerDoParallel(cores = n)

expands to:

cl <- parallel::makeCluster(n, type = "PSOCK")
registerDoParallel(cl)

which sets up n independent background R workers, each with its own independent memory address space.

Now, if you had been on a Unix-like system, it would instead have corresponded to using n forked R workers, cf. parallel::mclapply(). Forked processes are not supported by R on Windows. With forked processing, you would effectively get what you're asking for, because forked child processes share the objects already allocated by the main process (as long as such objects are not modified), e.g. net.
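On a Unix-like system, the forked approach can be sketched with parallel::mclapply from base R. This is a hypothetical illustration, not the original poster's code; on Windows mc.cores must be 1, which the sketch handles by falling back to sequential execution:

```r
library(parallel)

# Forked children share objects such as a fitted 'net' with the
# master process (copy-on-write), so the model is not duplicated
# unless a worker modifies it.
cores <- if (.Platform$OS.type == "unix") 2L else 1L
res <- mclapply(1:10, function(test) test^2, mc.cores = cores)
print(unlist(res))  # 1 4 9 16 25 36 49 64 81 100
```

In the original example, the function body would call predict(net, ...) and read net directly from the shared parent memory.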

