Element-wise mean of several big.matrix objects in R

Problem description

I have 17 file-backed big.matrix objects (dim 10985 x 52598, 4.3 GB each) and would like to calculate their element-wise mean. The result can be stored in another big.matrix (gcm.res.outputM).

biganalytics::apply() doesn't work here, because MARGIN can only be set to 1 or 2. Instead, I tried two nested for loops:

library(bigmemory)

# File-backed output matrix, so the result does not have to fit in RAM
gcm.res.outputM <- filebacked.big.matrix(10958, 52598, separated = FALSE,
                                         backingfile = "gcm.res.outputM.bin",
                                         backingpath = NULL,
                                         descriptorfile = "gcm.res.outputM.desc",
                                         binarydescriptor = FALSE)

# Cell-by-cell loop: 17 single-element reads, one rbind() and one apply() per (i, j)
for (i in 1:10958) {
  for (j in 1:52598) {
    t <- rbind(gcm.res.output1[i, j],  gcm.res.output2[i, j],  gcm.res.output3[i, j],
               gcm.res.output4[i, j],  gcm.res.output5[i, j],  gcm.res.output6[i, j],
               gcm.res.output7[i, j],  gcm.res.output8[i, j],  gcm.res.output9[i, j],
               gcm.res.output10[i, j], gcm.res.output11[i, j], gcm.res.output12[i, j],
               gcm.res.output13[i, j], gcm.res.output14[i, j], gcm.res.output15[i, j],
               gcm.res.output16[i, j], gcm.res.output17[i, j])
    tM <- apply(t, 2, mean, na.rm = TRUE)  # t is 17 x 1, so this is one mean
    gcm.res.outputM[i, j] <- tM
  }
}

This takes around 1.5 minutes per row i, so the whole run would take about 11 days.
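
For reference, the estimate follows from the loop bounds in the code (10958 rows of 52598 columns) and the measured 1.5 minutes (90 s) per row:

$$
10958 \times 1.5\ \text{min} = 16437\ \text{min} \approx 274\ \text{h} \approx 11.4\ \text{days},
\qquad
\frac{90\ \text{s}}{52598\ \text{cells}} \approx 1.7\ \text{ms per cell}.
$$

Averaging 17 numbers cannot plausibly take 1.7 ms, which suggests that nearly all of the time goes into the per-cell R calls (the 17 single-element reads, rbind() and apply()) rather than into the arithmetic itself.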

Does anyone have an idea how to speed up this calculation? I'm using a 64-bit Windows 10 machine with 16 GB of RAM.

Thanks!

Recommended answer

You can use this Rcpp code:

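// [[Rcpp::depends(BH, bigmemory, RcppEigen)]]
#include <bigmemory/MatrixAccessor.hpp>
#include <RcppEigen.h>
using namespace Eigen;
using namespace Rcpp;

// Adds the big.matrix behind xptr_from to the one behind xptr_to, in place.
// Assumes both are of type "double" and not separated, so each one is a single
// contiguous column-major block that Map can wrap without copying.
// [[Rcpp::export]]
void add_to(XPtr<BigMatrix> xptr_from, XPtr<BigMatrix> xptr_to) {

  if (xptr_from->nrow() != xptr_to->nrow() ||
      xptr_from->ncol() != xptr_to->ncol())
    stop("both big.matrix objects must have the same dimensions");

  Map<MatrixXd> bm_from((double *)xptr_from->matrix(),
                        xptr_from->nrow(), xptr_from->ncol());
  Map<MatrixXd> bm_to((double *)xptr_to->matrix(),
                      xptr_to->nrow(), xptr_to->ncol());

  bm_to += bm_from;  // NA/NaN in either input propagates to the sum
}

// Divides every element of the big.matrix behind xptr by val, in place.
// [[Rcpp::export]]
void div_by(XPtr<BigMatrix> xptr, double val) {

  Map<MatrixXd> bm((double *)xptr->matrix(),
                   xptr->nrow(), xptr->ncol());

  bm /= val;
}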
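
To make explicit what the two exported functions compute, here is a minimal plain C++ sketch of the same accumulate-then-divide pattern, with no Rcpp, Eigen or bigmemory involved. DenseMatrix is a hypothetical stand-in for a double-typed, non-separated big.matrix: one contiguous column-major buffer, which is what the Eigen Map wraps. Because both functions modify their target in place, only one extra matrix (the result) is needed, however many inputs are averaged.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a double-typed, non-separated big.matrix:
// one contiguous column-major buffer, which is what Eigen's Map wraps.
struct DenseMatrix {
    std::size_t nrow, ncol;
    std::vector<double> data;  // element (i, j) is stored at data[i + j * nrow]

    DenseMatrix(std::size_t r, std::size_t c, double init)
        : nrow(r), ncol(c), data(r * c, init) {}
};

// Plays the role of add_to(): to += from, element by element, in place.
void add_to(const DenseMatrix& from, DenseMatrix& to) {
    if (from.nrow != to.nrow || from.ncol != to.ncol) {
        throw std::invalid_argument("matrices must have the same dimensions");
    }
    for (std::size_t k = 0; k < to.data.size(); ++k) {
        to.data[k] += from.data[k];
    }
}

// Plays the role of div_by(): every element of m is divided by val, in place.
void div_by(DenseMatrix& m, double val) {
    for (double& x : m.data) {
        x /= val;
    }
}
```

With five small matrices filled with 1 through 5, starting the result as a copy of the first, calling add_to for the other four and then div_by with 5 leaves every element at 3, the same check the R example performs.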

Then if you have a list of big.matrix objects of the same size, you can do:

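library(bigmemory)
# Assumes the C++ code above has been compiled, e.g. with Rcpp::sourceCpp()
bm_list <- lapply(1:5, function(i) big.matrix(1000, 500, init = i))
# For the real data, deepcopy() can also take backingfile/descriptorfile
# so that the result is file-backed
res <- deepcopy(bm_list[[1]])
# Accumulate matrices 2..5 into res; invisible() hides the list of NULLs
invisible(lapply(bm_list[-1], function(bm) add_to(bm@address, res@address)))
res[1:5, 1:5]  # verify: every entry is 1 + 2 + 3 + 4 + 5 = 15
div_by(res@address, length(bm_list))
res[1:5, 1:5]  # verify: every entry is 15 / 5 = 3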
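
Note that this sum-then-divide approach propagates NA: if any input has NA in a cell, the result is NA there, whereas the loop in the question used mean(..., na.rm = TRUE). If that matters, one option (not part of the answer above) is to keep a per-cell count of non-missing values alongside the running sum. The sketch below shows the idea in plain C++ on column-major buffers; NaMeanAccumulator is a hypothetical name, and the sketch relies on R storing NA_real_ as a NaN bit pattern, so std::isnan catches both NA and NaN, just as na.rm = TRUE drops both. For matrices of 10958 x 52598 the count buffer alone would be large, so in practice you would probably apply it one block of columns at a time.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not part of bigmemory): running state for an
// NA-aware element-wise mean over column-major buffers of equal size.
struct NaMeanAccumulator {
    std::vector<double> sum;      // sum of the non-NA values seen in each cell
    std::vector<unsigned> count;  // number of non-NA values seen in each cell

    explicit NaMeanAccumulator(std::size_t n) : sum(n, 0.0), count(n, 0) {}

    // Adds one matrix; NaN cells (R stores NA_real_ as a NaN) are skipped.
    void add(const std::vector<double>& m) {
        if (m.size() != sum.size()) {
            throw std::invalid_argument("matrices must have the same size");
        }
        for (std::size_t k = 0; k < m.size(); ++k) {
            if (!std::isnan(m[k])) {
                sum[k] += m[k];
                ++count[k];
            }
        }
    }

    // Mean per cell; a cell with no non-NA values gives NaN,
    // like mean(c(NA, NA), na.rm = TRUE) in R.
    std::vector<double> result() const {
        std::vector<double> out(sum.size());
        for (std::size_t k = 0; k < sum.size(); ++k) {
            out[k] = count[k] > 0 ? sum[k] / count[k]
                                  : std::numeric_limits<double>::quiet_NaN();
        }
        return out;
    }
};
```

Compared with add_to/div_by, this costs one extra buffer for the counts, but each cell is divided by the number of values actually present rather than by the number of matrices.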
