MATLAB is running out of memory, but it should not be

Question

I'm trying to apply PCA to my data, which has been standardized, using princomp(x).

The data is <16 x 1036800 double>. This runs out of memory, which would be expected except that this is a new computer with 24GB of RAM for data mining. MATLAB even lists the 24GB as available on a memory check.

Is MATLAB actually running out of memory while performing the PCA, or is MATLAB not using the RAM to its full potential? Any information or ideas would be helpful. (I may need to increase the virtual memory, but I assumed the 24GB would have sufficed.)

Recommended answer

For a data matrix of size n-by-p, PRINCOMP will return a coefficient matrix of size p-by-p where each column is a principal component expressed using the original dimensions, so in your case you will create an output matrix of size:

1036800 * 1036800 * 8 bytes ≈ 8.6e12 bytes ≈ 7.8 TiB
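The arithmetic behind that figure can be checked directly. The snippet below is only a back-of-the-envelope sketch: it assumes 8 bytes per double, and the variable names are illustrative rather than anything PRINCOMP exposes. It compares the full p-by-p coefficient matrix with a p-by-n one.

```matlab
% Rough memory estimate for the coefficient matrix (8 bytes per double assumed).
n = 16;                                   % observations (rows of the data)
p = 1036800;                              % variables (columns of the data)
bytesPerDouble = 8;

fullCoeffBytes = p * p * bytesPerDouble;  % full p-by-p coefficient matrix
econCoeffBytes = p * n * bytesPerDouble;  % p-by-n matrix holding only n components

fprintf('full p-by-p : %.2f TiB\n', fullCoeffBytes / 2^40);
fprintf('p-by-n      : %.1f MiB\n', econCoeffBytes / 2^20);
```

The first number is about 7.8 TiB, far beyond 24GB of RAM, while the second is roughly 127 MiB, which is the same size as the data matrix itself.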

Consider using PRINCOMP(X,'econ') to return only the PCs with significant variance

Alternatively, consider performing PCA by SVD: in your case n<<p, and the p-by-p covariance matrix is far too large to compute. Therefore, instead of decomposing the p-by-p matrix X'X, it is sufficient to decompose only the smaller n-by-n matrix XX' (with X of size n-by-p, as above). Refer to this paper for reference.
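To make the small-matrix idea concrete, here is a minimal sketch, not the referenced paper's code. It assumes X0 is the centered n-by-p data, so the small n-by-n matrix is X0*X0', and it uses a reduced p so the check runs quickly. The names G, V, lambda and k are illustrative. Each principal direction is recovered from an eigenvector v of the small matrix as X0'*v divided by the square root of its eigenvalue.

```matlab
% PCA via the small n-by-n Gram matrix (sketch; sizes are stand-ins).
rng(0);                                   % reproducible illustrative data
n = 16;  p = 2000;                        % n << p
X  = rand(n, p);
X0 = bsxfun(@minus, X, mean(X, 1));       % center each column

G = X0 * X0';                             % n-by-n, cheap to form and decompose
[V, D] = eig((G + G') / 2);               % symmetrize to guard against round-off
[lambda, idx] = sort(diag(D), 'descend'); % eigenvalues, largest first
V = V(:, idx);

k  = n - 1;                               % centering leaves at most n-1 nonzero eigenvalues
PC = X0' * V(:, 1:k);                     % map back to the p-dimensional space
PC = bsxfun(@rdivide, PC, sqrt(lambda(1:k))');  % scale each column to unit length
varPC = lambda(1:k)' / (n - 1);           % variance along each component

% Compare with the singular values of the centered data.
s = svd(X0);
fprintf('max eigenvalue mismatch: %g\n', max(abs(lambda(1:k) - s(1:k).^2)));
```

The eigenvalues of the small matrix equal the squared singular values of X0, so the resulting variances agree with an SVD-based computation up to round-off, and the signs of individual components may differ.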

Here's my implementation. The outputs of this function match those of PRINCOMP (the first three, anyway):

function [PC,Y,varPC] = pca_by_svd(X)
    % PCA_BY_SVD
    %   X      data matrix of size n-by-p where n<<p
    %   PC     columns are first n principal components
    %   Y      data projected on those PCs
    %   varPC  variance along the PCs
    %

    X0 = bsxfun(@minus, X, mean(X,1));     % shift data to zero-mean
    [U,S,PC] = svd(X0,'econ');             % SVD decomposition
    Y = X0*PC;                             % project X on PC
    varPC = diag(S'*S)' / (size(X,1)-1);   % variance explained
end

I just tried it on my 4GB machine, and it ran just fine:

» x = rand(16,1036800);
» [PC, Y, varPC] = pca_by_svd(x);
» whos
  Name             Size                     Bytes  Class     Attributes

  PC         1036800x16                 132710400  double
  Y               16x16                      2048  double
  varPC            1x16                       128  double
  x               16x1036800            132710400  double

Update:

The princomp function has been deprecated in favor of pca, introduced in R2012b, which includes many more options.
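As a rough illustration of the newer function, the sketch below calls pca on a stand-in matrix with far fewer columns than the data in the question. It assumes the Statistics and Machine Learning Toolbox is installed. The 'Economy' option of pca defaults to true, so when n <= p only the first n-1 components are returned instead of a p-by-p matrix.

```matlab
% pca with its default economy-size output (sketch; p is reduced for speed).
rng(0);                            % reproducible illustrative data
n = 16;  p = 5000;                 % stand-in sizes, n << p
X = rand(n, p);

[coeff, score, latent] = pca(X);   % 'Economy' is true by default

disp(size(coeff));                 % p-by-(n-1) rather than p-by-p
disp(size(score));                 % n-by-(n-1)
disp(numel(latent));               % n-1 component variances
```

Passing 'Economy', false would request the full p-by-p coefficient matrix, which runs into the same memory problem as described above for large p.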
