Problem description
I'm trying to apply PCA to my data, which has been standardized, using princomp(x).
The data is a 16 x 1036800 double matrix. This runs out of memory, which would be expected except that this is a new computer with 24GB of RAM for data mining. MATLAB even lists the 24GB as available on a memory check.
Is MATLAB actually running out of memory while performing the PCA, or is it not using the RAM to its full potential? Any information or ideas would be helpful. (I may need to increase the virtual memory, but I assumed 24GB would suffice.)
Recommended answer
For a data matrix of size n-by-p, PRINCOMP returns a coefficient matrix of size p-by-p, where each column is a principal component expressed in terms of the original dimensions. In your case that means creating an output matrix of size:
1036800 * 1036800 * 8 bytes ~ 8.6e12 bytes ~ 7.8 TiB
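The arithmetic can be checked directly in MATLAB. This is a quick sketch; the second figure assumes that only n = 16 components are kept, i.e. the p-by-n coefficient matrix that PCA via SVD produces instead of the full p-by-p one.

```matlab
% Size of the p-by-p coefficient matrix PRINCOMP would allocate for p = 1036800
p = 1036800;                      % number of variables (columns of x)
bytesFull = p^2 * 8;              % 8 bytes per double element
fprintf('full p-by-p coeff: %.3g bytes = %.2f TiB\n', bytesFull, bytesFull / 2^40);

% For comparison: a p-by-n coefficient matrix (assumes n = 16 components are kept)
n = 16;
bytesEcon = p * n * 8;
fprintf('p-by-n coeff: %d bytes = %.1f MiB\n', bytesEcon, bytesEcon / 2^20);
```

The p-by-n figure matches the 132710400 bytes that whos reports for PC in the session further down.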
Consider using PRINCOMP(X,'econ') to return only the PCs whose variances are not necessarily zero; when n <= p, that is only the first n-1 components.
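Why at most n-1 components survive: once each column is centered, the n rows sum to zero, so the centered matrix has rank at most n-1 and its last singular value is numerically zero. A small sketch with random data (the sizes are arbitrary choices for illustration):

```matlab
% Centering n observations leaves at most n-1 directions with nonzero variance
n = 16; p = 1000;                       % arbitrary small sizes for illustration
X  = rand(n, p);
X0 = bsxfun(@minus, X, mean(X, 1));     % subtract the column means
s  = svd(X0);                           % singular values, in decreasing order
fprintf('rank of centered data: %d (n = %d)\n', rank(X0), n);
fprintf('smallest / largest singular value: %g\n', s(end) / s(1));
```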
Alternatively, consider performing PCA by SVD. In your case n << p, so the p-by-p covariance matrix is impossible to compute. Instead of decomposing the p-by-p matrix X'X, it is sufficient to decompose the much smaller n-by-n matrix XX' (with X centered). See this paper for details.
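The small-matrix idea can be sketched as follows, assuming X is n-by-p with n << p: eigendecompose the n-by-n matrix X0*X0' (X0 being the centered data), then map each eigenvector u back to a unit-length p-dimensional principal direction via X0'*u / sqrt(lambda). The sizes are arbitrary, and the last eigenvalue is dropped because centering makes it numerically zero. The final line cross-checks the variances against the singular values of X0.

```matlab
% PCA of a wide matrix (n << p) via the n-by-n Gram matrix X0*X0'
n = 16; p = 5000;                            % arbitrary sizes for illustration
X  = rand(n, p);
X0 = bsxfun(@minus, X, mean(X, 1));          % center each column (variable)

G = X0 * X0';                                % n-by-n, cheap to form and decompose
G = (G + G') / 2;                            % enforce exact symmetry before eig
[U, L] = eig(G);
[lambda, order] = sort(diag(L), 'descend');  % eigenvalues in decreasing order
U = U(:, order);

k = n - 1;                                   % the n-th eigenvalue is ~0 after centering
% Map back to p dimensions: v_i = X0' * u_i / sqrt(lambda_i) has unit norm
PC = bsxfun(@rdivide, X0' * U(:, 1:k), sqrt(lambda(1:k))');
varPC = lambda(1:k)' / (n - 1);              % variance along each PC

% Cross-check: squared singular values of X0 equal the eigenvalues of X0*X0'
s = svd(X0);
fprintf('max variance mismatch: %g\n', max(abs(varPC - (s(1:k).^2)' / (n - 1))));
```

Only an n-by-n matrix is ever decomposed; the largest object created is the p-by-k matrix PC.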
Here's my implementation; its three outputs match the first three outputs of PRINCOMP:
function [PC,Y,varPC] = pca_by_svd(X)
% PCA_BY_SVD
% X data matrix of size n-by-p where n<<p
% PC columns are first n principal components
% Y data projected on those PCs
% varPC variance along the PCs
%
X0 = bsxfun(@minus, X, mean(X,1)); % shift data to zero-mean
[~,S,PC] = svd(X0,'econ'); % economy-size SVD; U is not needed
Y = X0*PC; % project X on PC
varPC = diag(S'*S)' / (size(X,1)-1); % variance explained
end
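One way to sanity-check the outputs of pca_by_svd, sketched on a problem small enough that the p-by-p covariance matrix can still be formed (the sizes are arbitrary): the squared singular values divided by n-1 should equal the largest eigenvalues of cov(X), and each PC column should be an eigenvector of that covariance. The function's steps are repeated inline so the snippet runs on its own.

```matlab
% Check the SVD-based PCA against the covariance eigendecomposition (small p only)
n = 16; p = 40;                              % small enough to form the p-by-p covariance
X = rand(n, p);

% Same steps as pca_by_svd
X0 = bsxfun(@minus, X, mean(X, 1));
[~, S, PC] = svd(X0, 'econ');
varPC = diag(S' * S)' / (n - 1);

% Reference: eigenvalues of the sample covariance, largest first
C  = cov(X);                                 % p-by-p, normalized by n-1
ev = sort(eig((C + C') / 2), 'descend');

k = n - 1;                                   % components with nonzero variance
fprintf('max eigenvalue mismatch: %g\n', max(abs(varPC(1:k)' - ev(1:k))));

% Each PC column v should satisfy C*v = var*v
resid = norm(C * PC(:, 1:k) - bsxfun(@times, PC(:, 1:k), varPC(1:k)), 'fro');
fprintf('eigenvector residual: %g\n', resid);
```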
I just tried it on my 4GB machine, and it ran just fine:
» x = rand(16,1036800);
» [PC, Y, varPC] = pca_by_svd(x);
» whos
  Name          Size                 Bytes  Class     Attributes

  PC      1036800x16             132710400  double
  Y            16x16                  2048  double
  varPC         1x16                   128  double
  x            16x1036800        132710400  double
Update:
The princomp function has since been deprecated in favor of pca, introduced in R2012b, which offers many more options.
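A sketch of the replacement call, assuming the Statistics and Machine Learning Toolbox (R2012b or later) is installed. The data size is scaled down here, and keeping 3 components is an arbitrary choice; with 'NumComponents', pca returns only the requested columns of coeff and score.

```matlab
% Requires the Statistics and Machine Learning Toolbox (pca, R2012b+)
n = 16; p = 10000;                        % scaled-down stand-in for the 16-by-1036800 data
x = rand(n, p);

% Keep only the first 3 principal components (3 is an arbitrary choice)
[coeff, score] = pca(x, 'NumComponents', 3);
disp(size(coeff))                         % p-by-3 coefficient matrix
disp(size(score))                         % n-by-3 projected data
```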