Problem description
I am using princomp() in R to perform PCA. My data matrix is large (10K x 10K, with each value given to at most 4 decimal places). It takes ~3.5 hours and ~6.5 GB of physical memory on a Xeon 2.27 GHz processor.
Since I only want the first two components, is there a faster way to do this?
Update:
In addition to speed, is there a memory-efficient way to do this?
It takes ~2 hours and ~6.3 GB of physical memory to calculate the first two components using svd(,2,).
Recommended answer
You sometimes get access to so-called 'economical' decompositions, which allow you to cap the number of eigenvalues / eigenvectors. It looks like eigen() and prcomp() do not offer this, but svd() allows you to specify the maximum number of singular vectors to compute (its second and third arguments, nu and nv).
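As a minimal sketch of that cap, assuming a small random test matrix (the names M and s are illustrative only), the nu and nv arguments of svd() limit how many left and right singular vectors are returned. Note that the full vector of singular values still comes back:

```r
# Small random matrix, used purely for illustration.
set.seed(42)
M <- matrix(rnorm(10 * 10), 10, 10)

# Request only the first 2 left and 2 right singular vectors.
s <- svd(M, nu = 2, nv = 2)

dim(s$u)     # 10 rows, 2 columns
dim(s$v)     # 10 rows, 2 columns
length(s$d)  # all 10 singular values are still returned
```

Positionally, svd(M, 2, 0) is the same call with nu = 2 and nv = 0, that is, two left singular vectors and no right ones.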
On small matrices, the gains seem modest:
R> set.seed(42); N <- 10; M <- matrix(rnorm(N*N), N, N)
R> library(rbenchmark)
R> benchmark(eigen(M), svd(M,2,0), prcomp(M), princomp(M), order="relative")
          test replications elapsed relative user.self sys.self user.child
2 svd(M, 2, 0)          100   0.021  1.00000      0.02        0          0
3    prcomp(M)          100   0.043  2.04762      0.04        0          0
1     eigen(M)          100   0.050  2.38095      0.05        0          0
4  princomp(M)          100   0.065  3.09524      0.06        0          0
R>
But the factor of three relative to princomp() may make it worth your while to reconstruct princomp() from svd(), as svd() allows you to stop after two values.
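One way such a reconstruction might look is sketched below. It is not taken from the original answer: the variable names (Mc, loadings, scores, sdev) are illustrative, and the small random matrix stands in for the real data. The idea is to centre the columns, ask svd() for only two right singular vectors, and project the data onto them:

```r
set.seed(42)
N <- 10
M <- matrix(rnorm(N * N), N, N)

# PCA operates on column-centred data (prcomp() and princomp() centre by default).
Mc <- scale(M, center = TRUE, scale = FALSE)

# Only the first two right singular vectors are needed; skip the left ones.
s <- svd(Mc, nu = 0, nv = 2)

loadings <- s$v              # N x 2 matrix: the first two principal axes
scores   <- Mc %*% loadings  # N x 2 matrix: the data projected onto those axes

# Standard deviations in the prcomp() convention (divisor n - 1);
# princomp() uses divisor n, so divide by sqrt(nrow(M)) to match it instead.
sdev <- s$d[1:2] / sqrt(nrow(M) - 1)

# Cross-check against prcomp(); the sign of each component is arbitrary,
# so absolute values are compared.
p <- prcomp(M)
all.equal(abs(unname(p$rotation[, 1:2])), abs(loadings))
all.equal(p$sdev[1:2], sdev)
```

Whether this saves much time or memory on a 10K x 10K matrix is not something the small benchmark above can show, so it is worth timing on a subset of the real data first.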