What is the fastest way to calculate the first two principal components in R?


Question

I am using princomp in R to perform PCA. My data matrix is huge (10K x 10K, with each value given to at most 4 decimal places). It takes ~3.5 hours and ~6.5 GB of physical memory on a Xeon 2.27 GHz processor.

Since I only want the first two components, is there a faster way to do this?

Update:

In addition to speed, is there a memory-efficient way to do this?

It takes ~2 hours and ~6.3 GB of physical memory to calculate the first two components using svd(,2,).

Recommended answer

You sometimes get access to so-called 'economical' decompositions, which allow you to cap the number of eigenvalues / eigenvectors. It looks like eigen() and prcomp() do not offer this, but svd() allows you to specify the maximum number to compute.
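As a small illustration of that cap, the sketch below calls base R's svd() with its nu and nv arguments, which set how many left and right singular vectors are returned. Called positionally, svd(M, 2, 0) means nu = 2 and nv = 0. The matrix X and its size are made up for the example, and the sketch only shows what is returned; it does not measure how much time or memory is saved.

```r
set.seed(1)
X <- matrix(rnorm(20 * 5), nrow = 20, ncol = 5)  # toy data, 20 rows x 5 columns

full <- svd(X)                  # default: min(n, p) left and right singular vectors
part <- svd(X, nu = 2, nv = 2)  # keep only the first two of each

dim(full$u)     # 20 x 5
dim(part$u)     # 20 x 2
dim(part$v)     # 5 x 2
length(part$d)  # 5: the singular values are all still returned
```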

On small matrices, the gains seem modest:

R> set.seed(42); N <- 10; M <- matrix(rnorm(N*N), N, N)
R> library(rbenchmark)
R> benchmark(eigen(M), svd(M,2,0), prcomp(M), princomp(M), order="relative")
          test replications elapsed relative user.self sys.self user.child
2 svd(M, 2, 0)          100   0.021  1.00000      0.02        0          0
3    prcomp(M)          100   0.043  2.04762      0.04        0          0
1     eigen(M)          100   0.050  2.38095      0.05        0          0
4  princomp(M)          100   0.065  3.09524      0.06        0          0
R>

But the factor of three relative to princomp() may make it worth your while to reconstruct princomp() from svd(), since svd() allows you to stop after two values.
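One way such a reconstruction might look is sketched below. It is not taken from the original answer: the names Mc, loadings2, scores2 and sdev2 are illustrative, and it assumes you want the default behaviour of centring the columns without scaling them. Note that princomp() divides by n when computing standard deviations while prcomp() divides by n - 1, so the sketch checks itself against prcomp().

```r
set.seed(42)
N <- 10
M <- matrix(rnorm(N * N), N, N)

# Centre each column, as prcomp() and princomp() do by default.
Mc <- scale(M, center = TRUE, scale = FALSE)

# Request no left singular vectors and only the first two right ones.
s <- svd(Mc, nu = 0, nv = 2)

loadings2 <- s$v                             # N x 2 matrix of loadings
scores2   <- Mc %*% s$v                      # data projected onto the first two components
sdev2     <- s$d[1:2] / sqrt(nrow(M) - 1)    # prcomp() convention; princomp() divides by n

# Check against prcomp(). The sign of each component is arbitrary,
# so the scores are compared in absolute value.
p <- prcomp(M)
all.equal(abs(unname(p$x[, 1:2])), abs(unname(scores2)))
all.equal(p$sdev[1:2], sdev2)
```

The same pattern applies to the 10K x 10K case in the question by replacing M with the real data matrix, although the timings there would need to be measured rather than assumed.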
