Question
When choosing the number of principal components k, we choose k to be the smallest value such that, for example, 99% of the variance is retained.
However, in Python's scikit-learn, I am not 100% sure that pca.explained_variance_ratio_ = 0.99 means "99% of the variance is retained". Could anyone enlighten me? Thanks.
- The Python scikit-learn PCA documentation is here.
Recommended Answer
Yes, you are nearly right. The pca.explained_variance_ratio_ parameter returns a vector of the fraction of variance explained by each dimension. Thus pca.explained_variance_ratio_[i] gives the fraction of variance explained solely by the (i+1)-th dimension.
You probably want to do pca.explained_variance_ratio_.cumsum(). That will return a vector x such that x[i] gives the cumulative variance explained by the first i+1 dimensions.
import numpy as np
from sklearn.decomposition import PCA

np.random.seed(0)
my_matrix = np.random.randn(20, 5)  # 20 samples, 5 features of random data
my_model = PCA(n_components=5)
my_model.fit_transform(my_matrix)
print(my_model.explained_variance_)                  # variance explained by each component
print(my_model.explained_variance_ratio_)            # same, as a fraction of total variance
print(my_model.explained_variance_ratio_.cumsum())   # cumulative fraction of variance
[ 1.50756565 1.29374452 0.97042041 0.61712667 0.31529082]
[ 0.32047581 0.27502207 0.20629036 0.13118776 0.067024 ]
[ 0.32047581 0.59549787 0.80178824 0.932976 1. ]
So in my random toy data, if I picked k=4 I would retain 93.3% of the variance.
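
If you want that cutoff computed programmatically rather than read off by eye, here is a minimal sketch using the same toy data. np.argmax is just one way to find the first index crossing the threshold; and, as an assumption about your scikit-learn version, passing a float in (0, 1) as n_components (with svd_solver='full') asks PCA to keep just enough components to retain that fraction of variance.

import numpy as np
from sklearn.decomposition import PCA

np.random.seed(0)
my_matrix = np.random.randn(20, 5)

# Fit with all components, then pick the smallest k whose cumulative
# explained-variance ratio reaches the 99% target.
model = PCA(n_components=5).fit(my_matrix)
cumulative = model.explained_variance_ratio_.cumsum()
k = int(np.argmax(cumulative >= 0.99)) + 1  # first 1-based index crossing 0.99
print(k, cumulative[k - 1])

# Alternatively, let scikit-learn choose k itself: a float n_components
# keeps the fewest components that retain that much variance.
model_99 = PCA(n_components=0.99, svd_solver='full').fit(my_matrix)
print(model_99.n_components_)

Given the cumulative ratios printed above, both approaches would report k=5 here, since 99% is only reached at the last component.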