Question
I'm following up on this topic: How can I use PCA/SVD in Python for feature selection AND identification? We currently decompose our data set in Python with PCA, using sklearn.decomposition.PCA and its components_ attribute to get all the components. Now we have a very similar goal: we only want to take the first few components (that part is not a problem) and then see what proportion each input feature contributes to every PCA component (so we know which features matter most to us). How can this be done? Another question: does the Python library offer other implementations of Principal Component Analysis?
Answer
The components_ array has shape (n_components, n_features), so components_[i, j] already gives you the (signed) weight of the contribution of feature j to component i.
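For concreteness, here is a minimal sketch of inspecting components_ on a fitted PCA; the random dataset X and the choice of 2 components are illustrative assumptions, not part of the original question:

```python
# Minimal sketch: fit PCA and read per-feature weights from components_.
# X and n_components=2 are illustrative assumptions.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.rand(100, 5)             # 100 samples, 5 input features

pca = PCA(n_components=2)
pca.fit(X)

print(pca.components_.shape)     # (2, 5): one row of feature weights per component
print(pca.components_[0])        # signed weight of each feature in the first component
print(pca.components_[0, 3])     # weight of feature 3 in component 0
```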
If you want the indices of the top 3 features contributing to component i, irrespective of sign, you can do:
numpy.abs(pca.components_[i]).argsort()[::-1][:3]
Note: the [::-1] notation reverses the order of an array:
>>> import numpy as np
>>> np.array([1, 2, 3])[::-1]
array([3, 2, 1])
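As a hedged usage example, the top-3 indices can be mapped back to feature names; feature_names below is an assumed list matching the columns of X from the earlier sketch, and pca is the object fitted there:

```python
# Map the top-3 feature indices of component i back to (assumed) feature names.
import numpy as np

i = 0
feature_names = ["f0", "f1", "f2", "f3", "f4"]   # illustrative names for X's columns

top3 = np.abs(pca.components_[i]).argsort()[::-1][:3]
print(top3)                                       # indices of the 3 strongest features
print([feature_names[j] for j in top3])           # their names
```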
PCA is just a truncated Singular Value Decomposition of the centered dataset. You can use numpy.linalg.svd directly if you wish. Have a look at the source code of the scikit-learn implementation of PCA for details.