Problem description

I am trying to do dimensionality reduction using PCA from scikit-learn. My data set has around 300 samples and 4096 features. I want to reduce the number of dimensions to 400 (and, separately, to 40). But when I call the algorithm, the resulting data has at most "number of samples" features.

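```python
from sklearn.decomposition import PCA

# trainData: 300 samples x 4096 features; testData has the same 4096 features
pca = PCA(n_components=400)
trainData = pca.fit_transform(trainData)  # fit on the training data, then project it
testData = pca.transform(testData)        # project the test data with the same components
```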
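To reproduce the setup without the asker's data, here is a minimal sketch that assumes random Gaussian values of the same shape in place of the real trainData (the 50-row test set is also an arbitrary assumption). It shows the fit-on-train, transform-on-test pattern with the 40-component target, which is below the 300-sample limit and is therefore honored exactly.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical stand-in data: random values with the shapes from the question
# (300 training samples, 4096 features); the real trainData/testData are not shown.
rng = np.random.default_rng(0)
trainData = rng.standard_normal((300, 4096))
testData = rng.standard_normal((50, 4096))  # 50 test rows is an arbitrary assumption

# 40 components is below min(300, 4096) = 300, so the request can be honored exactly.
pca = PCA(n_components=40)
trainReduced = pca.fit_transform(trainData)  # learns the projection from training data only
testReduced = pca.transform(testData)        # reuses the same projection on new rows

print(trainReduced.shape)  # (300, 40)
print(testReduced.shape)   # (50, 40)
```

Fitting only on the training set and reusing the fitted object for the test set keeps both sets in the same 40-dimensional coordinate system.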

Here the initial shape of trainData is 300x4096 and the resulting shape is 300x300. Is there any way to perform this operation on this kind of data (many features, few samples)?

Recommended answer

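The maximum number of principal components that can be extracted from an M x N dataset is min(M, N). It's not an algorithm issue: the principal components are the right singular vectors of the M x N data matrix, and that matrix has rank at most min(M, N) (at most min(M − 1, N) once each feature is mean-centered, as PCA does). Fundamentally, that is the maximum number of components that exist.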
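A quick numerical check of this bound, as a sketch on hypothetical random data with the question's shape (M = 300, N = 4096). It measures the rank of the centered matrix with NumPy and then fits PCA with the request capped at min(M, N). The capping line is a manual workaround written for this example, not a scikit-learn option; in recent scikit-learn versions, asking for more components than min(n_samples, n_features) may raise a ValueError instead of silently truncating.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical random data with the question's shape: M = 300 samples, N = 4096 features.
rng = np.random.default_rng(0)
X = rng.standard_normal((300, 4096))
M, N = X.shape

# PCA centers each feature, then decomposes the centered matrix.
# Its rank bounds how many directions with nonzero variance exist.
Xc = X - X.mean(axis=0)
rank = np.linalg.matrix_rank(Xc)
print(rank)  # 299 for this random data, i.e. min(M - 1, N) after centering

# More than min(M, N) components cannot be returned, so cap the request manually.
requested = 400
n_components = min(requested, M, N)
pca = PCA(n_components=n_components, svd_solver="full")
Z = pca.fit_transform(X)
print(Z.shape)  # (300, 300)

# The last component carries (numerically) zero variance, as the rank predicts.
print(pca.explained_variance_[-1])
```

So even the 300 components that scikit-learn returns include one with essentially no variance; no setting can produce 400 informative components from 300 samples.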

09-25 07:36