Can PCA be used for categorical features?

Question

As I understand it, PCA can only be performed on continuous features. But while trying to understand the difference between one-hot encoding and label encoding, I came across a post at the following link:

When to use One Hot Encoding vs LabelEncoder vs DictVectorizor?

It states that one-hot encoding followed by PCA is a very good method, which basically means PCA is being applied to categorical features. This confuses me; please advise.

Recommended answer

I disagree with the others.

While you can use PCA on binary data (e.g. one-hot encoded data), that does not mean it is a good idea, or that it will work very well.

PCA is designed for continuous variables. It minimizes squared deviations: the squared reconstruction error, which is equivalent to maximizing the variance captured by the components. The concept of squared deviations breaks down when you have binary variables.
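To make the point about squared deviations concrete, here is a minimal sketch using only NumPy. The data is hypothetical (a single random categorical column with three levels), and PCA is computed by hand through an SVD of the centered matrix rather than through any particular library's PCA class. It reconstructs the one-hot rows from one principal component and checks whether the result is still binary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: one categorical column with 3 levels, 200 samples.
codes = rng.integers(0, 3, size=200)
X = np.eye(3)[codes]          # one-hot encoding: exactly one 1 per row

# PCA by hand: center the data, then take the SVD.
mean = X.mean(axis=0)
Xc = X - mean
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Project onto the first principal component and map back.
k = 1
X_hat = (Xc @ Vt[:k].T) @ Vt[:k] + mean

# A valid one-hot row contains only 0s and 1s.
is_binary = np.isclose(X_hat, 0) | np.isclose(X_hat, 1)

print("original rows:\n", X[:3])
print("reconstructed rows:\n", X_hat[:3].round(3))
print("all reconstructed entries binary?", bool(is_binary.all()))
```

The reconstruction is the best one-component fit in the squared-error sense, yet its rows contain fractional values that do not correspond to any category. That is one way to read the claim that squared deviations are a poor fit for binary variables.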

So yes, you can use PCA. And yes, you get an output. It is even a least-squares output; it's not as if PCA would segfault on such data. It works, but it is much less meaningful than you'd want it to be, and supposedly less meaningful than, e.g., frequent pattern mining.
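The "least-squares output" remark can be checked numerically. The following is a sketch under stated assumptions: the two categorical columns are made up, the helper names `one_hot` and `reconstruction_error` are illustrative, and PCA is again done with a plain NumPy SVD. It compares the squared reconstruction error of the top-k principal directions against a random orthonormal basis of the same size.

```python
import numpy as np

rng = np.random.default_rng(1)

def one_hot(codes, n_levels):
    """One-hot encode integer category codes (illustrative helper)."""
    return np.eye(n_levels)[codes]

# Hypothetical data: two categorical columns with 4 and 3 levels, 300 rows.
a = rng.integers(0, 4, size=300)
b = rng.integers(0, 3, size=300)
X = np.hstack([one_hot(a, 4), one_hot(b, 3)])
Xc = X - X.mean(axis=0)

U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2

def reconstruction_error(basis):
    """Squared error after projecting onto the orthonormal rows of `basis`."""
    projected = Xc @ basis.T @ basis
    return float(np.sum((Xc - projected) ** 2))

pca_error = reconstruction_error(Vt[:k])

# A random k-dimensional orthonormal basis for comparison.
Q, _ = np.linalg.qr(rng.normal(size=(X.shape[1], k)))
random_error = reconstruction_error(Q.T)

# For PCA the error equals the sum of the discarded squared singular values.
discarded = float(np.sum(S[k:] ** 2))

print("PCA error:          ", pca_error)
print("random-basis error: ", random_error)
print("discarded variance: ", discarded)
```

PCA runs without complaint on one-hot data and its error is never larger than that of the random basis, so the output is optimal in the least-squares sense. Whether that optimality says anything useful about the categories is a separate question, which is the answer's point.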
