Question
I'm experimenting with Chi-2 feature selection for some text classification tasks. I understand that the Chi-2 test checks the dependence between two categorical variables, so if we perform Chi-2 feature selection for a binary text classification problem with a binary BOW vector representation, each Chi-2 test on each (feature, class) pair would be a very straightforward Chi-2 test with 1 degree of freedom.
It seems to me that we can also perform Chi-2 feature selection on a DF (word counts) vector representation. My first question is: how does sklearn discretize the integer-valued features into categorical ones?
My second question is similar to the first. From the demo code here: http://scikit-learn.sourceforge.net/dev/auto_examples/document_classification_20newsgroups.html
It seems that we can also perform Chi-2 feature selection on a TF*IDF vector representation. How does sklearn perform Chi-2 feature selection on real-valued features?
Thank you in advance for your kind advice!
Answer
The χ² feature selection code builds a contingency table from its inputs X (feature values) and y (class labels). Each entry (i, j) corresponds to some feature i and some class j, and holds the sum of the i'th feature's values across all samples belonging to class j. It then computes the χ² test statistic against expected frequencies arising from the empirical distribution over classes (just their relative frequencies in y) and a uniform distribution over feature values.
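The computation above can be sketched in a few lines of NumPy. The toy count matrix below is made up for illustration; the manual contingency-table computation is a sketch of what sklearn.feature_selection.chi2 does internally, not the library's exact code:

```python
import numpy as np
from scipy.stats import chisquare
from sklearn.feature_selection import chi2

# Hypothetical term-count matrix: 6 documents x 3 features
X = np.array([
    [3, 0, 1],
    [2, 0, 0],
    [4, 1, 0],
    [0, 2, 3],
    [0, 3, 2],
    [1, 2, 4],
], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

# Contingency table: observed[j, i] = sum of feature i's values
# across all samples in class j (no discretization involved)
Y = np.vstack([(y == c).astype(float) for c in np.unique(y)])
observed = Y @ X

# Expected counts under independence: class priors times total
# feature counts
class_prob = Y.mean(axis=1)        # relative class frequencies in y
feature_count = X.sum(axis=0)      # total value per feature
expected = np.outer(class_prob, feature_count)

# One chi2 statistic per feature (column)
chi2_manual, _ = chisquare(observed, expected)
chi2_sklearn, _ = chi2(X, y)
print(chi2_manual)
print(np.allclose(chi2_manual, chi2_sklearn))
```

The manual statistics and sklearn's should agree, since both sum (observed − expected)²/expected per feature column.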
This works when the feature values are frequencies (of terms, for example) because the sum will be the total frequency of a feature (term) in that class. There's no discretization going on.
It also works quite well in practice when the values are tf-idf values, since those are just weighted/scaled frequencies.
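Putting it together, tf-idf vectors can be fed straight into the selector. The corpus and labels below are hypothetical, and k=4 is an arbitrary choice for the sketch:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2

# Hypothetical toy corpus with two classes
docs = ["the cat sat", "cat and dog", "stock market fell", "market rally today"]
labels = [0, 0, 1, 1]

# Real-valued (non-negative) tf-idf matrix; chi2 accepts it as-is
X = TfidfVectorizer().fit_transform(docs)

# Keep the 4 features with the highest chi2 scores
selector = SelectKBest(chi2, k=4).fit(X, labels)
X_new = selector.transform(X)
print(X_new.shape)
```

The only requirement is that the features are non-negative; chi2 raises an error otherwise.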