python - Sklearn chi2 for feature selection

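I am learning about chi2 for feature selection and came across this.
However, my understanding of chi2 was that a higher score means the feature is more independent (and therefore less useful to the model), so we would be interested in the features with the lowest scores. Yet with scikit-learn's SelectKBest, the selector returns the features with the highest chi2 scores. Is my understanding of the chi2 test incorrect? Or is the chi2 score in sklearn something other than a chi2 statistic?
See the code below for what I mean (mostly copied from the link above, apart from the end).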

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2
import pandas as pd
import numpy as np

# Load iris data
iris = load_iris()

# Create features and target
X = iris.data
y = iris.target

# Convert to categorical data by converting data to integers
X = X.astype(int)

# Select two features with highest chi-squared statistics
chi2_selector = SelectKBest(chi2, k=2)
chi2_selector.fit(X, y)

# Look at scores returned from the selector for each feature
chi2_scores = pd.DataFrame(list(zip(iris.feature_names, chi2_selector.scores_, chi2_selector.pvalues_)), columns=['ftr', 'score', 'pval'])
chi2_scores

# The features kept by SelectKBest are the two
# with the _highest_ scores, not the lowest
kbest = np.asarray(iris.feature_names)[chi2_selector.get_support()]
kbest

Best answer

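You have it backwards.
The null hypothesis of the chi2 test is that the two categorical variables are independent. A higher chi2 statistic is therefore stronger evidence against independence: the feature and the target are dependent, which makes the feature more useful for classification.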
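To make the direction of the score concrete, here is a minimal sketch on synthetic data. The two features (`dependent` and `independent`) and the sample size are made up for illustration and are not part of the question. Note that `sklearn.feature_selection.chi2` expects non-negative feature values and treats them like counts.

```python
import numpy as np
from sklearn.feature_selection import chi2

rng = np.random.default_rng(0)
n = 1000
y = rng.integers(0, 2, size=n)  # binary target

# Hypothetical features, invented for this illustration:
# 'dependent' takes larger values when y == 1, 'independent' ignores y.
dependent = y * 3 + rng.integers(0, 2, size=n)
independent = rng.integers(0, 5, size=n)

X = np.column_stack([dependent, independent])
scores, pvalues = chi2(X, y)

# The feature tied to the target gets the larger statistic
# and the smaller p-value.
print("scores: ", scores)
print("pvalues:", pvalues)
```

The feature that depends on the target should come out with the larger score and the smaller p-value, which is why a selector keeps high scores rather than low ones.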
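SelectKBest gives you the two best (k=2) features based on the highest chi2 values. So you should take the features it returns, not the "other" features it leaves out.
Reading the chi2 statistics from chi2_selector.scores_ is correct, and so is getting the best features from chi2_selector.get_support(). It will give you 'petal length (cm)' and 'petal width (cm)' as the top two features according to the chi2 test of independence. Hope that clarifies the algorithm.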
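As a quick sanity check, the following sketch reuses the setup from the question and compares the mask from `get_support()` with the indices of the two largest entries of `scores_`. The variable names `top2`, `selected` and `names` are introduced here only for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

iris = load_iris()
X = iris.data.astype(int)  # same integer conversion as in the question
y = iris.target

selector = SelectKBest(chi2, k=2).fit(X, y)

# Indices of the two largest chi2 scores, computed by hand
top2 = np.argsort(selector.scores_)[-2:]

# Indices of the features the selector actually kept
selected = np.flatnonzero(selector.get_support())

names = np.asarray(iris.feature_names)
print("highest scores:", sorted(names[top2]))
print("selected:      ", sorted(names[selected]))
```

Both lines should list the same two features, confirming that SelectKBest keeps the highest-scoring columns.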