Problem description
I am working on a numerical dataset using the KNeighborsClassifier from the sklearn package.
Once the prediction is complete, the top 4 most important variables should be displayed in a bar chart.
Here is the solution I tried, but it raises an error saying that feature_importances_ is not an attribute of KNeighborsClassifier:
neigh = KNeighborsClassifier(n_neighbors=3)
neigh.fit(X_train, y_train)
y_pred = neigh.predict(X_test)
(pd.Series(neigh.feature_importances_, index=X_test.columns)
.nlargest(4)
.plot(kind='barh'))
To display the variable importance graph for a decision tree, the argument passed to pd.Series() is classifier.feature_importances_.
For SVM (with a linear kernel) and linear discriminant analysis, the argument passed to pd.Series() is classifier.coef_[0].
However, I am unable to find a suitable argument for the KNN classifier.
Recommended answer
Here is a good, generic example.
# Importing libraries
# Note: load_boston was removed in scikit-learn 1.2, so this example needs an older version.
from sklearn.datasets import load_boston
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
# Jupyter notebook magic; remove this line when running as a plain Python script
%matplotlib inline
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.feature_selection import RFE
from sklearn.linear_model import RidgeCV, LassoCV, Ridge, Lasso

# Loading the dataset
x = load_boston()
df = pd.DataFrame(x.data, columns = x.feature_names)
df["MEDV"] = x.target
X = df.drop("MEDV", axis=1)  # Feature matrix (positional axis was removed in pandas 2.0)
y = df["MEDV"] #Target Variable
df.head()
reg = LassoCV()
reg.fit(X, y)
print("Best alpha using built-in LassoCV: %f" % reg.alpha_)
print("Best score using built-in LassoCV: %f" %reg.score(X,y))
coef = pd.Series(reg.coef_, index = X.columns)
print("Lasso picked " + str(sum(coef != 0)) + " variables and eliminated the other " + str(sum(coef == 0)) + " variables")
imp_coef = coef.sort_values()
# Enlarge the figure so that all feature labels stay readable
matplotlib.rcParams['figure.figsize'] = (8.0, 10.0)
imp_coef.plot(kind = "barh")
plt.title("Feature importance using Lasso Model")
Full details are available at the link below.
https://towardsdatascience.com/feature-selection-with-pandas-e3690ad8504b
Here are two more great examples of the same.
https://www.scikit-yb.org/en/latest/api/features/importances.html
https://github.com/WillKoehrsen /feature-selector/blob/master/Feature%20Selector%20Usage.ipynb
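The examples above rely on model-specific attributes, and KNeighborsClassifier exposes neither feature_importances_ nor coef_. One model-agnostic option is permutation importance: shuffle each feature in turn on held-out data and measure how much the score drops. The following is only a minimal sketch, assuming scikit-learn 0.23 or later (for sklearn.inspection.permutation_importance and as_frame=True). The wine dataset, the StandardScaler step, n_repeats=10 and random_state=0 are illustrative assumptions, not part of the original question.

```python
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data (an assumption): replace with your own X (a DataFrame) and y.
X, y = load_wine(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y
)

# KNN is distance-based, so the features are standardized first.
neigh = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
neigh.fit(X_train, y_train)

# Shuffle each feature n_repeats times on the held-out set and record
# the mean drop in the default score (accuracy for a classifier).
result = permutation_importance(
    neigh, X_test, y_test, n_repeats=10, random_state=0
)

# Keep the four features with the largest mean importance.
top4 = pd.Series(result.importances_mean, index=X_test.columns).nlargest(4)
print(top4)

# Horizontal bar chart of the top four (requires matplotlib).
ax = top4.sort_values().plot(kind="barh", title="Permutation importance (KNN)")
ax.set_xlabel("Mean decrease in accuracy")
```

When running this as a script rather than in a notebook, call matplotlib.pyplot.show() afterwards to display the chart. Permutation importance can understate a feature's role when it is strongly correlated with another feature, so the ranking is best read as a rough guide.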