I've been scratching my head over this quirky behavior of XGBClassifier, which is supposed to behave just like RandomForestClassifier:
import xgboost as xgb
from sklearn.ensemble import RandomForestClassifier

class my_rf(RandomForestClassifier):
    def important_features(self, X):
        return super(RandomForestClassifier, self).feature_importances_

class my_xgb(xgb.XGBClassifier):
    def important_features(self, X):
        return super(xgb.XGBClassifier, self).feature_importances_

c1 = my_rf()
c1.fit(X, y)
c1.important_features(X)  # works
while this code fails :(

c2 = my_xgb()
c2.fit(X, y)
c2.important_features(X)  # fails with AttributeError: 'super' object has no attribute 'feature_importances_'
I've stared at both bits of code and they look identical to me. What am I missing??
Apologies if this is a noob question; the mysteries of Python OOP are a bit beyond me.
Edit:
If I use a vanilla XGBClassifier, without subclassing, everything works fine:
import xgboost as xgb
print "version:", xgb.__version__
c = xgb.XGBClassifier()
c.fit(X_train.as_matrix(), y_train.label)
print c.feature_importances_[:5]
version: 0.4
[ 0.4039548 0.05932203 0.06779661 0.00847458 0. ]
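An aside that may be relevant (my assumption, not something the question confirms): super(C, self) starts the attribute lookup after C in the method resolution order, so anything defined on C itself is invisible through that proxy. sklearn defines feature_importances_ on an ancestor of RandomForestClassifier, which is why the rf version still resolves; if xgboost defines the property directly on the class named in the super() call (or not at all), the lookup fails with exactly this error. A minimal sketch with made-up classes:

class HasIt(object):
    @property
    def feature_importances_(self):
        return [0.7, 0.3]

class Wrapper(HasIt):
    pass  # inherits the property from an ancestor, as RandomForestClassifier does

class MyWrapped(Wrapper):
    def important_features(self):
        # lookup starts after Wrapper in the MRO and finds HasIt's property
        return super(Wrapper, self).feature_importances_

print(MyWrapped().important_features())  # [0.7, 0.3] -- works

class DefinesItDirectly(object):
    @property
    def feature_importances_(self):
        return [0.6, 0.4]

class MyDirect(DefinesItDirectly):
    def important_features(self):
        # lookup starts after DefinesItDirectly, i.e. at object,
        # so a property defined on the class itself is skipped
        return super(DefinesItDirectly, self).feature_importances_

MyDirect().important_features()
# AttributeError: 'super' object has no attribute 'feature_importances_'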
Best Answer
As far as I can tell, feature_importances_ is not implemented in xgboost. You can roll your own using something like permutation feature importance:
import random
import numpy as np
from sklearn.cross_validation import cross_val_score

def feature_importances(clf, X, y):
    # baseline cross-validated score on the original features
    score = np.mean(cross_val_score(clf, X, y, scoring='roc_auc'))
    importances = {}
    for i in range(X.shape[1]):
        # shuffle column i and measure how much the score drops
        X_perm = X.copy()
        X_perm[:, i] = random.sample(X[:, i].tolist(), X.shape[0])
        perm_score = np.mean(cross_val_score(clf, X_perm, y, scoring='roc_auc'))
        importances[i] = score - perm_score
    return importances
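A usage sketch under assumed inputs (X a 2-d NumPy array, y the matching label vector; cross_val_score clones and fits the classifier internally, so no prior fit is needed):

import xgboost as xgb

c = xgb.XGBClassifier()
importances = feature_importances(c, X, y)

# a bigger AUC drop after shuffling a column means a more important feature
for i, drop in sorted(importances.items(), key=lambda kv: -kv[1])[:5]:
    print("feature %d: AUC drop %.4f" % (i, drop))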