I've been tearing my hair out over this quirky behavior of XGBClassifier, which I expected to behave just like RandomForestClassifier:

import xgboost as xgb
from sklearn.ensemble import RandomForestClassifier

class my_rf(RandomForestClassifier):
    def important_features(self, X):
        return super(RandomForestClassifier, self).feature_importances_

class my_xgb(xgb.XGBClassifier):
    def important_features(self, X):
        return super(xgb.XGBClassifier, self).feature_importances_

c1 = my_rf()
c1.fit(X,y)
c1.important_features(X) #works

while this code fails :(
c2 = my_xgb()
c2.fit(X,y)
c2.important_features(X) #fails with AttributeError: 'super' object has no attribute 'feature_importances_'

I've stared at both bits of code and they look identical to me. What am I missing??
Sorry if this is a noob question; the mysteries of Python OOP are beyond me.
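(For reference, a minimal illustration of the super() lookup rule in play here; Base/Leaf/Sub are made-up names, not sklearn or xgboost classes. super(Cls, self) starts its attribute search after Cls in the MRO, so anything defined directly on Cls is skipped:)

class Base(object):
    pass  # nothing named feature_importances_ anywhere above Leaf

class Leaf(Base):
    @property
    def feature_importances_(self):
        return [0.5, 0.5]

class Sub(Leaf):
    def get_importances(self):
        # super(Leaf, self) searches *after* Leaf in the MRO (Base, object),
        # so Leaf's own property is never found.
        return super(Leaf, self).feature_importances_

Sub().get_importances()  # AttributeError: 'super' object has no attribute 'feature_importances_'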
EDIT:
If I use vanilla XGB, without subclassing, everything works fine:
import xgboost as xgb
print "version:", xgb.__version__
c = xgb.XGBClassifier()
c.fit(X_train.as_matrix(), y_train.label)
print c.feature_importances_[:5]

version: 0.4
[ 0.4039548   0.05932203  0.06779661  0.00847458  0.        ]

Best Answer

As far as I can tell, feature_importances_ isn't implemented in xgboost. You can roll your own with something like permutation feature importance:

import random
import numpy as np
from sklearn.cross_validation import cross_val_score  # renamed to sklearn.model_selection in sklearn >= 0.18

def feature_importances(clf, X, y):
    # Baseline: cross-validated AUC on the untouched data.
    score = np.mean(cross_val_score(clf, X, y, scoring='roc_auc'))
    importances = {}
    for i in range(X.shape[1]):
        # Permute column i only (random.sample without replacement is a shuffle).
        X_perm = X.copy()
        X_perm[:, i] = random.sample(X[:, i].tolist(), X.shape[0])
        perm_score = np.mean(cross_val_score(clf, X_perm, y, scoring='roc_auc'))
        # Importance of feature i = how much the score drops when it is scrambled.
        importances[i] = score - perm_score

    return importances
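
If it helps, a quick usage sketch of the helper above; the synthetic data from sklearn's make_classification is just an assumption to make the call concrete:

import xgboost as xgb
from sklearn.datasets import make_classification

# Hypothetical demo data; any binary-classification X, y will do.
X_demo, y_demo = make_classification(n_samples=500, n_features=5, random_state=0)

imps = feature_importances(xgb.XGBClassifier(), X_demo, y_demo)
for idx, drop in sorted(imps.items(), key=lambda kv: -kv[1]):
    print idx, drop  # larger AUC drop when permuted => more important feature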
