比赛得分公式如下:
其中,P为Precision , R为 Recall。
GBDT训练基于验证集评价,此时会调用评价函数,XGBoost的best_iteration和best_score均是基于评价函数得出。
评价函数:
input: preds和dvalid,即为验证集和验证集上的预测值,
return string 类型的名称 和一个flaot类型的fevalerror值表示评价值的大小,其是以error的形式定义,即当此值越大是认为模型效果越差。
from sklearn.metrics import confusion_matrix
def customedscore(preds, dtrain):
label = dtrain.get_label()
pred = [int(i>=0.5) for i in preds]
confusion_matrixs = confusion_matrix(label, pred)
recall =float(confusion_matrixs[0][0]) / float(confusion_matrixs[0][1]+confusion_matrixs[0][0])
precision = float(confusion_matrixs[0][0]) / float(confusion_matrixs[1][0]+confusion_matrixs[0][0])
F = 5*precision* recall/(2*precision+3*recall)*100
return 'FSCORE',float(F)
应用:
训练时要传入参数:feval = customedscore,
params = { 'silent': 1, 'objective': 'binary:logistic' , 'gamma':0.1,
'min_child_weight':5,
'max_depth':5,
'lambda':10,
'subsample':0.7,
'colsample_bytree':0.7,
'colsample_bylevel':0.7,
'eta': 0.01,
'tree_method':'exact'}
model = xgb.train(params, trainsetall, num_round,verbose_eval=10, feval = customedscore,maximize=False)
自定义 目标函数,这个我没有具体使用
# user define objective function, given prediction, return gradient and second order gradient
# this is log likelihood loss
def logregobj(preds, dtrain):
labels = dtrain.get_label()
preds = 1.0 / (1.0 + np.exp(-preds))
grad = preds - labels
hess = preds * (1.0-preds)
return grad, hess
# training with customized objective, we can also do step by step training
# simply look at xgboost.py's implementation of train
bst = xgb.train(param, dtrain, num_round, watchlist, logregobj, evalerror)
参考:
https://github.com/dmlc/xgboost/blob/master/demo/guide-python/custom_objective.py
http://blog.csdn.net/lujiandong1/article/details/52791117