Question
I'm working on a classification problem with unbalanced classes (5% 1's). I want to predict the class, not the probability.
In a binary classification problem, does scikit-learn's classifier.predict() use 0.5 as the threshold by default? If it doesn't, what's the default method? If it does, how do I change it?
In scikit-learn, some classifiers have the class_weight='auto' option, but not all do. With class_weight='auto', would .predict() use the actual population proportion as a threshold?
How would I do this in a classifier like MultinomialNB that doesn't support class_weight, other than using predict_proba() and then calculating the classes myself?
Recommended answer
In probabilistic classifiers, yes. It's the only sensible threshold from a mathematical viewpoint, as others have explained.
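As a quick check of the claim above, here is a minimal sketch using MultinomialNB on a toy dataset. The data and the variable names (X_new, labels_from_proba, and so on) are made up for illustration. It compares predict() against picking the most probable class from predict_proba(), which in the binary case is the same as thresholding at 0.5.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Toy training data (illustrative only): two samples of class 0, one of class 1.
X = [[1, 0], [1, 0], [0, 1]]
y = [0, 0, 1]

# A few unseen samples to classify (also made up for this sketch).
X_new = [[1, 1], [0, 2], [3, 0]]

clf = MultinomialNB().fit(X, y)

# Class with the highest posterior probability for each sample.
proba = clf.predict_proba(X_new)
labels_from_proba = clf.classes_[np.argmax(proba, axis=1)]

# What predict() returns directly.
labels_from_predict = clf.predict(X_new)

# For this classifier the two agree: predict() is the argmax of predict_proba().
same = np.array_equal(labels_from_predict, labels_from_proba)
print(labels_from_predict, same)
```

With two classes, "highest probability" and "probability above 0.5" describe the same rule, apart from exact ties.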
For MultinomialNB, you can set class_prior, which is the prior probability P(y) per class y. That effectively shifts the decision boundary. E.g.
>>> from sklearn.naive_bayes import MultinomialNB
# minimal dataset
>>> X = [[1, 0], [1, 0], [0, 1]]
>>> y = [0, 0, 1]
# use empirical prior, learned from y
# (predict expects a 2D array: one row per sample)
>>> MultinomialNB().fit(X, y).predict([[1, 1]])
array([0])
# use custom prior to make 1 more likely
>>> MultinomialNB(class_prior=[.1, .9]).fit(X, y).predict([[1, 1]])
array([1])
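For comparison, the manual route mentioned in the question (calling predict_proba() and applying your own cutoff) can be sketched as follows. The helper predict_with_threshold and the 0.3 cutoff are hypothetical, chosen only for illustration. They are not part of scikit-learn's API.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def predict_with_threshold(clf, X, threshold=0.5):
    """Hypothetical helper: label a sample 1 when P(y=1 | x) >= threshold.

    Assumes a fitted binary classifier whose classes_ are [0, 1].
    """
    # Column 1 of predict_proba corresponds to clf.classes_[1], i.e. class 1.
    p_positive = clf.predict_proba(X)[:, 1]
    return (p_positive >= threshold).astype(int)

# Same toy data as in the answer.
X = [[1, 0], [1, 0], [0, 1]]
y = [0, 0, 1]
clf = MultinomialNB().fit(X, y)

sample = [[1, 1]]
default_label = predict_with_threshold(clf, sample, threshold=0.5)
lowered_label = predict_with_threshold(clf, sample, threshold=0.3)
print(default_label, lowered_label)
```

Lowering the cutoff makes the rare class easier to predict, much like raising its entry in class_prior does, but without refitting the model.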