scikit-learn .predict() default threshold

Question

I'm working on a classification problem with unbalanced classes (5% 1's). I want to predict the class, not the probability.

In a binary classification problem, does scikit's classifier.predict() use 0.5 as the threshold by default? If it doesn't, what's the default method? If it does, how do I change it?

In scikit some classifiers have the class_weight='auto' option, but not all do. With class_weight='auto', would .predict() use the actual population proportion as a threshold?

What would be the way to do this in a classifier like MultinomialNB that doesn't support class_weight, other than using predict_proba() and then calculating the classes myself?

Recommended answer

In probabilistic classifiers, yes. It's the only sensible threshold from a mathematical viewpoint, as others have explained.
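The sketch below checks the 0.5 claim for MultinomialNB. It also shows the manual alternative the question mentions: applying your own cutoff to predict_proba(). The count data and the predict_with_threshold helper are hypothetical and exist only for illustration. The check shows that MultinomialNB.predict() agrees with the argmax of predict_proba(), which for two classes amounts to a 0.5 cutoff. Not every scikit-learn estimator derives predict() from predict_proba(). SVC with probability=True, for instance, can disagree with its own probabilities.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy count data: 200 samples, 5 count features, 5% positives
# (mirroring the imbalance in the question). Positives get extra counts in
# feature 0 so the classifier has some signal to learn from.
rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(200, 5))
y = np.zeros(200, dtype=int)
y[:10] = 1
X[y == 1, 0] += 3

clf = MultinomialNB().fit(X, y)
proba = clf.predict_proba(X)  # columns follow clf.classes_, i.e. [0, 1]

# predict() returns the class with the highest posterior probability;
# with two classes that is equivalent to a 0.5 cutoff on P(y=1 | x).
default_pred = clf.predict(X)
argmax_pred = clf.classes_[np.argmax(proba, axis=1)]


def predict_with_threshold(proba, threshold):
    """Hypothetical helper: predict class 1 when P(y=1 | x) >= threshold."""
    return (proba[:, 1] >= threshold).astype(int)


# A lower cutoff can only add positive predictions, typically raising
# recall on the rare class at the cost of precision.
low_cut_pred = predict_with_threshold(proba, 0.2)
print("predict() == argmax(predict_proba):", (default_pred == argmax_pred).all())
print("positives at 0.5:", int(default_pred.sum()), "at 0.2:", int(low_cut_pred.sum()))
```

The cutoff of 0.2 is arbitrary here. In practice you would pick it on held-out data according to the precision/recall trade-off you need.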

What would be the way to do this in a classifier like MultinomialNB that doesn't support class_weight?

You can set class_prior, the prior probability P(y) of each class y. Naive Bayes predicts the class that maximizes P(y) · P(x | y), so changing the prior effectively shifts the decision boundary. For example:

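>>> from sklearn.naive_bayes import MultinomialNB
>>> # minimal dataset: two samples of class 0, one of class 1
>>> X = [[1, 0], [1, 0], [0, 1]]
>>> y = [0, 0, 1]
>>> # use the empirical prior learned from y, i.e. P(0) = 2/3, P(1) = 1/3
>>> MultinomialNB().fit(X, y).predict([[1, 1]])
array([0])
>>> # use a custom prior to make class 1 more likely
>>> MultinomialNB(class_prior=[.1, .9]).fit(X, y).predict([[1, 1]])
array([1])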
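Changing class_prior is equivalent to moving the cutoff on predict_proba() of the empirical-prior model. Posterior odds equal prior odds times the likelihood ratio, and class_prior does not change the per-class likelihoods. Switching from prior pi to pi_new therefore predicts class 1 exactly when the original P(y=1 | x) exceeds t = k / (1 + k), where k = (pi_new[0] · pi[1]) / (pi_new[1] · pi[0]). The sketch below checks this on the same toy data. The extra query rows are illustrative additions, not part of the original example.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Same minimal dataset as the example above, plus a few illustrative queries.
X = np.array([[1, 0], [1, 0], [0, 1]])
y = np.array([0, 0, 1])
X_query = np.array([[1, 1], [2, 0], [3, 0]])

base = MultinomialNB().fit(X, y)                        # empirical prior (2/3, 1/3)
custom = MultinomialNB(class_prior=[.1, .9]).fit(X, y)  # custom prior

# Derive the cutoff on the base model's P(y=1 | x) that reproduces the
# custom prior: t = k / (1 + k), k = (pi_new[0] * pi[1]) / (pi_new[1] * pi[0]).
pi = np.exp(base.class_log_prior_)
pi_new = np.array([.1, .9])
k = (pi_new[0] * pi[1]) / (pi_new[1] * pi[0])
t = k / (1 + k)

# Thresholding the base model's probabilities at t should match custom.predict().
p1 = base.predict_proba(X_query)[:, 1]
shifted_pred = (p1 > t).astype(int)
print("threshold t:   ", t)
print("base predict:  ", base.predict(X_query))
print("shifted cutoff:", shifted_pred)
print("custom prior:  ", custom.predict(X_query))
```

On this data t = 1/19 ≈ 0.053. The shifted cutoff and the custom prior both predict [1, 1, 0] for the three query rows, while the empirical-prior model predicts 0 for all three. Setting class_prior is therefore another way of choosing a threshold other than 0.5.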
