Question
I have a dataset where the classes are imbalanced. The classes are either '1' or '0', where the ratio of class '1' to class '0' is 5:1. How do you calculate the prediction error for each class, and rebalance the weights accordingly, in sklearn with Random Forest, similar to the approach in the following link: http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#balance
Recommended answer
You can pass the sample_weight argument to the Random Forest fit method:
sample_weight : array-like, shape = [n_samples] or None
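A minimal sketch of passing sample_weight to RandomForestClassifier.fit, assuming made-up toy data with the question's 5:1 ratio (500 samples of class 1, 100 of class 0). The data, the number of trees and the weight of 5 for class 0 are illustrative assumptions only:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy data: 500 samples of class 1 and 100 of class 0 (5:1 ratio).
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 4))
y = np.array([1] * 500 + [0] * 100)

# One weight per (observation, label) pair: the minority class 0 gets weight 5.
sample_weight = np.where(y == 0, 5.0, 1.0)
assert len(X) == len(y) == len(sample_weight)

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X, y, sample_weight=sample_weight)
```

With these weights, the total weight of each class is the same (100 × 5 = 500 × 1), which is what "balancing" means here.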
In older versions there was a preprocessing.balance_weights method that generated balancing weights for the given samples, such that the classes become uniformly distributed. It is still there, in the internal but still usable preprocessing._weights module, but it is deprecated and will be removed in future versions. I don't know the exact reasons for this.
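If balance_weights is not available in your version, sklearn.utils.class_weight.compute_sample_weight with class_weight="balanced" is, as far as I know, the maintained way to get similar per-sample balancing weights (RandomForestClassifier also accepts class_weight="balanced" directly). This is a swapped-in function, not the old balance_weights itself, and the labels below are hypothetical:

```python
import numpy as np
from sklearn.utils.class_weight import compute_sample_weight

# Hypothetical labels with the question's 5:1 ratio of class 1 to class 0.
y = np.array([1] * 500 + [0] * 100)

# "balanced" gives each sample n_samples / (n_classes * count_of_its_class),
# i.e. 600 / (2 * 100) = 3.0 for class 0 and 600 / (2 * 500) = 0.6 for class 1.
weights = compute_sample_weight(class_weight="balanced", y=y)

print(weights[y == 0][0], weights[y == 1][0])
```

The ratio 3.0 : 0.6 is the same 5:1 as the manual weights in the update, just rescaled so the weights average to 1.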
Update
Some clarification, as you seem to be confused. sample_weight usage is straightforward once you remember that its purpose is to balance the target classes in the training dataset. That is, if you have X as observations and y as classes (labels), then len(X) == len(y) == len(sample_weight), and each element of the 1-d sample_weight array represents the weight for the corresponding (observation, label) pair. In your case, class 1 appears 5 times as often as class 0, so to balance the class distributions you could use a simple
sample_weight = np.array([5 if i == 0 else 1 for i in y])
which assigns a weight of 5 to all class-0 instances and a weight of 1 to all class-1 instances. See the balance_weights function mentioned above for a somewhat craftier way of computing the weights.
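The question also asks for the prediction error of each class. A minimal sketch, assuming hypothetical data from make_classification with roughly a 1:5 class-0-to-class-1 ratio: fit with the 5/1 weights, then compute each class's error rate from a confusion matrix on a held-out split. Note this is a plain test-set estimate, not the out-of-bag per-class error that Breiman's page describes:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced data: roughly 1 sample of class 0 for every 5 of class 1.
X, y = make_classification(n_samples=1200, n_features=8,
                           weights=[1 / 6, 5 / 6], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Weight 5 for class 0 and 1 for class 1, as in the sample_weight line above.
sample_weight = np.where(y_train == 0, 5.0, 1.0)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train, sample_weight=sample_weight)

# Rows of cm are true classes; per-class error = 1 - (correct / total in that class).
cm = confusion_matrix(y_test, clf.predict(X_test), labels=[0, 1])
per_class_error = 1.0 - np.diag(cm) / cm.sum(axis=1)
for label, err in zip([0, 1], per_class_error):
    print(f"class {label}: error {err:.3f}")
```

If the class-0 error stays much higher than the class-1 error, you can raise the class-0 weight and re-check, which is roughly the iterative rebalancing idea from the linked page.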