Imbalanced classification with RandomForestClassifier in sklearn


Question

I have a dataset where the classes are unbalanced. The classes are either '1' or '0', and the ratio of class '1' to class '0' is 5:1. How do you calculate the prediction error for each class and rebalance the weights accordingly in sklearn with Random Forest, along the lines of the following link: http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#balance

Recommended answer

You can pass the sample_weight argument to the Random Forest fit method:

sample_weight : array-like, shape = [n_samples] or None

In older versions there was a preprocessing.balance_weights method that generated balancing weights for the given samples, so that the classes become uniformly distributed. It is still there, in the internal but still usable preprocessing._weights module, but it is deprecated and will be removed in future versions. I don't know the exact reasons for this.
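Since balance_weights is described above as deprecated, the same idea can be sketched by hand with NumPy. The helper name balanced_sample_weight below is hypothetical and is not the sklearn function; it only assumes that "balanced" means every class should carry the same total weight.

```python
import numpy as np

def balanced_sample_weight(y):
    """Hypothetical helper: weight each sample inversely to its class frequency."""
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    # n_samples / (n_classes * count) gives every class the same total weight
    per_class = len(y) / (len(classes) * counts)
    # np.unique returns sorted classes, so searchsorted maps labels to indices
    return per_class[np.searchsorted(classes, y)]

# Toy labels with a 5:1 ratio of class 1 to class 0 (made up for illustration)
y = np.array([1] * 50 + [0] * 10)
w = balanced_sample_weight(y)
```

With these weights the minority class 0 is weighted 5 times as heavily as class 1, which matches the 5:1 ratio in the question up to a constant scale factor.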

Update

Some clarification, as you seem to be confused. Using sample_weight is straightforward once you remember that its purpose is to balance the target classes in the training dataset. That is, if X holds the observations and y the classes (labels), then len(X) == len(y) == len(sample_weight), and each element of the 1-D sample_weight array is the weight for the corresponding (observation, label) pair. In your case, if class 1 is represented 5 times as often as class 0 and you want to balance the class distributions, you could simply use

sample_weight = np.array([5 if i == 0 else 1 for i in y])

which assigns a weight of 5 to all 0 instances and a weight of 1 to all 1 instances. See the link above for a slightly more sophisticated balance_weights weight-evaluation function.
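To tie this back to the question, here is a minimal end-to-end sketch: fit a forest with these weights, then read the per-class error off a confusion matrix. The synthetic data, the train/test split, and all parameter values are assumptions made purely for illustration; only the sample_weight line comes from the answer.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic, made-up data with a 5:1 ratio of class 1 to class 0
rng = np.random.RandomState(0)
n1, n0 = 500, 100
X = np.vstack([rng.normal(0.0, 1.0, size=(n1, 4)),
               rng.normal(1.0, 1.0, size=(n0, 4))])
y = np.array([1] * n1 + [0] * n0)

# Stratify so both splits keep the 5:1 ratio
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# One weight per training sample: 5 for the rare class 0, 1 for class 1
sample_weight = np.array([5 if i == 0 else 1 for i in y_train])

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train, sample_weight=sample_weight)

# Rows of the confusion matrix are true classes, columns are predictions
cm = confusion_matrix(y_test, clf.predict(X_test), labels=[0, 1])
# Per-class error = 1 - (correct predictions / true samples of that class)
per_class_error = 1.0 - np.diag(cm) / cm.sum(axis=1)
print("error for class 0, class 1:", per_class_error)
```

The errors here are measured on a held-out split rather than out-of-bag as on Breiman's page; if class 0 still has a much higher error than class 1, its weight can be raised and the model refit. RandomForestClassifier also accepts a class_weight parameter (for example class_weight="balanced"), which may be a simpler alternative to building the weight array by hand.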
