python - Balancing data in Python with the SMOTE library

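I want to balance a set of training data that has the following characteristics and is split into X_train and y_train. My classes make up roughly the following percentages: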

class A: 54%
class B: 45%
class C: 1%

So I would like to resample the data as follows:

class A: 49%
class B: 41%
class C: 10%

The library I want to use is:

https://imbalanced-learn.readthedocs.io/en/stable/generated/imblearn.over_sampling.RandomOverSampler.html

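with SMOTE as the balancing algorithm. My problem is that I don't know how to use this library. I understand the SMOTE algorithm, but I have run into some difficulty using the library. Any help?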

Thanks

Best answer

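Have you used sklearn before? imblearn works very similarly. In effect, using SMOTE is like running a model on your data: it generates additional synthetic samples to balance the classes.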
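To make "generates additional synthetic samples" concrete, here is a minimal NumPy sketch of the interpolation idea behind SMOTE: pick a minority sample, pick one of its nearest minority neighbours, and place a new point somewhere on the segment between them. The helper name `smote_like_samples` and its parameters are made up for illustration; this is not imblearn's implementation, only a rough picture of the idea.

```python
import numpy as np


def smote_like_samples(X_minority, n_new, k=5, seed=0):
    """Hypothetical helper: interpolate between minority points and their neighbours."""
    rng = np.random.default_rng(seed)
    X_minority = np.asarray(X_minority, dtype=float)
    n = len(X_minority)  # assumes at least 2 minority samples

    # Pairwise Euclidean distances between minority samples.
    dists = np.linalg.norm(X_minority[:, None, :] - X_minority[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is never its own neighbour

    # Indices of the (up to) k nearest minority neighbours of each point.
    neighbours = np.argsort(dists, axis=1)[:, :k]

    # For each new sample: a random base point and one of its neighbours.
    base = rng.integers(0, n, size=n_new)
    pick = rng.integers(0, min(k, n - 1), size=n_new)
    chosen = neighbours[base, pick]

    # New point = base + gap * (neighbour - base), with gap drawn from [0, 1).
    gap = rng.random((n_new, 1))
    return X_minority[base] + gap * (X_minority[chosen] - X_minority[base])


# Toy minority class: the four corners of the unit square.
X_min = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
synthetic = smote_like_samples(X_min, n_new=10, k=2)
print(synthetic.shape)
```

Because every new point lies on a segment between two existing minority points, the synthetic samples stay inside the region the minority class already occupies.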

This example from the imblearn page shows the usage well:

>>> from collections import Counter
>>> from sklearn.datasets import make_classification
>>> from imblearn.over_sampling import SMOTE # doctest: +NORMALIZE_WHITESPACE
>>> X, y = make_classification(n_classes=2, class_sep=2,
...     weights=[0.1, 0.9], n_informative=3, n_redundant=1, flip_y=0,
...     n_features=20, n_clusters_per_class=1, n_samples=1000, random_state=10)
>>> print('Original dataset shape %s' % Counter(y))
Original dataset shape Counter({1: 900, 0: 100})
>>> sm = SMOTE(random_state=42)
>>> X_res, y_res = sm.fit_resample(X, y)
>>> print('Resampled dataset shape %s' % Counter(y_res))
Resampled dataset shape Counter({0: 900, 1: 900})

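Specifically, once you have your training data X and target y, you create a SMOTE() instance, optionally with a random state. You then apply it to the data with X_res, y_res = sm.fit_resample(X, y). fit_resample() does two jobs in one call: it fits the SMOTE algorithm to your dataset, then resamples it and returns the new, oversampled dataset.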

Regarding python - Balancing data in Python with the SMOTE library, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/60442051/