
Problem description

I'm using the SMOTE function to oversample my sparse dataset, which contains around 98% 0s and 2% 1s. I used the following code:

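from imblearn.over_sampling import SMOTE
import pandas as pd

# Features: every column except the row identifier
df_input = pd.read_csv('input_tr.csv', index_col=0)
train_X = df_input.loc[:, df_input.columns != 'row_num']

# Labels (about 98% 0s and 2% 1s)
df_output = pd.read_csv('output_tr.csv', index_col=0)
train_y = df_output

# Oversample the minority class up to a 1:1 ratio
# (`sampling_strategy` and `fit_resample` replace the older `ratio` and `fit_sample`)
sm = SMOTE(random_state=12, sampling_strategy=1.0)
train_X_sm, train_y_sm = sm.fit_resample(train_X, train_y)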

I get the following error:

line 347, in kneighbors
(train_size, n_neighbors)
ValueError: Expected n_neighbors <= n_samples,  but n_samples = 4, n_neighbors = 6

Can you please help me solve this error?

Recommended answer

I had a similar problem.

SMOTE is based on the k-nearest neighbors (KNN) algorithm, so each class you oversample needs a minimum number of samples before SMOTE can create new instances of it.

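For example:

  • If you are trying to predict the integer classes 1, 2 and 3, and you have only 2 samples of class 1, how could you find k = 3 neighbors? It is impossible. The data is too imbalanced!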
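To make the dependency on neighbors concrete, here is a simplified pure-Python sketch of the idea behind SMOTE: pick one of the k nearest same-class neighbors of a sample and interpolate between the two. It is an illustration only, not imbalanced-learn's implementation; the names `k_nearest` and `synthetic_point` are hypothetical, and unlike the library this sketch does not count the sample itself among its neighbors.

```python
import random


def k_nearest(sample, others, k):
    """Return the k points in `others` closest to `sample` (squared Euclidean distance)."""
    if k > len(others):
        # Same kind of failure as in the question: not enough samples for k neighbors.
        raise ValueError(
            f"Expected k <= number of other samples, but others = {len(others)}, k = {k}"
        )

    def sq_dist(point):
        return sum((a - b) ** 2 for a, b in zip(sample, point))

    return sorted(others, key=sq_dist)[:k]


def synthetic_point(sample, others, k, rng):
    """Create one synthetic point on the segment between `sample` and a random near neighbor."""
    neighbor = rng.choice(k_nearest(sample, others, k))
    gap = rng.random()  # position along the segment, in [0, 1)
    return [s + gap * (n - s) for s, n in zip(sample, neighbor)]


# Hypothetical minority class with 4 samples in total: one sample plus 3 others.
others = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
new_point = synthetic_point([0.0, 0.0], others, k=2, rng=random.Random(0))
```

With only 3 other samples available, asking for `k=5` in this sketch raises a `ValueError`, which mirrors the situation described in the bullet above.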

The message is clear:

ValueError: Expected n_neighbors <= n_samples,  but n_samples = 4, n_neighbors = 6

So you need at least as many samples as neighbors to create new instances.

I looked at your dataset, and it has just 4 samples with OUTPUT 1. So the message is saying: you have only 4 samples, but I need 6 neighbors to create new instances from them.
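One possible workaround, sketched below, is to lower SMOTE's `k_neighbors` parameter so that it fits the smallest class. As far as I know, imbalanced-learn's SMOTE defaults to `k_neighbors=5` and queries one extra neighbor because each sample is returned as its own nearest neighbor, which would explain `n_neighbors = 6` in the error. The helper names `safe_k_neighbors` and `oversample` are hypothetical, and `y` is assumed to be a 1-D sequence of labels (not a DataFrame). Collecting more minority samples remains the better fix when that is possible.

```python
from collections import Counter


def safe_k_neighbors(y, default_k=5):
    """Largest k (at most default_k) that the smallest class in y can support."""
    smallest = min(Counter(y).values())
    if smallest < 2:
        raise ValueError("SMOTE needs at least 2 samples in every class it oversamples")
    # Each sample has only (smallest - 1) other samples of its own class to use as neighbors.
    return min(default_k, smallest - 1)


def oversample(X, y, random_state=12):
    """Run SMOTE with k_neighbors reduced to fit the minority class (sketch)."""
    # Imported here so the helper above stays usable without imbalanced-learn installed.
    from imblearn.over_sampling import SMOTE

    sm = SMOTE(random_state=random_state, k_neighbors=safe_k_neighbors(y))
    return sm.fit_resample(X, y)
```

For the dataset in the question, with 4 samples of class 1, `safe_k_neighbors` would return 3. Synthetic points built from so few real samples will closely resemble them, so the resampled data should be treated with caution.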


10-20 19:52