Problem description
I'm using the SMOTE function to oversample my sparse dataset, which contains around 98% 0s and 2% 1s. I used the following code:
from imblearn.over_sampling import SMOTE
import os
import pandas as pd
df_input = pd.read_csv('input_tr.csv', index_col=0)
train_X = df_input.ix[:, df_input.columns != 'row_num']
df_output = pd.read_csv("output_tr.csv", index_col=0)
train_y = df_output
sm = SMOTE(random_state=12, ratio=1.0)
train_X_sm, train_y_sm = sm.fit_sample(train_X, train_y)
I get the following error:
line 347, in kneighbors
(train_size, n_neighbors)
ValueError: Expected n_neighbors <= n_samples, but n_samples = 4, n_neighbors = 6
Can you please help me solve this error?
Recommended answer
I had a similar problem.
SMOTE is based on the k-nearest-neighbors (KNN) algorithm, so a class needs a minimum number of samples before new instances of it can be created.
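For example:
- If you are trying to predict integer classes 1, 2, 3 and you have only 2 samples of class 1, how would SMOTE find 3 nearest neighbors? It would be impossible. The data is too imbalanced!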
The message is clear:
Expected n_neighbors <= n_samples, but n_samples = 4, n_neighbors = 6
So you need at least as many samples as neighbors in order to create new instances.
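As a rough illustration of that condition (a sketch of the arithmetic, not imbalanced-learn's actual code): SMOTE's default `k_neighbors` is 5, and the neighbor search asks for one extra neighbor because each minority sample comes back as its own nearest neighbor. That is where the 6 in the message comes from. The helper names `neighbors_requested` and `can_oversample` are hypothetical.

```python
def neighbors_requested(k_neighbors=5):
    # The search runs within the minority class, and each sample is its own
    # nearest neighbor, so one extra neighbor is requested on top of k_neighbors.
    return k_neighbors + 1


def can_oversample(n_minority, k_neighbors=5):
    # Mirrors the condition in the error message: n_neighbors <= n_samples.
    return neighbors_requested(k_neighbors) <= n_minority


print(can_oversample(4))        # default k_neighbors=5 needs 6 samples
print(can_oversample(4, 3))     # k_neighbors=3 needs only 4 samples
```

With 4 minority samples the default setting fails the check, while a smaller `k_neighbors` passes it.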
I looked at your dataset and you have just 4 samples with output 1. So the message is saying that you have only 4 samples, but SMOTE needs 6 neighbors to create new instances of that class.
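One possible way around this, if collecting more minority samples is not an option, is to lower SMOTE's `k_neighbors` parameter so that it fits the smallest class. The sketch below is only an assumption about how you might pick that value. The helper `safe_k_neighbors` is hypothetical, and the labels are made up to match the 98% / 2% split with 4 positive samples.

```python
from collections import Counter


def safe_k_neighbors(labels, default_k=5):
    """Return a k_neighbors value that the smallest class can support."""
    smallest = min(Counter(labels).values())
    if smallest < 2:
        # With a single sample there is no other point to interpolate toward.
        raise ValueError("SMOTE needs at least 2 samples in the class it oversamples")
    # One sample is the query point itself, so at most (smallest - 1) neighbors exist.
    return min(default_k, smallest - 1)


# Made-up labels: 196 negatives and 4 positives (about 98% / 2%).
labels = [0] * 196 + [1] * 4
k = safe_k_neighbors(labels)
print(k)

# Assumed usage with a recent imbalanced-learn release, where fit_resample
# replaces the older fit_sample:
# from imblearn.over_sampling import SMOTE
# sm = SMOTE(random_state=12, k_neighbors=k)
# train_X_sm, train_y_sm = sm.fit_resample(train_X, train_y)
```

Keep in mind that synthetic points interpolated from only 4 real samples carry very little information, so gathering more data for the minority class is usually the better fix.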