Problem description
I am oversampling credit card data with SMOTE, using the code published on geeksforgeeks.org.
Running the following code produces the output shown below:
print("Before OverSampling, counts of label '1': {}".format(sum(y_train == 1)))
print("Before OverSampling, counts of label '0': {} \n".format(sum(y_train == 0)))
# import SMOTE module from imblearn library
# pip install imblearn (if you don't have imblearn in your system)
from imblearn.over_sampling import SMOTE
sm = SMOTE(random_state = 2)
# fit_sample was removed in imbalanced-learn 0.8; fit_resample is its replacement
X_train_res, y_train_res = sm.fit_resample(X_train, y_train.ravel())
print('After OverSampling, the shape of train_X: {}'.format(X_train_res.shape))
print('After OverSampling, the shape of train_y: {} \n'.format(y_train_res.shape))
print("After OverSampling, counts of label '1': {}".format(sum(y_train_res == 1)))
print("After OverSampling, counts of label '0': {}".format(sum(y_train_res == 0)))
Output:
Before OverSampling, counts of label '1': 345
Before OverSampling, counts of label '0': 199019
After OverSampling, the shape of train_X: (398038, 29)
After OverSampling, the shape of train_y: (398038,)
After OverSampling, counts of label '1': 199019
After OverSampling, counts of label '0': 199019
I am totally new to this area and cannot work out how to save the resampled data in CSV format. I would be very glad if anyone could help me with this.
Alternatively, if there is a reference that shows how to generate synthetic data from a dataset using SMOTE and save the updated dataset to a CSV file, please mention it.
Thanks in advance.
From what I can see in your code, X_train_res and the other variables are NumPy arrays. You can do something like this:
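import numpy as np

# Reshape the labels from (398038,) to (398038, 1) so they can be joined as a column
y_train_res = y_train_res.reshape(-1, 1)
# Append the labels as the last column of the feature matrix
data_res = np.concatenate((X_train_res, y_train_res), axis=1)
# savetxt is a NumPy function, so call it on np (there is no `data` object here)
np.savetxt('sample_smote.csv', data_res, delimiter=",")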
I have not been able to run this myself, so let me know if you face any issues.
Note: You will need an extra step to add column labels to the file. Let me know once you have this working and need help with that.
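One possible way to add the column labels mentioned in the note is to go through a pandas DataFrame instead of np.savetxt. The following is only a sketch: the helper save_resampled, the column names "V1", "V2" and "Class", the demo file name, and the small stand-in arrays are all assumptions made for illustration, not part of the original code. With the real data you would pass X_train_res, y_train_res and the feature names of your own dataset.

```python
import numpy as np
import pandas as pd


def save_resampled(X_res, y_res, feature_names, path, label_name="Class"):
    """Write resampled features and labels to a CSV file with a header row.

    Hypothetical helper for illustration; label_name="Class" is an assumption.
    """
    # Build a labelled table from the feature matrix
    df = pd.DataFrame(np.asarray(X_res), columns=list(feature_names))
    # Append the labels as the last column; ravel flattens (n, 1) to (n,)
    df[label_name] = np.asarray(y_res).ravel()
    # index=False leaves the row-number column out of the file
    df.to_csv(path, index=False)
    return df


# Small stand-in arrays; in the question these would be X_train_res and y_train_res
X_demo = np.array([[0.1, 1.5], [0.2, 2.5], [0.3, 3.5]])
y_demo = np.array([0, 1, 1])
demo_df = save_resampled(X_demo, y_demo, ["V1", "V2"], "sample_smote_demo.csv")
```

If the original training data came from a DataFrame, its column names (minus the label column) can usually be passed as feature_names, provided the column order was not changed before resampling.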