Problem description
I'm using pandas to import a lot of data from a CSV file, and once it is read I format it to contain only numerical data. This returns a list of lists, numericalData[][], where each inner list contains around 140k values.
From this list, I want to create Testing and Training Data. For my testing data, I want 30% of the data read into numericalData, so I use the following bit of code:
testingAmount = len(numericalData0[0]) * trainingDataPercentage / 100
That works a treat. Then I use numpy to select that amount of data from each column of my imported numericalData:
testingData.append(np.random.choice(numericalData[x], testingAmount) )
This returns a sample with 38 columns (the call runs in a loop), where each column has around 49k elements randomly selected from my imported numericalData.
The issue is that my trainingData needs to hold the other 70% of the data, but I'm unsure how to do this. I tried comparing each element against my testingData and, if the two elements weren't equal, adding it to my trainingData. This resulted in an error and didn't work. Next, I tried to delete the selected testingData from my imported data and then save the remaining column to my trainingData; alas, that didn't work either.
I've only been working with Python for the past week, so I'm a bit lost on what to try next.
Recommended answer
You can use random.shuffle and split the list after that. A toy example:
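import random
# list(...) is required on Python 3, where range() is not a mutable list
data = list(range(1, 11))
random.shuffle(data)  # shuffles in place
training = data[:5]  # first half after shuffling
testing = data[5:]  # remaining half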
For more information, read the documentation for the random module.
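Applied to the data in the question, the same shuffle-then-slice idea might look like the sketch below. It assumes numericalData is a list of equal-length columns and that row i of every column belongs to the same CSV record, so one shared permutation of row indices is used for all columns. The helper name split_columns, its parameters, and the small stand-in data are hypothetical and not from the original post. Note that np.random.choice samples with replacement by default, which is one reason the "other 70%" is hard to recover from its output; slicing a permutation avoids that because every row lands in exactly one of the two sets.

```python
import numpy as np

def split_columns(columns, test_percentage, seed=None):
    """Split every column into (training, testing) using one shared shuffle.

    Hypothetical helper: `columns` is assumed to be a list of
    equal-length sequences, like numericalData in the question.
    """
    rng = np.random.default_rng(seed)
    n_rows = len(columns[0])
    # The slice size must be an integer.
    n_test = int(n_rows * test_percentage / 100)
    # One random ordering of the row indices, reused for every column.
    order = rng.permutation(n_rows)
    test_idx, train_idx = order[:n_test], order[n_test:]
    training = [np.asarray(col)[train_idx] for col in columns]
    testing = [np.asarray(col)[test_idx] for col in columns]
    return training, testing

# Hypothetical stand-in for numericalData: 3 columns of 10 values each.
numericalData = [list(range(10)), list(range(10, 20)), list(range(20, 30))]
trainingData, testingData = split_columns(numericalData, 30, seed=0)
print(len(trainingData[0]), len(testingData[0]))  # 7 3
```

Because the testing and training indices come from disjoint slices of the same permutation, no row appears in both sets and none is lost.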