Matlab: How to split a data matrix into two random subsets of column vectors while keeping label information?

Problem description

I have a data matrix X (60x208) and a matrix of labels Y (1x208). Each column of X is one data point. I want to split X into two random subsets of column vectors: a training set (70% of the data) and a test set (30% of the data). I still need to be able to tell which label from Y corresponds to each column vector. I couldn't find any function that does this. Any ideas?

EDIT: I should add that Y contains only two labels, 1 and 2 (not sure whether this makes a difference).

Solution

That's pretty easy to do. Use randperm to generate a random permutation of the indices from 1 up to the number of points you have, which is 208 in your case.
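Here is a quick look at what randperm returns. For n points it gives the integers 1 through n in random order, each exactly once. Indexing with it therefore shuffles the columns without duplicating or dropping any. The value n = 8 below is only for illustration.

```matlab
% randperm(n) returns a row vector containing 1..n in random order
n = 8;                 % illustrative size; in the question n would be 208
p = randperm(n);
disp(p)                % a shuffled ordering of 1..8 (differs between runs unless the seed is fixed)

% Every index appears exactly once, so sorting recovers 1..n
assert(isequal(sort(p), 1:n));
```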

Once you have this permutation, use it to index into both X and Y to extract the training and test data and their labels. The same shuffled indices are applied to both matrices, so every column of X stays paired with its label in Y. Something like this:

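num_points = size(X,2);                  % number of data points (columns of X)
split_point = round(num_points*0.7);     % how many points go into the training set
seq = randperm(num_points);              % random ordering of the column indices
X_train = X(:,seq(1:split_point));       % first 70% of the shuffled columns
Y_train = Y(seq(1:split_point));         % labels for those same columns
X_test = X(:,seq(split_point+1:end));    % remaining 30% of the columns
Y_test = Y(seq(split_point+1:end));      % labels for the test columns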
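You may also want to know where each training or test column came from in the original X. To do that, keep the two halves of the permutation as index vectors instead of discarding them, as in the sketch below. Several pieces are assumptions and not part of the original answer:

- X and Y are random placeholder data with the question's shapes (60x208, labels 1 or 2).
- The names train_idx and test_idx are illustrative.
- The fixed seed is there only to make the split reproducible.

```matlab
% Placeholder data with the same shapes as in the question (assumption)
rng(0);                          % fix the seed so the split is reproducible
X = rand(60, 208);               % 60 features x 208 points
Y = randi(2, 1, 208);            % labels 1 or 2, one per column of X

num_points = size(X, 2);
split_point = round(num_points * 0.7);
seq = randperm(num_points);

% Keep the original column positions of each subset
train_idx = seq(1:split_point);
test_idx  = seq(split_point+1:end);

X_train = X(:, train_idx);  Y_train = Y(train_idx);
X_test  = X(:, test_idx);   Y_test  = Y(test_idx);

% The two index sets are disjoint and together cover every column
assert(isempty(intersect(train_idx, test_idx)));
assert(isequal(sort([train_idx, test_idx]), 1:num_points));
```

Column k of X_train is column train_idx(k) of X. You can use train_idx to map any training point, or a prediction made on it, back to its original position.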

split_point is the number of points that go into the training set. It has to be rounded because 70% of the number of points is not always a whole number. I also didn't hard-code 208, because your data set might grow; this way the code works with a data set of any size. X_train and Y_train contain the data and labels for your training set, while X_test and Y_test contain the data and labels for your test set.
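For the 208 points in the question, the rounding works out as follows. The snippet repeats the split_point computation with num_points fixed at 208 so that the resulting set sizes are visible.

```matlab
num_points = 208;                        % number of columns in the question's X
split_point = round(num_points * 0.7);   % 0.7 * 208 = 145.6, rounded to 146
num_test = num_points - split_point;     % the remaining 62 columns form the test set
fprintf('training: %d, test: %d\n', split_point, num_test);
% prints "training: 146, test: 62"
```

Using floor instead of round would give 145 training and 63 test points. Either choice is fine as long as it is applied consistently.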

As such, column k of X_train is the k-th point of your training set, and Y_train(k) is the label for that particular point. The same goes for X_test and Y_test.
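You can check that the labels stay attached. Before splitting, store each label inside its own column (here as an extra first row of a copy of X), then confirm it still matches after the split. The placeholder data and the tagging row are assumptions made only for this demonstration.

```matlab
% Placeholder data with the question's shapes (assumption)
X = rand(60, 208);
Y = randi(2, 1, 208);

% Tag each column with its own label by prepending Y as an extra row
X_tagged = [Y; X];

num_points = size(X_tagged, 2);
split_point = round(num_points * 0.7);
seq = randperm(num_points);
X_train = X_tagged(:, seq(1:split_point));
Y_train = Y(seq(1:split_point));
X_test  = X_tagged(:, seq(split_point+1:end));
Y_test  = Y(seq(split_point+1:end));

% The tag row travelled with its column, so it must equal the split labels
assert(isequal(X_train(1, :), Y_train));
assert(isequal(X_test(1, :), Y_test));
disp('labels still match their columns');
```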

09-05 16:23