Splitting an XDF file/dataset for training and testing

Question

Is it possible to split a .xdf file (in the Microsoft RevoScaleR context) into, say, a 75% training set and a 25% test set? I know there is a function called rxSplit(), but the documentation doesn't seem to cover this case. Most of the examples online assign a column of random numbers to the dataset and split it using that column.

Thanks. Thomas

Recommended answer

You can certainly use rxSplit for this. Create a variable that defines your training and test samples, and then split on it.

For example, using the mtcars toy dataset:

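# Write the mtcars data frame out as an xdf file
xdf <- rxDataStep(mtcars, "mtcars.xdf")
# Flag roughly 25% of the rows as test (TRUE), then split on that factor
xdfList <- rxSplit(xdf, splitByFactor="test",
    transforms=list(test=factor(runif(.rxNumRows) < 0.25, levels=c("FALSE", "TRUE"))))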

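xdfList is now a list containing 2 xdf data sources: one with (approximately) 75% of the data (the rows where test is FALSE, i.e. the training set), and the other with the remaining 25% or so (the rows where test is TRUE, i.e. the test set). The proportions are only approximate because each row is assigned independently at random.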
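
To see why the proportions are only approximate, the indicator that the transform builds can be reproduced in plain base R on the in-memory mtcars data frame. This is a minimal sketch and not part of the original answer: it does not touch any xdf file or call RevoScaleR, nrow(mtcars) stands in for .rxNumRows, and the seed is an arbitrary choice made only so the draw is repeatable.

```r
# Base R only: rebuild the train/test indicator from the rxSplit transform
# on the in-memory mtcars data frame (no xdf file involved).
set.seed(42)  # arbitrary seed (an assumption), just for repeatability

n <- nrow(mtcars)  # plays the role of .rxNumRows

# Each row independently has a 25% chance of being flagged TRUE (test).
test <- factor(runif(n) < 0.25, levels = c("FALSE", "TRUE"))

# Rows per level: "FALSE" = training, "TRUE" = test.
counts <- table(test)
print(counts)

# Observed shares; these hover around 0.75 / 0.25 rather than hitting them exactly.
print(prop.table(counts))
```

On a dataset as small as mtcars the observed shares can drift noticeably from 75/25; with larger xdf files the random assignment tends to land much closer to the target.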
