Question
Can anyone tell me why we set random_state to zero when splitting the data into training and test sets?
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = \
train_test_split(X, y, test_size=0.30, random_state=0)
I have also seen cases like this, where random_state is set to 1:
X_train, X_test, y_train, y_test = \
train_test_split(X, y, test_size=0.30, random_state=1)
What effect does this random state have in cross-validation as well?
Recommended answer
It doesn't matter whether random_state is 0, 1, or any other integer. What matters is that it is set to the same value every time if you want to validate your processing over multiple runs of the code. Incidentally, I have seen random_state=42 used in many official scikit-learn examples, as well as elsewhere.
random_state, as the name suggests, is used to initialize the internal random number generator, which in your case decides how the data is split into train and test indices. The documentation states:
If random_state is an integer, then it is used to seed a new RandomState object.
If random_state is a RandomState object, then it is passed through.
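The two documented cases can be sketched as follows. This is a minimal illustration, not part of the original answer: the toy arrays X and y are made up, and sklearn.utils.check_random_state is used here only as a convenient way to observe how scikit-learn normalizes the argument.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.utils import check_random_state

# Toy data, invented purely for illustration.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Integer seed: each call seeds a fresh generator, so the split repeats.
_, test_a, _, _ = train_test_split(X, y, test_size=0.30, random_state=0)
_, test_b, _, _ = train_test_split(X, y, test_size=0.30, random_state=0)
same_with_int = np.array_equal(test_a, test_b)

# RandomState instance: it is passed through as the very same object.
rng = np.random.RandomState(0)
passed_through = check_random_state(rng) is rng

print(same_with_int, passed_through)
```

One practical consequence: because a RandomState instance is reused rather than re-seeded, its internal state advances with every call, so an integer is usually the simpler choice when the goal is an identical split on every run.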
This is done so that you can check and validate the results when running the code multiple times. Setting random_state to a fixed value guarantees that the same sequence of random numbers is generated each time you run the code. Unless some other source of randomness is present in the process, the results will be the same on every run, which helps in verifying the output.
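The same reasoning carries over to the cross-validation part of the question. The sketch below assumes a shuffled KFold splitter and an invented toy array X; the helper fold_test_indices is hypothetical and exists only for this example. With a fixed integer seed, the fold assignments are identical from one run to the next.

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy data, invented purely for illustration.
X = np.arange(20).reshape(10, 2)

def fold_test_indices(seed):
    """Return the test indices of each fold for a shuffled 5-fold split."""
    kf = KFold(n_splits=5, shuffle=True, random_state=seed)
    return [test_idx.tolist() for _, test_idx in kf.split(X)]

run_1 = fold_test_indices(0)
run_2 = fold_test_indices(0)

# Same seed, same folds: the two "runs" produce identical partitions.
folds_repeat = run_1 == run_2
print(folds_repeat)
```

Note that random_state only affects KFold when shuffle=True; without shuffling, the folds are taken in order and are deterministic anyway.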