Question
When applying min-max scaling to normalize your features, do you apply it to the entire dataset before splitting it into training, validation, and test data?
Or do you split first and then apply min-max scaling to each set separately, using the min and max values from that specific set?
Lastly, when making a prediction on a new input, should the features of that input be normalized using the min and max values from the training data before being fed into the network?
Recommended answer
Split it, then scale. Think of it this way: you have no idea what real-world data will look like, so you could not have scaled the training data to it. Your test data is the surrogate for real-world data, so you should treat it the same way and keep its values out of the scaling parameters.
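To reiterate: split, fit the scaling on your training data, then apply that same training-data scaling to the validation and test data. By the same reasoning, a new input at prediction time should be normalized with the min and max values from the training data.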
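A minimal sketch of that order of operations, in plain Python so it runs without extra packages. The helper names `fit_min_max` and `apply_min_max`, the toy dataset, and the 6/2/2 split are assumptions made for illustration; they are not part of the original answer.

```python
import random


def fit_min_max(rows):
    """Return a (min, max) pair per feature, computed from `rows` only."""
    columns = list(zip(*rows))
    return [(min(col), max(col)) for col in columns]


def apply_min_max(rows, stats):
    """Scale `rows` using previously fitted (min, max) pairs."""
    scaled = []
    for row in rows:
        scaled.append([
            # Guard against a constant feature (max == min).
            (x - lo) / (hi - lo) if hi > lo else 0.0
            for x, (lo, hi) in zip(row, stats)
        ])
    return scaled


# Toy dataset: 10 samples, 2 features (made-up values).
data = [[float(i), float(10 * i + 5)] for i in range(10)]

# 1. Split first.
rng = random.Random(0)
shuffled = data[:]
rng.shuffle(shuffled)
train, val, test = shuffled[:6], shuffled[6:8], shuffled[8:]

# 2. Fit the scaling on the training set only.
stats = fit_min_max(train)

# 3. Apply the training-set scaling to every split.
train_scaled = apply_min_max(train, stats)
val_scaled = apply_min_max(val, stats)
test_scaled = apply_min_max(test, stats)

# 4. A new input at prediction time reuses the same training statistics.
new_input = [[12.0, 125.0]]
new_scaled = apply_min_max(new_input, stats)
```

Because the min and max come from the training set alone, validation, test, or new values can land outside the [0, 1] range, as `new_scaled` does here. If scikit-learn is available, `MinMaxScaler` follows the same pattern: call `fit` on the training data, then `transform` on each of the other sets.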