


I know the general rule that we should test a trained classifier only on the testing set.

But now comes the question: When I have an already trained and tested classifier ready, can I apply it to the same dataset that was the base of the training and testing set? Or do I have to apply it to a new predicting set that is different from the training+testing set?

And what if I predict a label column of a time series? (Edited later: I do not mean a classical time series analysis here, just a broad selection of columns from a typical database: data stored weekly, monthly, or at irregular intervals that I convert into separate feature columns, one per week / month / year ...) Do I have to shift all of the features (not just the past columns of the time series label column, but also all other normal features) of the training+testing set back to a point in time where the data has no "knowledge" overlap with the predicting set?


I would then train and test the classifier on features shifted into the past by n months, scoring against a label column that is unshifted and most recent, and then predict from the most recent, unshifted features. Shifted and unshifted features have the same number of columns; I align them by assigning the column names of the shifted features to the unshifted features.
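
A minimal sketch of that shifting scheme, in plain Python so it runs without extra libraries. The table layout, the `spend` column, and the names `window_features`, `N_SHIFT` and `WIDTH` are all hypothetical and only illustrate the idea; a real pipeline would more likely do the same thing on a DataFrame.

```python
# Hypothetical wide table: one row per customer, values ordered oldest month first.
MONTHS = ["2020-01", "2020-02", "2020-03", "2020-04", "2020-05", "2020-06"]
rows = {
    "a": {"spend": [10, 12, 11, 15, 14, 18], "label": [0, 0, 0, 1, 0, 1]},
    "b": {"spend": [5, 4, 6, 5, 7, 6], "label": [0, 0, 0, 0, 0, 0]},
}

N_SHIFT = 2  # months between the last feature month and the predicted month
WIDTH = 3    # number of past months turned into feature columns


def window_features(row, last, width):
    """Build feature columns from `width` months ending at index `last` (inclusive).

    Column names are relative to `last`, so shifted and unshifted windows
    end up with identical names.
    """
    feats = {}
    for k in range(width):
        feats[f"spend_lag{k}"] = row["spend"][last - k]
        feats[f"label_lag{k}"] = row["label"][last - k]  # past labels are features too
    return feats


latest = len(MONTHS) - 1

# Training/testing: every feature is shifted back by N_SHIFT months,
# the target is the most recent known label.
train_X = {key: window_features(row, latest - N_SHIFT, WIDTH) for key, row in rows.items()}
train_y = {key: row["label"][latest] for key, row in rows.items()}

# Prediction: most recent, unshifted features under the same column names.
# The predicted label then refers to the month N_SHIFT months after `latest`.
predict_X = {key: window_features(row, latest, WIDTH) for key, row in rows.items()}
```

Because the column names are relative lags rather than calendar months, the classifier sees the same feature layout at training and prediction time, while no training feature is newer than `N_SHIFT` months before its label.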




In data mining tools (for multivariate statistics and machine learning), the dependent variable is assigned a role as target variable (or in some tools as label attribute), while an independent variable may be assigned a role as regular variable.[8] Known values for the target variable are provided for the training data set and test data set, but should be predicted for other data.


p.s.2: In this basic tutorial we can see that the predicting set is kept separate from the training set: https://scikit-learn.org/stable/tutorial/basic/tutorial.html

We select the training set with the [:-1] Python syntax, which produces a new array that contains all but the last item from digits.data: […] Now you can predict new values. In this case, you’ll predict using the last image from digits.data[-1:]. By predicting, you’ll determine the image from the training set that best matches the last image.
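
For reference, the step quoted above looks roughly like this in code. This is a sketch from memory of that tutorial, assuming scikit-learn is installed; the `SVC` settings are the ones I recall the tutorial using, and the variable names are my own.

```python
from sklearn import datasets, svm

digits = datasets.load_digits()

# Training set: everything except the last image.
X_train, y_train = digits.data[:-1], digits.target[:-1]

# Prediction set: only the last image, which the classifier never saw.
X_new = digits.data[-1:]

clf = svm.SVC(gamma=0.001, C=100.0)
clf.fit(X_train, y_train)

prediction = clf.predict(X_new)
print(prediction)
```

The point for the question here is only the slicing: the sample that gets predicted is excluded from the data passed to `fit`.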



Answering myself after half a year here. The first answer was a slight misunderstanding about the term "time series" which I had caused with an unclear question (edited).


The question above, "When I have an already trained and tested classifier ready, can I apply it to the same dataset that was the base of the training and testing set?", has the simple answer: no.

The question above, "Do I have to shift all of the features?", has the simple answer: yes.


In short, if I predict a month's class column, I have to shift all of the non-class columns back in time as well, in addition to the previous months of the class column that I converted to features. All data must have been known before the month in which the class is predicted.

