问题描述
我正在使用ML.NET通过回归模型预测一系列值.我只对预测的一列(分数列)感兴趣.但是,某些其他列的值不适用于预测类.我不能将它们保留为0,因为这会破坏预测,所以我猜他们也必须被预测.
I'm using ML.NET to predict a series of values using a regression model. I am only interested in one column being predicted (the score column). However, the values of some of the other columns are not available for the prediction class. I can't leave them at 0 as this would upset the prediction, so I guess they would have to also be predicted.
我在此处看到了一个有关预测多个值的类似问题. answer 建议创建两个模型,但是我可以看到,每个模型中指定的要素列均不包含另一种模式.因此,这意味着在进行预测时将不会使用这些列.我是错误的,还是每个模型的标签列也应该包含在另一个模型的功能列中?
I saw a similar question here on predicting multiple values. The answer suggests creating two models, but I can see that the feature columns specified in each model do not include the label column of the other model. So this implies that those columns would not be used when making the prediction. Am I wrong, or should the label column of each model also be included in the feature column of the other model?
下面是一些示例代码,试图在代码中进行解释:
Here's some example code to try and explain in code:
public class FooInput
{
public float Feature1 { get; set; }
public float Feature2 { get; set; }
public float Bar {get; set; }
public float Baz {get; set; }
}
public class FooPrediction : FooInput
{
public float BarPrediction { get; set; }
public float BazPrediction { get; set; }
}
public ITransformer Train(IEnumerable<FooInput> data)
{
var mlContext = new MLContext(0);
var trainTestData = mlContext.Data.TrainTestSplit(mlContext.Data.LoadFromEnumerable(data));
var pipelineBar = mlContext.Transforms.CopyColumns("Label", "Bar")
.Append(mlContext.Transforms.CopyColumns("Score", "BarPrediction"))
.Append(mlContext.Transforms.Concatenate("Features", "Feature1", "Feature2", "Baz"))
.Append(mlContext.Regression.Trainers.FastTree());
var pipelineBaz = mlContext.Transforms.CopyColumns("Label", "Baz")
.Append(mlContext.Transforms.CopyColumns("Score", "BazPrediction"))
.Append(mlContext.Transforms.Concatenate("Features", "Feature1", "Feature2", "Bar"))
.Append(mlContext.Regression.Trainers.FastTree());
return pipelineBar.Append(pipelineBaz).Fit(trainTestData.TestSet);
}
这实际上与前面提到的答案相同,但是添加了Baz
作为要预测Bar
的模型的特征,反之,添加了Bar
作为模型的特征其中Baz
是可以预测的.
This is effectively the same as the aforementioned answer, but with the addition of Baz
as a feature for the model where Bar
is to be predicted, and conversely the addition of Bar
as a feature for the model where Baz
is to be predicted.
这是正确的方法还是其他问题的答案达到了预期的结果,是因为每一列的预测都将利用加载的数据集中的另一预测列的值?
Is this the correct approach, or does the answer on the other question achieve the desired result, being that the prediction of each column will utilise the values of the other predicted column from the loaded dataset?
推荐答案
您可以使用的一种技术称为"Imputation",该技术将这些未知值替换为某些猜测"值.估算只是替换数据集缺失值的过程.
One technique you can use is called "Imputation", which replaces these unknown values with some "guessed" value. Imputation is simply the process of substituting the missing values of our dataset.
在ML.NET中,您要查找的是ReplaceMissingValues
转换.您可以在docs.microsoft上找到示例 .com.
In ML.NET, what you're looking for is the ReplaceMissingValues
transform. You can find samples on docs.microsoft.com.
您在上面讨论的技术也是一种估算的形式,您的未知数将由其他已知值中的预测值替换.这也可以.我想我会尝试两种形式,然后看看哪种形式最适合您的数据集.
The technique you are discussing above is also a form of imputation, where your unknowns are replaced by predicting the value from the other known values. This can work as well. I guess I would try both forms and see what works best for your dataset.
这篇关于在多个组合回归模型(ML.NET)中将列指定为要素和标签的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!