I'm using the Pipeline API in Apache Spark for parameter validation.
I'm building a TrainValidationSplitModel like this:
Pipeline pipeline = ...
ParamMap[] paramGrid = ...
TrainValidationSplit trainValidationSplit = new TrainValidationSplit()
    .setEstimator(pipeline)
    .setEvaluator(new MulticlassClassificationEvaluator())
    .setEstimatorParamMaps(paramGrid)
    .setTrainRatio(0.8);
TrainValidationSplitModel model = trainValidationSplit.fit(training);
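My question is: how can I extract and print the parameters of the best trained model?
Best answer
I finally got it working.
After training, Spark logs these metrics. My Spark log level was set to ERROR, so I never saw the following: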
2015-10-21 12:57:33,828 [INFO org.apache.spark.ml.tuning.TrainValidationSplit]
Train validation split metrics: WrappedArray(0.7141940371838821, 0.7358721053749735)
2015-10-21 12:57:33,831 [INFO org.apache.spark.ml.tuning.TrainValidationSplit]
Best set of parameters:
{
hashingTF_79cf758f5ab1-numFeatures: 2000000,
nb_67d55ce4e1fc-smoothing: 1.0
}
2015-10-21 12:57:33,831 [INFO org.apache.spark.ml.tuning.TrainValidationSplit]
Best train validation split metric: 0.7358721053749735.
I have now added the INFO level for the TrainValidationSplit class in my log4j.properties file:
log4j.logger.org.apache.spark.ml.tuning.TrainValidationSplit=INFO
log4j.additivity.org.apache.spark.ml.tuning.TrainValidationSplit=false
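If you would rather read the result in code than from the log, one possible approach is to pick the index of the best validation metric and look up the ParamMap at the same index. The sketch below is only the selection step, written without Spark dependencies so it runs on its own; the class name BestParamIndex and the method bestIndex are made up for illustration. It assumes that model.validationMetrics() and model.getEstimatorParamMaps() are in the same order and that evaluator.isLargerBetter() tells you the direction of the metric, which is worth checking against your Spark version.

```java
public class BestParamIndex {

    /**
     * Returns the index of the best metric. On ties the first occurrence wins.
     * Hypothetical helper, not part of Spark.
     */
    public static int bestIndex(double[] metrics, boolean largerIsBetter) {
        if (metrics.length == 0) {
            throw new IllegalArgumentException("metrics must not be empty");
        }
        int best = 0;
        for (int i = 1; i < metrics.length; i++) {
            boolean better = largerIsBetter
                    ? metrics[i] > metrics[best]
                    : metrics[i] < metrics[best];
            if (better) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // The two validation metrics from the log output above.
        double[] metrics = {0.7141940371838821, 0.7358721053749735};
        int best = bestIndex(metrics, true);
        System.out.println("Best index: " + best + ", metric: " + metrics[best]);

        // Assumed usage with Spark on the classpath (not run here):
        //   double[] metrics = model.validationMetrics();
        //   ParamMap[] maps = model.getEstimatorParamMaps();
        //   int best = bestIndex(metrics, evaluator.isLargerBetter());
        //   System.out.println(maps[best]);
    }
}
```

With the metrics from the log, the second parameter combination is selected, which matches the "Best train validation split metric" line.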
This Q&A, "java - How to print the best model parameters in an Apache Spark Pipeline?", comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/32565594/