Question
I trained a Random Forest with PySpark. I want a CSV with the results, one row per point in the parameter grid. My code is:
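from pyspark.ml import Pipeline
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.regression import RandomForestRegressor
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

estimator = RandomForestRegressor()
evaluator = RegressionEvaluator()  # default metric: rmse
paramGrid = ParamGridBuilder().addGrid(estimator.numTrees, [2, 3]) \
    .addGrid(estimator.maxDepth, [2, 3]) \
    .addGrid(estimator.impurity, ['variance']) \
    .addGrid(estimator.featureSubsetStrategy, ['sqrt']) \
    .build()
pipeline = Pipeline(stages=[estimator])
crossval = CrossValidator(estimator=pipeline,
                          estimatorParamMaps=paramGrid,
                          evaluator=evaluator,
                          numFolds=3)
# `result` is the training DataFrame, defined elsewhere
cvModel = crossval.fit(result)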
So I want a CSV like this:
numTrees | maxDepth | impurityMeasure
2        | 2        | 0.001
2        | 3        | 0.00023
etc.
What is the best way to do this?
Recommended answer
You'll have to combine two pieces of data:
- The estimator ParamMaps, extracted using the getEstimatorParamMaps method.
- The average cross-validation metrics (one per param map, in the same order), which can be retrieved from the avgMetrics attribute.
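Because both pieces are lists in the same order (`avgMetrics[i]` is the fold-averaged metric for the i-th entry of `getEstimatorParamMaps()`), combining them is a positional pairing. The following is a minimal pure-Python sketch, assuming the two lists have already been pulled out of `cvModel`. The helper name `pair_and_rank` and its `larger_is_better` flag are hypothetical; in Spark the flag could come from `evaluator.isLargerBetter()`. The helper also rejects mismatched lengths, which a bare `zip` would silently truncate.

```python
from typing import Any, List, Sequence, Tuple


def pair_and_rank(
    param_maps: Sequence[Any],
    metrics: Sequence[float],
    larger_is_better: bool = False,
) -> List[Tuple[Any, float]]:
    """Pair each param map with its averaged metric, best first.

    Hypothetical helper: param_maps[i] must correspond to metrics[i],
    as getEstimatorParamMaps() and avgMetrics do.
    """
    if len(param_maps) != len(metrics):
        # zip() would silently drop the extra items and hide the mismatch
        raise ValueError(
            f"{len(param_maps)} param maps but {len(metrics)} metrics"
        )
    pairs = list(zip(param_maps, metrics))
    # For rmse (RegressionEvaluator's default) smaller is better
    pairs.sort(key=lambda pair: pair[1], reverse=larger_is_better)
    return pairs


# Usage: plain dicts stand in for Spark param maps, metrics are dummy values
grid = [{"numTrees": 2, "maxDepth": 2}, {"numTrees": 2, "maxDepth": 3}]
ranked = pair_and_rank(grid, [0.5, 0.3])
print(ranked[0])
```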
First, get the names and values of all parameters declared in each map:
params = [{p.name: v for p, v in m.items()} for m in cvModel.getEstimatorParamMaps()]
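Each param map is a dict keyed by `Param` objects. `p.name` is only the short name (for example `numTrees`), while `p.parent` holds the uid of the stage that owns the param. If a pipeline tuned two stages that share a param name, the short names would collide and the comprehension above would keep only the last value. The following is a minimal sketch of a hypothetical `flatten_param_map` helper that can optionally prefix the owner's uid. The `FakeParam` named tuple and the uids are made up; they only stand in for `pyspark.ml.param.Param` so that the sketch runs without Spark.

```python
from collections import namedtuple
from typing import Any, Dict, Mapping

# Stand-in for pyspark.ml.param.Param: only .parent (owner uid) and .name are used
FakeParam = namedtuple("FakeParam", ["parent", "name"])


def flatten_param_map(param_map: Mapping[Any, Any], qualify: bool = False) -> Dict[str, Any]:
    """Turn {Param: value} into {str: value}.

    With qualify=True the key is "<parent uid>.<name>", so params that share
    a short name on different stages stay distinct.
    """
    flat: Dict[str, Any] = {}
    for param, value in param_map.items():
        key = f"{param.parent}.{param.name}" if qualify else param.name
        flat[key] = value  # without qualify, a repeated name overwrites the earlier value
    return flat


# Hypothetical uids for two stages that both expose maxDepth
rf_trees = FakeParam("RandomForestRegressor_1", "numTrees")
rf_depth = FakeParam("RandomForestRegressor_1", "maxDepth")
gbt_depth = FakeParam("GBTRegressor_2", "maxDepth")
param_map = {rf_trees: 2, rf_depth: 3, gbt_depth: 5}

print(flatten_param_map(param_map))                # maxDepth collides
print(flatten_param_map(param_map, qualify=True))  # all three kept
```

For the single-estimator pipeline in the question, the short names are unique, so the plain comprehension is sufficient.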
Then zip the result with the metrics and convert it to a data frame:
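import pandas as pd
# One row per param map: the metric (named after the evaluator's metric, e.g. rmse)
# plus the values of the parameters set in that map
results = pd.DataFrame([
    {cvModel.getEvaluator().getMetricName(): metric, **ps}
    for ps, metric in zip(params, cvModel.avgMetrics)
])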
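The question asks for a CSV file. With pandas, that only requires calling `to_csv(path, index=False)` on the resulting DataFrame. If you prefer not to depend on pandas on the driver, the following is a minimal sketch that uses only the standard library `csv` module. The helper `write_cv_results_csv` and the file name are hypothetical. The helper takes the already flattened `params` dicts, the metrics list and the metric name, and puts parameter columns first, in first-seen order, with the metric last, matching the layout in the question.

```python
import csv
import os
import tempfile
from typing import Dict, List, Sequence


def write_cv_results_csv(
    params: Sequence[Dict[str, object]],
    metrics: Sequence[float],
    metric_name: str,
    path: str,
) -> List[str]:
    """Write one CSV row per grid point and return the column order used."""
    if len(params) != len(metrics):
        raise ValueError("params and metrics must have the same length")
    # Parameter columns in first-seen order, metric column last
    columns: List[str] = []
    for ps in params:
        for name in ps:
            if name not in columns:
                columns.append(name)
    columns.append(metric_name)
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=columns)
        writer.writeheader()
        for ps, metric in zip(params, metrics):
            writer.writerow({**ps, metric_name: metric})
    return columns


# In Spark this would be called as:
#   write_cv_results_csv(params, cvModel.avgMetrics,
#                        cvModel.getEvaluator().getMetricName(), "cv_results.csv")
# Here, the example values from the question stand in for real results.
path = os.path.join(tempfile.mkdtemp(), "cv_results.csv")
demo_params = [{"numTrees": 2, "maxDepth": 2}, {"numTrees": 2, "maxDepth": 3}]
cols = write_cv_results_csv(demo_params, [0.001, 0.00023], "rmse", path)
print(cols)
```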