LinearRegression scala.MatchError
Problem description
I am getting a scala.MatchError when using a ParamGridBuilder in Spark 1.6.1 and 2.0:
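import org.apache.spark.ml.tuning.ParamGridBuilder

// lr is the LinearRegression estimator being tuned (its definition is not shown in the question)
val paramGrid = new ParamGridBuilder()
  .addGrid(lr.regParam, Array(0.1, 0.01))
  .addGrid(lr.fitIntercept)  // BooleanParam overload: tries both true and false
  .addGrid(lr.elasticNetParam, Array(0.0, 0.5, 1.0))
  .build()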
The error is:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 57.0 failed 1 times, most recent failure: Lost task 0.0 in stage 57.0 (TID 257, localhost):
scala.MatchError: [280000,1.0,[2400.0,9373.0,3.0,1.0,1.0,0.0,0.0,0.0]] (of class org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema)
The question is how I should use ParamGridBuilder in this case.
Recommended answer
The problem here is the input schema, not ParamGridBuilder. The price column is loaded as an integer, while LinearRegression expects a double. The failing row in the error message shows this: its first field is printed as 280000 rather than 280000.0. You can fix it by explicitly casting the column to the required type:
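import sqlContext.implicits._  // provides the $"..." column syntax

val houses = sqlContext.read.format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load(...)
  .withColumn("price", $"price".cast("double"))  // the label column must be a double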
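To see why an integer label can surface as a scala.MatchError rather than a clearer type error, here is a minimal plain-Scala sketch with no Spark dependency. It assumes the estimator destructures each row with a typed pattern such as `label: Double` and no fallback case; `FakeRow` and `labelOf` are hypothetical names invented for this illustration and are not Spark APIs.

```scala
import scala.util.Try

// Hypothetical stand-in for a row of (label, weight, features); not a Spark type.
type FakeRow = (Any, Any, Seq[Double])

// A typed pattern with no fallback case: a label that is not a Double
// matches no branch, so the match throws scala.MatchError at runtime.
def labelOf(row: FakeRow): Double = row match {
  case (label: Double, _: Double, _) => label
}

// Same values as in the error message, once with an Int label and once with a Double.
val intLabelRow: FakeRow = (280000, 1.0, Seq(2400.0, 9373.0))
val doubleLabelRow: FakeRow = (280000.0, 1.0, Seq(2400.0, 9373.0))

val failed = Try(labelOf(intLabelRow))     // wraps the MatchError thrown for the Int label
val passed = Try(labelOf(doubleLabelRow))  // succeeds because the label is a Double
```

Under that assumption, casting the column to double before fitting is what lets every row match the pattern, which is why the fix belongs in the data loading step and not in the parameter grid.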