本文介绍了字段“特征"不存在.火花ML的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试使用 Zeppelin 在 Spark ML 中构建模型.我是这个领域的新手,需要一些帮助.我想我需要为列设置正确的数据类型并将第一列设置为标签.任何帮助将不胜感激,谢谢

I am trying to build a model in Spark ML with Zeppelin.I am new to this area and would like some help. I think i need to set the correct datatypes to the column and set the first column as the label. Any help would be greatly appreciated, thank you

val training = sc.textFile("hdfs:///ford/fordTrain.csv")
val header = training.first
val inferSchema = true
val df = training.toDF

val lr = new LogisticRegression()
.setMaxIter(10)
.setRegParam(0.3)
.setElasticNetParam(0.8)

 val lrModel = lr.fit(df)

// Print the coefficients and intercept for multinomial logistic regression
println(s"Coefficients: \n${lrModel.coefficientMatrix}")
println(s"Intercepts: ${lrModel.interceptVector}")

我正在使用的 csv 文件片段是:

A snippet of the csv file i am using is:

IsAlert,P1,P2,P3,P4,P5,P6,P7,P8,E1,E2
0,34.7406,9.84593,1400,42.8571,0.290601,572,104.895,0,0,0,

推荐答案

正如您所提到的,您缺少 features 列.它是一个包含所有预测变量的向量.您必须使用 VectorAssembler 创建它.

As you have mentioned, you are missing the features column. It is a vector containing all predictor variables. You have to create it using VectorAssembler.

IsAlert 是标签,所有其他变量 (p1,p2,...) 都是预测变量,您可以创建 features 列(实际上您可以将其命名为任何名称)你想要而不是 features) 通过:

IsAlert is the label and all others variables (p1,p2,...) are predictor variables, you can create features column (actually you can name it anything you want instead of features) by:

import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.linalg.Vectors

//creating features column
val assembler = new VectorAssembler()
  .setInputCols(Array("P1","P2","P3","P4","P5","P6","P7","P8","E1","E2"))
  .setOutputCol("features")


val lr = new LogisticRegression()
  .setMaxIter(10)
  .setRegParam(0.3)
  .setElasticNetParam(0.8)
  .setFeaturesCol("features")   // setting features column
  .setLabelCol("IsAlert")       // setting label column

//creating pipeline
val pipeline = new Pipeline().setStages(Array(assembler,lr))

//fitting the model
val lrModel = pipeline.fit(df)

参考:https://spark.apache.org/docs/latest/ml-features.html#vectorassembler.

这篇关于字段“特征"不存在.火花ML的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

08-20 13:15