Hi, I'm new to Spark MLlib. I already have a model in R, and I'm trying to build the same model with Spark MLlib. Here is the R model code.
R code:
delhi <- read.delim("UItrain.txt", na.strings = "")
delhi$lnprice <- log(delhi$price)
heddel <- lm(lnprice ~ bedrooms + bathrooms + area, data = delhi)
deltest <- read.delim("UItest.txt", na.strings = "")
predict(heddel, deltest)
I am trying to reproduce the same R code in Spark MLlib using Java.
SparkConf conf = new SparkConf().setAppName("Linear Regression Example");
JavaSparkContext sc = new JavaSparkContext(conf);
String path = "UItrain.txt";
JavaRDD<String> data = sc.textFile(path);
JavaRDD<LabeledPoint> parsedData = data.map(
    new Function<String, LabeledPoint>() {
        public LabeledPoint call(String line) {
            String[] parts = line.split("\t");
            String[] features = parts[1].split("\t");
            double[] v = new double[features.length];
            for (int i = 0; i < features.length - 1; i++)
                v[i] = Double.parseDouble(features[i]);
            return new LabeledPoint(Double.parseDouble(parts[0]), Vectors.dense(v));
        }
    }
);
parsedData.cache();
// Building the model
String input = "UItrain.txt";
int data2 = "UItest.txt";
int numIterations = 100;
final LinearRegressionModel model =
    LinearRegressionWithSGD.train(JavaRDD.toRDD(parsedData), data2);
// Evaluate model on training examples and compute training error
JavaRDD<Tuple2<Double, Double>> valuesAndPreds = parsedData.map(
    new Function<LabeledPoint, Tuple2<Double, Double>>() {
        public Tuple2<Double, Double> call(LabeledPoint point) {
            double prediction = model.predict(point.features());
            return new Tuple2<Double, Double>(prediction, point.label());
        }
    }
);
double MSE = new JavaDoubleRDD(valuesAndPreds.map(
    new Function<Tuple2<Double, Double>, Object>() {
        public Object call(Tuple2<Double, Double> pair) {
            return Math.pow(pair._1() - pair._2(), 2.0);
        }
    }
).rdd()).mean();
System.out.println("training Mean Squared Error = " + MSE);
I get an error when building the model. Any help would be greatly appreciated.
Best answer
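I think your error is in data2, here:
final LinearRegressionModel model = LinearRegressionWithSGD.train(JavaRDD.toRDD(parsedData), data2);
The regression expects the number of iterations as its second argument, but it receives text instead:
int data2 = "UItest.txt";
That line assigns a String to an int, so it cannot compile. Pass the numIterations variable you already declared as the second argument instead of data2.
If that is not the error, please edit the question and post the error message.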
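Separately from data2, the parsing step in the question may also need attention. The line is split on tabs and then parts[1] is split on tabs again, which can only yield one element, and the loop bound features.length - 1 skips the last element, so the feature vector would end up holding a single default 0.0. The R code also takes log(price) as the response, which the Java code does not do. Below is a minimal plain-Java sketch of one way to parse a row. The names RowParser, Row and parse are hypothetical, and the column order (price, bedrooms, bathrooms, area) is an assumption, since the layout of UItrain.txt is not shown in the question.

```java
import java.util.Arrays;

/** Hypothetical helper: turns one tab-separated row into a label and a feature array. */
class RowParser {

    /** Holds the label (log of price) and the predictors from the R formula. */
    static final class Row {
        final double label;
        final double[] features;

        Row(double label, double[] features) {
            this.label = label;
            this.features = features;
        }
    }

    /**
     * ASSUMPTION: columns are price, bedrooms, bathrooms, area, in that order.
     * The real layout of UItrain.txt is not shown in the question.
     */
    static Row parse(String line) {
        // Split the line once; every field is then available by index.
        String[] parts = line.split("\t");

        // Mirrors delhi$lnprice <- log(delhi$price) from the R code.
        double label = Math.log(Double.parseDouble(parts[0]));

        // Copy every remaining column, including the last one.
        double[] features = new double[parts.length - 1];
        for (int i = 1; i < parts.length; i++) {
            features[i - 1] = Double.parseDouble(parts[i]);
        }
        return new Row(label, features);
    }

    public static void main(String[] args) {
        // Made-up sample row, for illustration only.
        Row row = parse("100000\t3\t2\t1500");
        System.out.println(row.label + " " + Arrays.toString(row.features));
    }
}
```

Inside the Spark map function the result could be wrapped as new LabeledPoint(row.label, Vectors.dense(row.features)). Because read.delim treats the first line as a header by default, the file presumably starts with a header row that would have to be filtered out before parsing. A model trained on the log of the price also predicts on the log scale, so Math.exp would be needed to get back to a price.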
Regarding "r - how to predict values in mllib", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/32865388/