saveAsNewAPIHadoopFile() fails with an output format error

Problem description

I am running a modified version of the teragen program in Spark, written in Scala. I am trying to save the output file using the function saveAsNewAPIHadoopFile(). The relevant code is given below:

dataset.map(row => (NullWritable.get(), new BytesWritable(row))).saveAsNewAPIHadoopFile(output)

The code compiles successfully. However, when I run it, I get the following error:

Exception in thread "main" java.lang.RuntimeException: class scala.runtime.Nothing$ not org.apache.hadoop.mapreduce.OutputFormat
    at org.apache.hadoop.conf.Configuration.setClass(Configuration.java:1794)
    at org.apache.hadoop.mapreduce.Job.setOutputFormatClass(Job.java:823)
    at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopFile(PairRDDFunctions.scala:830)
    at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopFile(PairRDDFunctions.scala:811)
    at GenSort$.main(GenSort.scala:52)
    at GenSort.main(GenSort.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:328)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Is there a way to make it work with saveAsNewAPIHadoopFile()? I would be glad for any help.

Recommended answer

saveAsNewAPIHadoopFile expects the key, value, and output format classes. The call in the question passes only a path, so the output format is never specified.
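The `scala.runtime.Nothing$` in the stack trace is consistent with how Scala 2 handles a type parameter that is never specified: the path-only overload takes the output format as a type parameter with an implicit ClassTag, and when nothing constrains that parameter the compiler infers `Nothing`. The sketch below reproduces that inference in plain Scala, without Spark. `formatClass` and `NothingInference` are hypothetical names used only for illustration, and the behavior is assumed to be that of Scala 2 (Scala 3 treats `ClassTag[Nothing]` differently).

```scala
import scala.reflect.ClassTag

object NothingInference {
  // Hypothetical stand-in for a method that, like the path-only
  // saveAsNewAPIHadoopFile overload, learns a class from a type parameter.
  def formatClass[F](implicit tag: ClassTag[F]): Class[_] = tag.runtimeClass

  def main(args: Array[String]): Unit = {
    // No type argument: nothing constrains F, so Scala 2 infers Nothing.
    val inferred = formatClass
    // Explicit type argument: the runtime class is the one we asked for.
    val explicit = formatClass[String]

    println(s"inferred: $inferred")
    println(s"explicit: $explicit")
  }
}
```

Hadoop's `Job.setOutputFormatClass` then rejects the inferred class because it is not a subclass of `org.apache.hadoop.mapreduce.OutputFormat`, which is the RuntimeException shown above.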

The method signature is:

saveAsNewAPIHadoopFile(path: String,
 keyClass: Class[_],
 valueClass: Class[_],
 outputFormatClass: Class[_ <: org.apache.hadoop.mapreduce.OutputFormat[_, _]],
 conf: Configuration = self.context.hadoopConfiguration)

The call should be:

dataset.map(row => (NullWritable.get(), new BytesWritable(row))).saveAsNewAPIHadoopFile("hdfs://.....",classOf[NullWritable],classOf[BytesWritable],classOf[org.apache.hadoop.mapreduce.lib.output.TextOutputFormat[NullWritable, BytesWritable]])

Or:

// NullWritable has no public constructor, so take the class from its singleton.
dataset.map(row => (NullWritable.get(), new BytesWritable(row))).
saveAsNewAPIHadoopFile("hdfs://.....",
NullWritable.get().getClass, new BytesWritable().getClass,
new org.apache.hadoop.mapreduce.lib.output.TextOutputFormat[NullWritable, BytesWritable]().getClass)
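A third option, not part of the original answer, is to keep the path-only call from the question and supply the output format as an explicit type argument, so that `Nothing` is no longer inferred. The following is a minimal sketch under some assumptions: the object name `SaveWithTypeArgument`, the sample byte arrays, and the use of `args(0)` as the output path are all hypothetical, and the job is assumed to be launched through spark-submit, which provides the master URL.

```scala
import org.apache.hadoop.io.{BytesWritable, NullWritable}
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
import org.apache.spark.{SparkConf, SparkContext}
// On Spark releases before 1.3 the pair-RDD implicits also need:
// import org.apache.spark.SparkContext._

object SaveWithTypeArgument {
  def main(args: Array[String]): Unit = {
    // Hypothetical: the output directory is passed as the first argument.
    val output = args(0)
    val sc = new SparkContext(new SparkConf().setAppName("SaveWithTypeArgument"))

    // Stand-in for the question's dataset: an RDD of raw byte arrays.
    val dataset = sc.parallelize(Seq(Array[Byte](1, 2, 3), Array[Byte](4, 5, 6)))

    dataset
      .map(row => (NullWritable.get(), new BytesWritable(row)))
      // The explicit type argument names the output format class.
      .saveAsNewAPIHadoopFile[TextOutputFormat[NullWritable, BytesWritable]](output)

    sc.stop()
  }
}
```

TextOutputFormat is used here only to match the answer above. It writes each value through its toString method, so a job that needs the raw bytes on disk would likely want a different output format.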