I am trying to read data from HBase using Spark. The versions I am using are
Spark 1.3.1 and HBase 1.1.1. I am getting the following error:

ERROR TableInputFormat: java.lang.NullPointerException
    at org.apache.hadoop.hbase.TableName.valueOf(TableName.java:417)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:159)
    at org.apache.hadoop.hbase.mapreduce.TableInputFormat.setConf(TableInputFormat.java:101)
    at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:91)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
    at org.apache.spark.ShuffleDependency.<init>(Dependency.scala:82)
    at org.apache.spark.rdd.ShuffledRDD.getDependencies(ShuffledRDD.scala:80)
    at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:206)
    at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:204)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.dependencies(RDD.scala:204)
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal(DAGScheduler.scal

The code is as follows:
 public static void main( String[] args )
{
    String TABLE_NAME = "Hello";
    HTable table=null;
    SparkConf sparkConf = new SparkConf();
    sparkConf.setAppName("Data Reader").setMaster("local[1]");
    sparkConf.set("spark.executor.extraClassPath", "$(hbase classpath)");

    JavaSparkContext sparkContext = new JavaSparkContext(sparkConf);

    Configuration hbConf = HBaseConfiguration.create();
    hbConf.set("zookeeper.znode.parent", "/hbase-unsecure");
    try {
         table = new HTable(hbConf, Bytes.toBytes(TABLE_NAME));

    } catch (IOException e) {

        e.printStackTrace();
    }

    JavaPairRDD<ImmutableBytesWritable, Result> hBaseRDD = sparkContext
            .newAPIHadoopRDD(
                    hbConf,
                    TableInputFormat.class,
                    org.apache.hadoop.hbase.io.ImmutableBytesWritable.class,
                    org.apache.hadoop.hbase.client.Result.class);
    hBaseRDD.coalesce(1, true);
    System.out.println("Count "+hBaseRDD.count());
    //.saveAsTextFile("hBaseRDD");
    try {
        table.close();
        sparkContext.close();
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}

I have not been able to resolve the issue. I am using the Hortonworks Sandbox for this.

Best Answer

You wrote:

try {
     table = new HTable(hbConf, Bytes.toBytes(TABLE_NAME));

} catch (IOException e) {

     e.printStackTrace();
}

If you are using the 1.1.1 API:

In the devapidocs, I can only see two constructors:

[screenshots of the two HTable constructor signatures from the devapidocs]

The params argument of the first constructor is built with BufferedMutatorParams(TableName tableName), and TableName has no public constructor.

Therefore, you have to initialize the HTable like this:

table = new HTable(hbConf, new BufferedMutatorParams(TableName.valueOf(TABLE_NAME)));
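For reference, here is a minimal sketch of the corrected initialization, assuming the same hbConf and TABLE_NAME as in the question and the HTable(Configuration, BufferedMutatorParams) constructor this answer refers to:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.BufferedMutatorParams;
import org.apache.hadoop.hbase.client.HTable;

public class HTableInitExample {
    public static void main(String[] args) throws IOException {
        String TABLE_NAME = "Hello";

        Configuration hbConf = HBaseConfiguration.create();
        hbConf.set("zookeeper.znode.parent", "/hbase-unsecure");

        // TableName has no public constructor; TableName.valueOf is the
        // supported way to build one from a String.
        TableName tableName = TableName.valueOf(TABLE_NAME);

        // Constructor suggested in this answer (1.1.1 devapidocs).
        HTable table = new HTable(hbConf, new BufferedMutatorParams(tableName));
        try {
            // ... use the table ...
        } finally {
            table.close();
        }
    }
}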

If you are using the 0.94 API:

The constructors of HTable are:

[screenshot of the HTable constructor signatures from the 0.94 API docs]

So, looking at the last one, you only need to pass the name as a String, not as a byte[]:

table = new HTable(hbConf, TABLE_NAME);
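A minimal sketch under the 0.94 API, again assuming the hbConf and TABLE_NAME from the question:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;

public class HTable094Example {
    public static void main(String[] args) throws IOException {
        // 0.94 style: the table name is passed directly as a String.
        Configuration hbConf = HBaseConfiguration.create();
        HTable table = new HTable(hbConf, "Hello");
        // ... use the table ...
        table.close();
    }
}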

That's it.

Regarding "hadoop - ERROR TableInputFormat: java.lang.NullPointerException at org.apache.hadoop.hbase.TableName.valueOf", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/33514083/
