Problem description
I want to access HBase via Spark using Java. I have not found any examples for this besides this one. In the answer is written,
I copied this code from "How to read from hbase using spark":
import org.apache.hadoop.hbase.client.{HBaseAdmin, Result}
import org.apache.hadoop.hbase.{HBaseConfiguration, HTableDescriptor}
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.spark._

object HBaseRead {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName("HBaseRead").setMaster("local[2]")
    val sc = new SparkContext(sparkConf)
    val conf = HBaseConfiguration.create()
    val tableName = "table1"

    // Run as the hdfs user
    System.setProperty("user.name", "hdfs")
    System.setProperty("HADOOP_USER_NAME", "hdfs")

    // Connection settings for a local HBase instance
    conf.set("hbase.master", "localhost:60000")
    conf.setInt("timeout", 120000)
    conf.set("hbase.zookeeper.quorum", "localhost")
    conf.set("zookeeper.znode.parent", "/hbase-unsecure")
    conf.set(TableInputFormat.INPUT_TABLE, tableName)

    // Create the table if it is not available yet
    val admin = new HBaseAdmin(conf)
    if (!admin.isTableAvailable(tableName)) {
      val tableDesc = new HTableDescriptor(tableName)
      admin.createTable(tableDesc)
    }

    // Read the table as an RDD of (row key, Result) pairs
    val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat], classOf[ImmutableBytesWritable], classOf[Result])
    println("Number of Records found : " + hBaseRDD.count())
    sc.stop()
  }
}
Can anyone give me some hints on how to find the correct dependencies, objects and so on? It seems like HBaseConfiguration is in hbase-client, but I am actually stuck on TableInputFormat.INPUT_TABLE. Shouldn't this be in the same dependency?
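One general way to approach the "which dependency provides this class?" question is to ask the JVM where it loaded the class from. The sketch below is only an illustration and is not part of the original post: `ClassLocator` and `locate` are hypothetical names, and it can only report a location for classes that are already on the classpath.

```java
import java.security.CodeSource;

// Hypothetical helper (not from the original post): reports which jar or
// directory a class was loaded from, to help map a class to its dependency.
final class ClassLocator {

    private ClassLocator() {}

    // Returns "<class> -> <location>", or a short message when the class is
    // missing or the JVM has no recorded code source for it.
    static String locate(String className) {
        try {
            // initialize=false: load the class without running its static initializers
            Class<?> cls = Class.forName(className, false, ClassLocator.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                return className + " -> (no code source, e.g. a JDK class)";
            }
            return className + " -> " + source.getLocation();
        } catch (ClassNotFoundException | LinkageError e) {
            return className + " -> not on the classpath";
        }
    }

    public static void main(String[] args) {
        // Class names taken from the imports in the question.
        System.out.println(locate("org.apache.hadoop.hbase.HBaseConfiguration"));
        System.out.println(locate("org.apache.hadoop.hbase.mapreduce.TableInputFormat"));
    }
}
```

Run with the project's classpath, this prints the jar path for each class that resolves; a "not on the classpath" line means the artifact providing that class is still missing. In HBase 1.x releases, `TableInputFormat` ships in the `hbase-server` artifact rather than `hbase-client` (it moved to `hbase-mapreduce` in 2.x), so that is a likely candidate to add.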
Is there a better way to access HBase with Spark?
Yes, there is. Use SparkOnHBase from Cloudera.
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-spark</artifactId>
    <version>1.2.0-cdh5.7.0</version>
</dependency>
Then use an HBase scan to read data from your HBase table (or a Bulk Get if you know the keys of the rows you want to retrieve).
// Load the cluster configuration from the standard HBase config files
Configuration conf = HBaseConfiguration.create();
conf.addResource(new Path("/etc/hbase/conf/core-site.xml"));
conf.addResource(new Path("/etc/hbase/conf/hbase-site.xml"));

// jsc is an existing JavaSparkContext
JavaHBaseContext hbaseContext = new JavaHBaseContext(jsc, conf);

Scan scan = new Scan();
scan.setCaching(100); // rows fetched per scanner round trip

JavaRDD<Tuple2<byte[], List<Tuple3<byte[], byte[], byte[]>>>> hbaseRdd = hbaseContext.hbaseRDD(tableName, scan);
System.out.println("Number of Records found : " + hbaseRdd.count());
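To make the element type of `hbaseRdd` more concrete, here is a small sketch of turning one element into readable text. It assumes each element pairs a row key with a list of (column family, qualifier, value) triples, which the answer above does not spell out. `CellTriple`, `ScannedRow`, and `RowFormatter` are hypothetical stand-ins for `Tuple3` and `Tuple2` so the example runs with only the JDK, and the UTF-8 decoding assumes the table stores strings.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for Tuple3<byte[], byte[], byte[]>,
// assumed to hold (column family, qualifier, value).
final class CellTriple {
    final byte[] family;
    final byte[] qualifier;
    final byte[] value;

    CellTriple(byte[] family, byte[] qualifier, byte[] value) {
        this.family = family;
        this.qualifier = qualifier;
        this.value = value;
    }
}

// Hypothetical stand-in for Tuple2<byte[], List<Tuple3<...>>>:
// a row key plus the cells scanned for that row.
final class ScannedRow {
    final byte[] key;
    final List<CellTriple> cells;

    ScannedRow(byte[] key, List<CellTriple> cells) {
        this.key = key;
        this.cells = cells;
    }
}

final class RowFormatter {

    private RowFormatter() {}

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Assumption: keys, column names, and values are UTF-8 encoded strings.
    static String utf8(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }

    // Renders a row as "key -> family:qualifier=value, ...".
    static String format(ScannedRow row) {
        String cells = row.cells.stream()
                .map(c -> utf8(c.family) + ":" + utf8(c.qualifier) + "=" + utf8(c.value))
                .collect(Collectors.joining(", "));
        return utf8(row.key) + " -> " + cells;
    }

    public static void main(String[] args) {
        ScannedRow row = new ScannedRow(bytes("row1"), Arrays.asList(
                new CellTriple(bytes("cf"), bytes("name"), bytes("Alice")),
                new CellTriple(bytes("cf"), bytes("age"), bytes("30"))));
        System.out.println(format(row)); // prints "row1 -> cf:name=Alice, cf:age=30"
    }
}
```

With the real RDD, the same formatting logic would go inside a `map` over `hbaseRdd`, reading the tuple fields instead of the stand-in classes. Values that are not strings (for example, longs written as 8-byte arrays) would need a matching decoder instead of `utf8`.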