Problem Description
I am trying to read an HBase table using the Spark Scala API.
Sample code:
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat

val conf = HBaseConfiguration.create()
conf.set("hbase.master", "localhost:60000")
conf.set("hbase.zookeeper.quorum", "localhost")
conf.set(TableInputFormat.INPUT_TABLE, tableName)

// Full scan of the table, one Result per row
val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat], classOf[ImmutableBytesWritable], classOf[Result])
println("Number of Records found : " + hBaseRDD.count())
How do I add a where clause if I use newAPIHadoopRDD?
Or do we need to use a Spark HBase connector to achieve this?
I saw the Spark HBase connector below, but I don't see any example code with a where clause.
https://github.com/nerdammer/spark-hbase-connector
Recommended Answer
You can use the SHC connector from Hortonworks to achieve this.
https://github.com/hortonworks-spark/shc
Here is a code example for Spark 2:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

// Catalog mapping the HBase table "default:my_table" onto a DataFrame schema
val catalog =
  s"""{
     |"table":{"namespace":"default", "name":"my_table"},
     |"rowkey":"id",
     |"columns":{
     |"id":{"cf":"rowkey", "col":"id", "type":"string"},
     |"name":{"cf":"info", "col":"name", "type":"string"},
     |"age":{"cf":"info", "col":"age", "type":"string"}
     |}
     |}""".stripMargin

val spark = SparkSession
  .builder()
  .appName("hbase spark")
  .getOrCreate()

val df = spark
  .read
  .options(Map(HBaseTableCatalog.tableCatalog -> catalog))
  .format("org.apache.spark.sql.execution.datasources.hbase")
  .load()

df.show()
You can then use any DataFrame method on the result. For example:

df.where(df("age") === 20)
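
Since the catalog declares age as a string, the same predicate can also be compared against a string literal, or expressed in SQL through a temp view. A minimal sketch (the view name my_table_view is an arbitrary choice, not from the original answer):

// Same predicate, expressed through Spark SQL instead of the DataFrame DSL
df.createOrReplaceTempView("my_table_view")
spark.sql("SELECT id, name FROM my_table_view WHERE age = '20'").show()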
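If you would rather stay with newAPIHadoopRDD instead of pulling in a connector, the equivalent of a where clause is an HBase filter attached to the Scan that TableInputFormat executes. Below is a minimal sketch, assuming an HBase 1.x client API and the same sc and tableName as in the question; the column family and qualifier mirror the catalog above:

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{Result, Scan}
import org.apache.hadoop.hbase.filter.{CompareFilter, SingleColumnValueFilter}
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.{TableInputFormat, TableMapReduceUtil}
import org.apache.hadoop.hbase.util.Bytes

// Server-side filter: keep only rows where info:age == "20"
val filter = new SingleColumnValueFilter(
  Bytes.toBytes("info"), Bytes.toBytes("age"),
  CompareFilter.CompareOp.EQUAL, Bytes.toBytes("20"))
filter.setFilterIfMissing(true)  // also drop rows that lack the column

val scan = new Scan()
scan.setFilter(filter)

val conf = HBaseConfiguration.create()
conf.set(TableInputFormat.INPUT_TABLE, tableName)
// TableInputFormat reads the Scan from the configuration in serialized form
conf.set(TableInputFormat.SCAN, TableMapReduceUtil.convertScanToString(scan))

val filteredRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
  classOf[ImmutableBytesWritable], classOf[Result])

Unlike the SHC route, the filtering here happens inside the HBase region servers before rows ever reach Spark, but you get back raw Result objects rather than a DataFrame.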