This article covers the question "How to filter a Spark HBase RDD and get results?" and walks through a solution that readers facing the same problem may find useful.

Problem description



I am getting an RDD using Spark and HBase. Now I want to filter that RDD and get a specific value from it. How can I proceed?

Here is what I have done so far:

val sc = new SparkContext(sparkConf)
val conf = HBaseConfiguration.create()
conf.set(TableInputFormat.INPUT_TABLE, "tbl_date")
val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
  classOf[ImmutableBytesWritable],
  classOf[Result])
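For reference, the snippet above assumes imports along these lines (a sketch based on the classic Spark/HBase integration APIs; verify the package paths against your HBase version):

```scala
import org.apache.spark.SparkContext
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.client.Result
```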

Now I want to use that RDD (hBaseRDD) and get specific column data by passing a specific parameter to it. How can I achieve this?

Solution

What you already have:

val sc = new SparkContext(sparkConf)
val conf = HBaseConfiguration.create()
conf.set(TableInputFormat.INPUT_TABLE, "tbl_date")
val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
  classOf[ImmutableBytesWritable],
  classOf[Result])

Add the following:

// needs: import org.apache.hadoop.hbase.CellUtil and org.apache.hadoop.hbase.util.Bytes
val localData = hBaseRDD.collect()  // This is an Array[(ImmutableBytesWritable, Result)]
val filteredData = localData.map { case (_, result) =>
  result.getColumnCells(Bytes.toBytes("MyColFamily"), Bytes.toBytes("MyColName")).get(0) // assuming you want the first cell:
                                                                                         // otherwise you could also take all of them
}.filter { cell => new String(CellUtil.cloneValue(cell)).startsWith("SomePrefix") }

The above uses placeholder/dummy choices that you will need to adapt:

  • get(0) — decide whether you want just the first cell or all cells
  • new String(...) — convert the cell value to the proper data type
  • .startsWith(..) — decide what condition to apply to the data

But in any case, the above gives you the flow and an outline of how to process the HBase cell data.
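Note that collect() brings the whole table to the driver; on a large table you would normally apply the same map/filter to the RDD itself and collect only the matches. The decode-then-prefix-match step is ordinary Scala collection logic; here is a minimal self-contained sketch with plain tuples standing in for HBase cells (all names hypothetical):

```scala
// Stand-in for the HBase flow above: each pair is (rowKey, cellValueBytes).
object PrefixFilterSketch {
  // Decode each cell value to a String and keep only those with the given prefix.
  def filterByPrefix(cells: Seq[(String, Array[Byte])], prefix: String): Seq[String] =
    cells
      .map { case (_, valueBytes) => new String(valueBytes, "UTF-8") } // convert bytes -> String
      .filter(_.startsWith(prefix))                                    // same idea as the .filter above

  def main(args: Array[String]): Unit = {
    val cells = Seq(
      ("row1", "SomePrefix-aaa".getBytes("UTF-8")),
      ("row2", "Other-bbb".getBytes("UTF-8"))
    )
    println(filterByPrefix(cells, "SomePrefix")) // prints List(SomePrefix-aaa)
  }
}
```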
