执行hbase扫描时出现异常

执行hbase扫描时出现异常

本文介绍了执行hbase扫描时出现异常的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试。



我的简单代码如下所示:

  public class DistributedHBaseScanToRddDemo { 

public static void main(String [] args){
JavaSparkContext jsc = getJavaSparkContext(hbasetable1);
配置hbaseConf = getHbaseConf(0,,);
JavaHBaseContext javaHbaseContext = new JavaHBaseContext(jsc,hbaseConf);

扫描扫描=新扫描();
scan.setCaching(100);

JavaRDD< Tuple2< ImmutableBytesWritable,Result>> javaRdd =
javaHbaseContext.hbaseRDD(TableName.valueOf(hbasetable1),scan);

列表< String> results = javaRdd.map(new ScanConvertFunction())。collect();
System.out.println(Result Size:+ results.size());

$ b $ public static Configuration getHbaseConf(int pRimeout,String pQuorumIP,String pClientPort)
{
配置hbaseConf = HBaseConfiguration.create();
hbaseConf.setInt(timeout,120000);
hbaseConf.set(hbase.zookeeper.quorum,10.56.36.14);
hbaseConf.set(hbase.zookeeper.property.clientPort,2181);
返回hbaseConf;

$ b $ public static JavaSparkContext getJavaSparkContext(String pTableName)
{
SparkConf sparkConf = new SparkConf()。setAppName(JavaHBaseBulkPut+ pTableName);
sparkConf.setMaster(local);
sparkConf.set(spark.testing.memory,471859200);
JavaSparkContext jsc = new JavaSparkContext(sparkConf);

return jsc;
}

private static class ScanConvertFunction implements Function< Tuple2< ImmutableBytesWritable,Result>,String> {
public String call(Tuple2< ImmutableBytesWritable,Result> v1)throws Exception {
return Bytes.toString(v1._1()。copyBytes());
}
}
}

我收到以下异常:

 线程main中的异常org.apache.hadoop.hbase.DoNotRetryIOException:/10.56.48.219:16020无法读取从客户端10.56.49.148调用参数; java.lang.UnsupportedOperationException:在sun.reflect.NativeConstructorAccessorImpl.newInstance0(本地方法)中获取
在sun.reflect.NativeConstructorAccessorImpl.newInstance处的
(NativeConstructorAccessorImpl.java:62)$ b $在sun.reflect .DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at org.apache.hadoop.hbase.ipc.RemoteWithExtrasException.instantiateException (RemoteWithExtrasException.java:93)
at org.apache.hadoop.hbase.ipc.RemoteWithExtrasException.unwrapRemoteException(RemoteWithExtrasException.java:83)
at org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil .makeIOExceptionOfException(ProtobufUtil.java:368)
at org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:345)
at org.apache.hadoop.hbase.shaded .protobuf.ProtobufUtil.getRegionLoad(ProtobufUtil.java:1746)
at org.apache.hadoop.hbase.client.HBaseAdmin.getRegionLoad(HBaseAdmin.java:2089)
at org.apache.hadoop.hbase.mapreduce.RegionSizeCalculator.init(RegionSizeCalculator.java:82)
at org.apache.hadoop.hbase.mapreduce.RegionSizeCalculator。< init>(RegionSizeCalculator.java:60)
at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.oneInputSplitPerRegion(TableInputFormatBase.java :293)
at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:257)
at org.apache.hadoop.hbase.mapreduce.TableInputFormat.getSplits(TableInputFormat.java :254)
at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:121)
at org.apache.spark.rdd.RDD $$ anonfun $ partitions $ 2.apply(RDD .scala:248)
at org.apache.spark.rdd.RDD $$ anonfun $ partitions $ 2.apply(RDD.scala:246)
at scala.Option.getOrElse(Option.scala:121 )$ or
org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD $$ anonfun $ partitions $ 2.apply(RDD.scala:248)
at org.apache.spark.rdd.RDD $$ anonfun $ partitions $ 2.apply(RDD.scala:246)
at scala.Option.getOrElse(Option.scala:121)
at org .apache.spark.rdd.RDD.partitions(RDD.scala:246)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark .rdd.RDD $$ anonfun $ partitions $ 2.apply(RDD.scala:248)
at org.apache.spark.rdd.RDD $$ anonfun $ partitions $ 2.apply(RDD.scala:246)
在scala.Option.getOrElse(Option.scala:121)
在org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
在org.apache.spark.SparkContext .runJob(SparkContext.scala:1911)
在org.apache.spark.rdd.RDD $$ anonfun $ collect $ 1.apply(RDD.scala:893)
at org.apache.spark.rdd .RDDOperationScope $ .withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd .RDDOperationScope $ .withScope(RDDOperationScope.scala:112)
在org.apache.spark.rdd.RDD.withScope(RDD.scala:358)
在org.apache.spark.rdd.RDD。在org.apache.spark.api.java上收集(RDD.scala:892)
在org.apache.spark.api.java.JavaRDDLike $ class.collect(JavaRDDLike.scala:360)
。 AbstractJavaRDDLike.collect(JavaRDDLike.scala:45)
at com.myproj.poc.sparkhbaseneo4j.DistributedHBaseScanToRddDemo.main(DistributedHBaseScanToRddDemo.java:32)
引起:org.apache.hadoop.hbase.ipc。 RemoteWithExtrasException(org.apache.hadoop.hbase.DoNotRetryIOException):/10.56.48.219:16020无法读取来自客户端10.56.49.148的调用参数; java.lang.UnsupportedOperationException:GetRegionLoad
at org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:387)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access $ 100(AbstractRpcClient.java:95)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient $ 3.run(AbstractRpcClient.java:410)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient $ 3.run(AbstractRpcClient.java:406)
在org.apache.hadoop.hbase.ipc.Call.callComplete(Call.java:103)
在org.apache.hadoop.hbase.ipc。 Call.setException(Call.java:118)
at org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.readResponse(NettyRpcDuplexHandler.java:161)
at org.apache.hadoop.hbase.ipc。 NettyRpcDuplexHandler.channelRead(NettyRpcDuplexHandler.java:191)
在org.apache.hadoop.hbase.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
在org.apache。 hadoop.hbase.shaded.io.netty.chann el.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at org.apache.hadoop.hbase.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at org。 apache.hadoop.hbase.shaded.io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310)
at org.apache.hadoop.hbase.shaded.io.netty.handler.codec.ByteToMessageDecoder .channelRead(ByteToMessageDecoder.java:284)
at org.apache.hadoop.hbase.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at org.apache.hadoop .hbase.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at org.apache.hadoop.hbase.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java: 340)
在org.apache.hadoop.hbase.shaded.io.netty.handl er.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)
at org.apache.hadoop.hbase.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at org.apache.hadoop.hbase.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at org.apache.hadoop.hbase.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead (AbstractChannelHandlerContext.java:340)
at org.apache.hadoop.hbase.shaded.io.netty.channel.DefaultChannelPipeline $ HeadContext.channelRead(DefaultChannelPipeline.java:1334)
at org.apache.hadoop .hbase.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at org.apache.hadoop.hbase.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java: 348)
,位于org.apache.hadoop.hbase.shaded.io.netty.ch annel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926)
at org.apache.hadoop.hbase.shaded.io.netty.channel.nio.AbstractNioByteChannel $ NioByteUnsafe.read(AbstractNioByteChannel.java:134)
在org.apache.hadoop.hbase.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644)
在org.apache.hadoop.hbase.shaded.io.netty.channel .nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:579)
at org.apache.hadoop.hbase.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:496)
在org.apache.hadoop.hbase.shaded.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:458)
位于org.apache.hadoop.hbase.shaded.io.netty.util。 concurrent.SingleThreadEventExecutor $ 5.run(SingleThreadEventExecutor.java:858)
at org.apache.hadoop.hbase.shaded.io.netty.util.concurrent.DefaultThreadFactory $ DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)$ b $ java.util.Thre ad.run(Thread.java:745)

我也试过bulk 和例子,它们工作正常。因此,我猜测大容量扫描示例出了什么问题。

解决方案

Cloudera hbase-spark连接器似乎工作正常:





所以,在pom.xml中添加这样的内容:

 < repositories> 
< repository>
< id> cloudera< / id>
< name> cloudera< / name>
< url> https://repository.cloudera.com/content/repositories/releases/< / url>
< / repository>
< / repositories>

以及依赖项:

 < dependency> 
< groupId> org.apache.hbase< / groupId>
< artifactId> hbase-spark< / artifactId>
< version> $ {hbase-spark.version}< / version>
< /依赖关系>

我注意到的一件事是,这个功能似乎并没有重用HBase连接,并尝试为每个分区重新建立它。

为此,我实际上避免了这种情况功能,但很想知道你的经验。


I was trying out hbase spark distributed scan example.

My simple code looks like this:

public class DistributedHBaseScanToRddDemo {

    public static void main(String[] args) {
        JavaSparkContext jsc = getJavaSparkContext("hbasetable1");
        Configuration hbaseConf = getHbaseConf(0, "", "");
        JavaHBaseContext javaHbaseContext = new JavaHBaseContext(jsc, hbaseConf);

        Scan scan = new Scan();
        scan.setCaching(100);

        JavaRDD<Tuple2<ImmutableBytesWritable, Result>> javaRdd =
                  javaHbaseContext.hbaseRDD(TableName.valueOf("hbasetable1"), scan);

        List<String> results = javaRdd.map(new ScanConvertFunction()).collect();
        System.out.println("Result Size: " + results.size());
    }

    public static Configuration getHbaseConf(int pRimeout, String pQuorumIP, String pClientPort)
    {
        Configuration hbaseConf = HBaseConfiguration.create();
        hbaseConf.setInt("timeout", 120000);
        hbaseConf.set("hbase.zookeeper.quorum", "10.56.36.14");
        hbaseConf.set("hbase.zookeeper.property.clientPort", "2181");
        return hbaseConf;
    }

    public static JavaSparkContext getJavaSparkContext(String pTableName)
    {
        SparkConf sparkConf = new SparkConf().setAppName("JavaHBaseBulkPut" + pTableName);
        sparkConf.setMaster("local");
        sparkConf.set("spark.testing.memory", "471859200");
        JavaSparkContext jsc = new JavaSparkContext(sparkConf);

        return jsc;
    }

    private static class ScanConvertFunction implements Function<Tuple2<ImmutableBytesWritable, Result>, String> {
        public String call(Tuple2<ImmutableBytesWritable, Result> v1) throws Exception {
            return Bytes.toString(v1._1().copyBytes());
        }
    }
}

I am getting following exception:

Exception in thread "main" org.apache.hadoop.hbase.DoNotRetryIOException: /10.56.48.219:16020 is unable to read call parameter from client 10.56.49.148; java.lang.UnsupportedOperationException: GetRegionLoad
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at org.apache.hadoop.hbase.ipc.RemoteWithExtrasException.instantiateException(RemoteWithExtrasException.java:93)
    at org.apache.hadoop.hbase.ipc.RemoteWithExtrasException.unwrapRemoteException(RemoteWithExtrasException.java:83)
    at org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.makeIOExceptionOfException(ProtobufUtil.java:368)
    at org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:345)
    at org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.getRegionLoad(ProtobufUtil.java:1746)
    at org.apache.hadoop.hbase.client.HBaseAdmin.getRegionLoad(HBaseAdmin.java:2089)
    at org.apache.hadoop.hbase.mapreduce.RegionSizeCalculator.init(RegionSizeCalculator.java:82)
    at org.apache.hadoop.hbase.mapreduce.RegionSizeCalculator.<init>(RegionSizeCalculator.java:60)
    at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.oneInputSplitPerRegion(TableInputFormatBase.java:293)
    at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:257)
    at org.apache.hadoop.hbase.mapreduce.TableInputFormat.getSplits(TableInputFormat.java:254)
    at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:121)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1911)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:893)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:358)
    at org.apache.spark.rdd.RDD.collect(RDD.scala:892)
    at org.apache.spark.api.java.JavaRDDLike$class.collect(JavaRDDLike.scala:360)
    at org.apache.spark.api.java.AbstractJavaRDDLike.collect(JavaRDDLike.scala:45)
    at com.myproj.poc.sparkhbaseneo4j.DistributedHBaseScanToRddDemo.main(DistributedHBaseScanToRddDemo.java:32)
Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.DoNotRetryIOException): /10.56.48.219:16020 is unable to read call parameter from client 10.56.49.148; java.lang.UnsupportedOperationException: GetRegionLoad
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:387)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:95)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:410)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:406)
    at org.apache.hadoop.hbase.ipc.Call.callComplete(Call.java:103)
    at org.apache.hadoop.hbase.ipc.Call.setException(Call.java:118)
    at org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.readResponse(NettyRpcDuplexHandler.java:161)
    at org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.channelRead(NettyRpcDuplexHandler.java:191)
    at org.apache.hadoop.hbase.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
    at org.apache.hadoop.hbase.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at org.apache.hadoop.hbase.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
    at org.apache.hadoop.hbase.shaded.io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310)
    at org.apache.hadoop.hbase.shaded.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:284)
    at org.apache.hadoop.hbase.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
    at org.apache.hadoop.hbase.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at org.apache.hadoop.hbase.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
    at org.apache.hadoop.hbase.shaded.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)
    at org.apache.hadoop.hbase.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
    at org.apache.hadoop.hbase.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at org.apache.hadoop.hbase.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
    at org.apache.hadoop.hbase.shaded.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334)
    at org.apache.hadoop.hbase.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
    at org.apache.hadoop.hbase.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at org.apache.hadoop.hbase.shaded.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926)
    at org.apache.hadoop.hbase.shaded.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:134)
    at org.apache.hadoop.hbase.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644)
    at org.apache.hadoop.hbase.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:579)
    at org.apache.hadoop.hbase.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:496)
    at org.apache.hadoop.hbase.shaded.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:458)
    at org.apache.hadoop.hbase.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
    at org.apache.hadoop.hbase.shaded.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
    at java.lang.Thread.run(Thread.java:745)

I also tried bulk get and put examples and they are working correctly. So I was guessing whats going wrong with bulk scan example.

解决方案

This Cloudera hbase-spark connector seems to work:

https://mvnrepository.com/artifact/org.apache.hbase/hbase-spark?repo=cloudera

So, add something like this in pom.xml:

 <repositories>
    <repository>
      <id>cloudera</id>
      <name>cloudera</name>
      <url>https://repository.cloudera.com/content/repositories/releases/</url>
    </repository>
  </repositories>

and for dependencies:

 <dependency>
      <groupId>org.apache.hbase</groupId>
      <artifactId>hbase-spark</artifactId>
      <version>${hbase-spark.version}</version>
    </dependency>

One thing I noticed is that this functionality doesn't seem to reuse the HBase connection well and tries to re-establish it for every partition. See my question and related discussion here:

HBase-Spark Connector: connection to HBase established for every scan?

For this reason I actually avoid this functionality, but curious to know your experience with this.

这篇关于执行hbase扫描时出现异常的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

08-15 09:50