This article describes how to resolve the error "Unable to connect to Bigtable to scan HTable data", caused by managed=true being hardcoded in the hbase client jars, when running a custom Pig load function on Dataproc.

Problem description


    I'm working on a custom load function to load data from Bigtable using Pig on Dataproc. I compile my Java code against the following list of jar files I grabbed from Dataproc. When I run the Pig script below, it fails while trying to establish a connection with Bigtable.

    Error message is:

    Bigtable does not support managed connections.
    

    Questions:

    1. Is there a work around for this problem?
    2. Is this a known issue and is there a plan to fix or adjust?
    3. Is there a different way of implementing multi scans as a load function for Pig that will work with Bigtable?

    Details:

    Jar files:

    hadoop-common-2.7.3.jar
    hbase-client-1.2.2.jar
    hbase-common-1.2.2.jar
    hbase-protocol-1.2.2.jar
    hbase-server-1.2.2.jar
    pig-0.16.0-core-h2.jar
    

    Here's a simple Pig script using my custom load function:

    %default gte         '2017-03-23T18:00Z'
    %default lt          '2017-03-23T18:05Z'
    %default SHARD_FIRST '00'
    %default SHARD_LAST  '25'
    %default GTE_SHARD   '$gte\_$SHARD_FIRST'
    %default LT_SHARD    '$lt\_$SHARD_LAST'
    raw = LOAD 'hbase://events_sessions'
          USING com.eduboom.pig.load.HBaseMultiScanLoader('$GTE_SHARD', '$LT_SHARD', 'event:*')
          AS (es_key:chararray, event_array);
    DUMP raw;
    

    My custom load function HBaseMultiScanLoader creates a list of Scan objects to perform multiple scans over different ranges of data in the table events_sessions, determined by the time range between gte and lt and sharded from SHARD_FIRST through SHARD_LAST.
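    To make that concrete, here is a minimal sketch of the multi-scan construction. It is my illustration, not the question's code: it assumes shard-prefixed row keys of the form "<shard>_<timestamp>" (the actual key layout is not shown in the question), and the class and method names are made up.

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    // Hypothetical sketch: one Scan per shard, each bounded by the gte/lt
    // timestamps. Assumes row keys like "07_2017-03-23T18:01:30Z".
    public class ShardedScanBuilder {
        public static List<Scan> buildScans(String gte, String lt,
                                            int firstShard, int lastShard) {
            List<Scan> scans = new ArrayList<>();
            for (int shard = firstShard; shard <= lastShard; shard++) {
                String prefix = String.format("%02d_", shard);
                Scan scan = new Scan();
                scan.setStartRow(Bytes.toBytes(prefix + gte)); // inclusive
                scan.setStopRow(Bytes.toBytes(prefix + lt));   // exclusive
                scan.addFamily(Bytes.toBytes("event"));        // 'event:*' in the script
                scans.add(scan);
            }
            return scans;
        }
    }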

    HBaseMultiScanLoader extends org.apache.pig.LoadFunc so it can be used in the Pig script as a load function. When Pig runs my script, it calls LoadFunc.getInputFormat(). My implementation of getInputFormat() returns an instance of my custom class MultiScanTableInputFormat, which extends org.apache.hadoop.mapreduce.InputFormat. MultiScanTableInputFormat initializes an org.apache.hadoop.hbase.client.HTable object to set up the connection to the table.
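    As a skeleton, that wiring looks roughly like the following. This is a sketch with placeholder bodies, not the question's implementation; MultiScanTableInputFormat is the question's custom class and is not shown here.

    import java.io.IOException;

    import org.apache.hadoop.mapreduce.InputFormat;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.RecordReader;
    import org.apache.pig.LoadFunc;
    import org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit;
    import org.apache.pig.data.Tuple;

    // Skeleton of the LoadFunc -> InputFormat wiring described above.
    public class HBaseMultiScanLoaderSkeleton extends LoadFunc {

        @Override
        @SuppressWarnings("rawtypes")
        public InputFormat getInputFormat() throws IOException {
            // Pig calls this during job planning; the returned InputFormat is
            // where the HTable/connection is initialized, and therefore where
            // the "managed connections" error surfaces.
            return new MultiScanTableInputFormat();
        }

        @Override
        public void setLocation(String location, Job job) throws IOException {
            // e.g. parse 'hbase://events_sessions' and stash it in the job conf
        }

        @Override
        @SuppressWarnings("rawtypes")
        public void prepareToRead(RecordReader reader, PigSplit split) throws IOException {
            // keep a handle to the reader for getNext()
        }

        @Override
        public Tuple getNext() throws IOException {
            // convert the next HBase Result into a Pig tuple; placeholder only
            return null;
        }
    }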

    Digging into the hbase-client source code, I see that org.apache.hadoop.hbase.client.ConnectionManager.getConnectionInternal() calls org.apache.hadoop.hbase.client.ConnectionManager.createConnection() with the attribute "managed" hardcoded to "true". You can see from the stack trace below that my code (MultiScanTableInputFormat) tries to initialize an HTable object, which invokes getConnectionInternal(), which does not provide an option to set managed to false. Going down the stack trace, you will get to AbstractBigtableConnection, which will not accept managed=true and therefore causes the connection to Bigtable to fail.
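    To make the mechanics concrete, here is a minimal sketch of the two construction paths, assuming the HBase 1.x client API (my illustration, not code from the question). The deprecated HTable constructor is the one that routes through the managed-connection path; ConnectionFactory.createConnection() yields an unmanaged connection, which is the direction the fix below takes.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Table;

    public class UnmanagedConnectionSketch {
        public static void main(String[] args) throws IOException {
            Configuration conf = HBaseConfiguration.create();

            // Deprecated path (what the stack trace shows): the HTable constructor
            // goes through ConnectionManager.getConnectionInternal(), which
            // hardcodes managed=true and is rejected by AbstractBigtableConnection.
            //   HTable table = new HTable(conf, "events_sessions");  // fails on Bigtable

            // Non-deprecated path: ConnectionFactory.createConnection() creates an
            // unmanaged connection, and tables are obtained from it.
            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Table table = connection.getTable(TableName.valueOf("events_sessions"))) {
                // table.getScanner(...), table.get(...), etc.
            }
        }
    }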

    Here’s the stack trace showing the error:

    2017-03-24 23:06:44,890 [JobControl] ERROR com.turner.hbase.mapreduce.MultiScanTableInputFormat - java.io.IOException: java.lang.reflect.InvocationTargetException
        at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:240)
        at org.apache.hadoop.hbase.client.ConnectionManager.createConnection(ConnectionManager.java:431)
        at org.apache.hadoop.hbase.client.ConnectionManager.createConnection(ConnectionManager.java:424)
        at org.apache.hadoop.hbase.client.ConnectionManager.getConnectionInternal(ConnectionManager.java:302)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:185)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:151)
        at com.eduboom.hbase.mapreduce.MultiScanTableInputFormat.setConf(Unknown Source)
        at com.eduboom.pig.load.HBaseMultiScanLoader.getInputFormat(Unknown Source)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:264)
        at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301)
        at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
        at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
        at org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:194)
        at java.lang.Thread.run(Thread.java:745)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:276)
    Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:238)
        ... 26 more
    Caused by: java.lang.IllegalArgumentException: Bigtable does not support managed connections.
        at org.apache.hadoop.hbase.client.AbstractBigtableConnection.<init>(AbstractBigtableConnection.java:123)
        at com.google.cloud.bigtable.hbase1_2.BigtableConnection.<init>(BigtableConnection.java:55)
        ... 31 more
    
    Solution

    The original problem was caused by the use of outdated and deprecated hbase client jars and classes.

    I updated my code to use the newest hbase client jars provided by Google and the original problem was fixed.
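    In practice, that meant building against the Cloud Bigtable HBase adapter (the stack trace above already shows com.google.cloud.bigtable.hbase1_2.BigtableConnection, i.e. a bigtable-hbase artifact targeting the HBase 1.2 API) and creating tables through the non-deprecated ConnectionFactory / Connection.getTable() path sketched earlier, instead of the deprecated HTable constructors.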

    I'm still stuck on a ZooKeeper (ZK) issue that I haven't figured out yet, but that's a conversation for a different question.

    This one is answered!

