We have a large number of Hive unit tests that run in a Hadoop minicluster. The problem is that they run sequentially, and each build takes about an hour to complete. We would like to parallelize the Hive unit tests by using multiple HiveServer2 instances load balanced with ZooKeeper.

Connecting directly to a HiveServer2 instance with the connection string "jdbc:hive2://localhost:20103/default" works as expected. However, connecting through ZooKeeper with the connection string "jdbc:hive2://localhost:22010/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2" fails with the following error.

Is the ZooKeeper in the Hadoop minicluster capable of load balancing?

INFO: Connecting to : jdbc:hive2://localhost:22010/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2

java.sql.SQLException: org.apache.hive.jdbc.ZooKeeperHiveClientException: Unable to read HiveServer2 configs from ZooKeeper

    at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:135)
    at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
    at java.sql.DriverManager.getConnection(DriverManager.java:664)
    at java.sql.DriverManager.getConnection(DriverManager.java:208)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
    at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
    at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
    at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:51)
    at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
    at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
Caused by: org.apache.hive.jdbc.ZooKeeperHiveClientException: Unable to read HiveServer2 configs from ZooKeeper
    at org.apache.hive.jdbc.ZooKeeperHiveClientHelper.configureConnParams(ZooKeeperHiveClientHelper.java:80)
    at org.apache.hive.jdbc.Utils.configureConnParams(Utils.java:505)
    at org.apache.hive.jdbc.Utils.parseURL(Utils.java:425)
    at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:133)
    ... 29 more
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hiveserver2
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1590)
    at org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:214)
    at org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:203)
    at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
    at org.apache.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:199)
    at org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:191)
    at org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:38)
    at org.apache.hive.jdbc.ZooKeeperHiveClientHelper.configureConnParams(ZooKeeperHiveClientHelper.java:63)
    ... 32 more
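The root cause in the trace is the NoNodeException: the /hiveserver2 znode does not exist, which suggests that no HiveServer2 instance ever registered itself with the embedded ZooKeeper. One quick way to confirm this is to list the namespace directly with Curator, which is already on the classpath per the trace. This is only a diagnostic sketch; the class name is made up, and the port matches the embedded ZooKeeper configured below.

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.RetryOneTime;

public class ZkNamespaceCheck {
    public static void main(String[] args) throws Exception {
        // Connect to the embedded ZooKeeper used by the minicluster (port 22010 below).
        CuratorFramework zk = CuratorFrameworkFactory.newClient("127.0.0.1:22010", new RetryOneTime(1000));
        zk.start();
        try {
            if (zk.checkExists().forPath("/hiveserver2") == null) {
                // This is exactly the condition behind the NoNodeException in the trace:
                // no HiveServer2 instance has registered under the namespace.
                System.out.println("/hiveserver2 does not exist - HiveServer2 never registered");
            } else {
                // Each child znode is one registered HS2 instance the JDBC driver can pick from.
                System.out.println("registered HS2 instances: " + zk.getChildren().forPath("/hiveserver2"));
            }
        } finally {
            zk.close();
        }
    }
}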

Versions used
<hive.version>1.2.1000.2.4.0.0-169</hive.version>
<hadoop.version>2.7.1.2.4.0.0-169</hadoop.version>
<minicluster.version>0.1.14</minicluster.version>

Server configuration
public HiveServerRunner() {

    zookeeperLocalCluster = new ZookeeperLocalCluster.Builder()
      .setPort(22010)
      .setTempDir("embedded_zk")
      .setZookeeperConnectionString("127.0.0.1:22010")
      .setDeleteDataDirectoryOnClose(true)
      .build();

    hiveLocalMetaStore = new HiveLocalMetaStore.Builder()
      .setHiveMetastoreHostname("localhost")
      .setHiveMetastorePort(20102)
      .setHiveMetastoreDerbyDbDir("metastore_db")
      .setHiveScratchDir("hive_scratch_dir")
      .setHiveWarehouseDir("warehouse_dir")
      .setHiveConf(buildHiveConf())
      .build();

    hiveLocalServer2 = new HiveLocalServer2.Builder()
      .setHiveServer2Hostname("localhost")
      .setHiveServer2Port(20103)
      .setHiveMetastoreHostname("localhost")
      .setHiveMetastorePort(20102)
      .setHiveMetastoreDerbyDbDir("metastore_db")
      .setHiveScratchDir("hive_scratch_dir")
      .setHiveWarehouseDir("warehouse_dir")
      .setHiveConf(buildHiveConf())
      .setZookeeperConnectionString("127.0.0.1:22010")
      .build();
}

public static HiveConf buildHiveConf() {
    HiveConf hiveConf = new HiveConf();
    hiveConf.set("hive.txn.manager", "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager");
    hiveConf.set("hive.compactor.initiator.on", "true");
    hiveConf.set("hive.compactor.worker.threads", "5");
    hiveConf.set("hive.root.logger", "DEBUG,console");
    hiveConf.set("hadoop.bin.path", System.getenv("HADOOP_HOME") + "/bin/hadoop");
    hiveConf.set("hive.exec.submit.local.task.via.child", "false");
    hiveConf.set("hive.server2.support.dynamic.service.discovery", "true");
    hiveConf.set("hive.zookeeper.quorum", "127.0.0.1:22010");
    hiveConf.setInt("hive.metastore.connect.retries", 3);
    System.setProperty("HADOOP_HOME", WindowsLibsUtils.getHadoopHome());
    return hiveConf;
}
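
For reference, this is roughly how the two connection attempts from the question look as a standalone test. It is a minimal sketch only; the class name and the SHOW TABLES query are illustrative, and the URLs are the two connection strings quoted above.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HiveConnectionSketch {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // Direct connection to the single HiveServer2 instance - works as expected.
        try (Connection direct = DriverManager.getConnection("jdbc:hive2://localhost:20103/default");
             Statement stmt = direct.createStatement()) {
            stmt.execute("SHOW TABLES");
        }

        // Connection through ZooKeeper service discovery - fails with the
        // ZooKeeperHiveClientException above because /hiveserver2 has no children.
        try (Connection viaZk = DriverManager.getConnection(
                "jdbc:hive2://localhost:22010/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2");
             Statement stmt = viaZk.createStatement()) {
            stmt.execute("SHOW TABLES");
        }
    }
}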

Best answer

It looks like ZooKeeper does perform load balancing, but it directs the client's requests to a random available HS2 instance.

See the link below for more details:

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.0/bk_hadoop-high-availability/content/ha-hs2-service-discovery.html
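
Assuming the registration problem is solved, the parallelization idea from the question amounts to starting several HiveLocalServer2 instances on different ports, all registering against the same embedded ZooKeeper, so that connections made through the ZooKeeper URL are spread across them. A sketch mirroring the builder calls above; the second port (20104) and the field name hiveLocalServer2B are assumptions for illustration only.

    // Sketch only: a second HiveServer2 instance registering against the same ZooKeeper.
    // Port 20104 is an assumption; everything else mirrors the configuration above.
    hiveLocalServer2B = new HiveLocalServer2.Builder()
      .setHiveServer2Hostname("localhost")
      .setHiveServer2Port(20104)
      .setHiveMetastoreHostname("localhost")
      .setHiveMetastorePort(20102)
      .setHiveMetastoreDerbyDbDir("metastore_db")
      .setHiveScratchDir("hive_scratch_dir")
      .setHiveWarehouseDir("warehouse_dir")
      .setHiveConf(buildHiveConf())
      .setZookeeperConnectionString("127.0.0.1:22010")
      .build();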
