We are trying to run a simple write/scan against Accumulo (client jar 1.5.0) from a standalone Java main program (a Maven shaded executable) over PuTTY, on the AWS EC2 master (environment described below), as shown here:

    import java.util.concurrent.TimeUnit;

    import org.apache.accumulo.core.client.AccumuloException;
    import org.apache.accumulo.core.client.AccumuloSecurityException;
    import org.apache.accumulo.core.client.BatchWriter;
    import org.apache.accumulo.core.client.BatchWriterConfig;
    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.Instance;
    import org.apache.accumulo.core.client.Scanner;
    import org.apache.accumulo.core.client.TableExistsException;
    import org.apache.accumulo.core.client.TableNotFoundException;
    import org.apache.accumulo.core.client.ZooKeeperInstance;
    import org.apache.accumulo.core.client.security.tokens.PasswordToken;
    import org.apache.accumulo.core.data.Mutation;
    import org.apache.accumulo.core.data.Range;
    import org.apache.accumulo.core.security.Authorizations;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class AccumuloQueryApp {

      private static final Logger logger = LoggerFactory.getLogger(AccumuloQueryApp.class);

      public static final String INSTANCE = "accumulo"; // miniInstance
      public static final String ZOOKEEPERS = "ip-x-x-x-100:2181"; //localhost:28076

      private static Connector conn;

      static {
        // Accumulo
        Instance instance = new ZooKeeperInstance(INSTANCE, ZOOKEEPERS);
        try {
          conn = instance.getConnector("root", new PasswordToken("xxx"));
        } catch (Exception e) {
          logger.error("Connection", e);
        }
      }

      public static void main(String[] args) throws TableNotFoundException, AccumuloException, AccumuloSecurityException, TableExistsException {
        System.out.println("connection with : " + conn.whoami());

        BatchWriter writer = conn.createBatchWriter("test", ofBatchWriter());

        for (int i = 0; i < 10; i++) {
          Mutation m1 = new Mutation(String.valueOf(i));
          m1.put("personal_info", "first_name", String.valueOf(i));
          m1.put("personal_info", "last_name", String.valueOf(i));
          m1.put("personal_info", "phone", "983065281" + i % 2);
          m1.put("personal_info", "email", String.valueOf(i));
          m1.put("personal_info", "date_of_birth", String.valueOf(i));
          m1.put("department_info", "id", String.valueOf(i));
          m1.put("department_info", "short_name", String.valueOf(i));
          m1.put("department_info", "full_name", String.valueOf(i));
          m1.put("organization_info", "id", String.valueOf(i));
          m1.put("organization_info", "short_name", String.valueOf(i));
          m1.put("organization_info", "full_name", String.valueOf(i));

          writer.addMutation(m1);
        }
        writer.close();

        System.out.println("Writing complete ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~`");

        Scanner scanner = conn.createScanner("test", new Authorizations());
        System.out.println("Step 1 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~`");
        scanner.setRange(new Range("3", "7"));
        System.out.println("Step 2 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~`");
        scanner.forEach(e -> System.out.println("Key: " + e.getKey() + ", Value: " + e.getValue()));
        System.out.println("Step 3 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~`");
        scanner.close();
      }

      public static BatchWriterConfig ofBatchWriter() {
        //Batch Writer Properties
        final int MAX_LATENCY  = 1;
        final int MAX_MEMORY = 10000000;
        final int MAX_WRITE_THREADS = 10;
        final int TIMEOUT = 10;

        BatchWriterConfig config = new BatchWriterConfig();
        config.setMaxLatency(MAX_LATENCY, TimeUnit.MINUTES);
        config.setMaxMemory(MAX_MEMORY);
        config.setMaxWriteThreads(MAX_WRITE_THREADS);
        config.setTimeout(TIMEOUT, TimeUnit.MINUTES);

        return config;
      }
    }
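Note that the code above assumes the "test" table already exists. If it might not, a guard like the following could go at the top of main() (a sketch using the standard TableOperations API; the table name matches the code above):

    if (!conn.tableOperations().exists("test")) {
      // Create the table on first run; main() already declares the
      // TableExistsException this can throw under a concurrent create.
      conn.tableOperations().create("test");
    }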


The connection is established correctly, but we get the following error when creating the BatchWriter, and it keeps retrying in a loop with the same error:

[impl.ThriftScanner] DEBUG: Error getting transport to ip-x-x-x-100:10011 : NotServingTabletException(extent:TKeyExtent(table:21 30, endRow:21 30 3C, prevEndRow:null))


When we run the same code (writing to and reading from Accumulo) in a Spark job submitted to the YARN cluster, it runs fine. We are struggling to figure this out but have no clue. Please see the environment described below.

Cloudera CDH 5.8.2 on AWS (4 EC2 instances: one master and 3 children).

Consider the private IPs to be something like:


Master: x.x.x.100
Child 1: x.x.x.101
Child 2: x.x.x.102
Child 3: x.x.x.103


We have the following installed in CDH:

Cluster (CDH 5.8.2)


Accumulo 1.6 (Tracer not installed; Garbage Collector on Child 2, Master on Master, Monitor on Child 3, Tablet Server on Master)
HBase
HDFS (master node as NameNode, all 3 child nodes as DataNodes)
Kafka
Spark
YARN (including MR2)
ZooKeeper

Best Answer

Hmm, it's curious that it works with Spark-on-YARN but not as a regular Java application. Usually it's the other way around :)

I would verify that the JARs on the classpath of the standalone Java application match the ones used by the Spark-on-YARN job and the Accumulo server classpath.
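As a quick check (my sketch, not from the original exchange), you can print where the client classes are actually loaded from and which client version the shaded JAR bundles; Constants.VERSION is assumed to be available in the 1.5/1.6 client:

    import org.apache.accumulo.core.Constants;
    import org.apache.accumulo.core.client.Connector;

    public class ClasspathCheck {
      public static void main(String[] args) {
        // Which JAR did the Connector class actually come from?
        // (getCodeSource() can be null for bootstrap classes, not for app JARs.)
        System.out.println("Connector loaded from: "
            + Connector.class.getProtectionDomain().getCodeSource().getLocation());

        // Which client version is bundled? It should match the 1.6 server.
        System.out.println("Client version: " + Constants.VERSION);
      }
    }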

If that doesn't help, try bumping the log4j level up to DEBUG or TRACE and see if anything jumps out at you. If you have a hard time making sense of the logging, feel free to email [email protected] and more eyes will be on the problem.
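If editing log4j.properties is inconvenient, one way to raise the level is programmatic, assuming the log4j 1.x API that ships with CDH is on the classpath (a sketch, not part of the original answer):

    import org.apache.log4j.Level;
    import org.apache.log4j.Logger;

    public class DebugLogging {
      public static void main(String[] args) {
        // Raise client-side Accumulo logging before creating any Connector,
        // so transport and tablet-location errors are logged in full detail.
        Logger.getLogger("org.apache.accumulo").setLevel(Level.TRACE);
        Logger.getRootLogger().setLevel(Level.DEBUG);

        // ... then run the write/scan code from the question ...
      }
    }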
