我正在尝试使用Apache Drill 1.3.0 获取数字字段上的合计值,而不在文件系统上安装 HadoopApache Drill

现在的问题是-在读取CSV文件时我想省略标题,但是我找不到任何告诉Apache Drill的选项,请使用Java程序skipFirstLine,因为我尚未安装Apache Drill

程序因错误而终止,并由于列名而引发NumberFormatException

pom.xml

<dependency>
<groupId>org.apache.drill.exec</groupId>
<artifactId>drill-jdbc</artifactId>
<version>1.3.0</version>
</dependency>

样本CSV文件:

“ffe92de2-e633-4741-bc4b-83e91dce665a.csv”
"A"
"100"

示例Java程序:
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Properties;

/*
* Created By: Ashish Pancholi
* On: 16-12-2015
* */

public class DrillExample {
    private static final String DRILL_JDBC_LOCAL_URI = "jdbc:drill:zk=local";
    private Connection con = null;

    public DrillExample() throws SQLException {
        con = new org.apache.drill.jdbc.Driver().connect(DRILL_JDBC_LOCAL_URI, getDefaultProperties());

    }

    public int getArchiveNumFieldSUM(String filepath) throws Exception {
        //query to be executed
        String query = "select sum(cast(columns[0] as int)) from dfs.`"+filepath+"`";
        try {
            // creating statement
            Statement stmt = con.createStatement();
            // executing query
            ResultSet rs = stmt.executeQuery(query);
            // get the number of rows from the result set
            rs.next();
            return rs.getInt(1);
        } catch (Exception ex) {
            throw ex;
        }
    }

    private Properties getDefaultProperties() {
        final Properties properties = new Properties();
        properties.setProperty(org.apache.drill.exec.ExecConstants.HTTP_ENABLE, "false");
        return properties;
    }

    public void shutdownApacheDrill(){
        if (con != null) {
            try {
                con.close();
            } catch (SQLException e) {
                //ignore
            }
        }
    }

    public static void main(String[] arg){
        try {
            int count = 0;
            DrillExample local = new DrillExample();
            String filePath = "C:/Workarea/Project/xyz/Structured Data/ffe92de2-e633-4741-bc4b-83e91dce665a.csv";
            try {
                count = local.getArchiveNumFieldSUM(filePath);
            } catch (Exception e) {
                e.printStackTrace();
            }
            System.out.println(count);
        }catch(Exception ex){
            ex.printStackTrace();
        }
    }

}

我遇到以下错误:
SYSTEM ERROR: NumberFormatException: A

Fragment 0:0

[Error Id: 72dec3db-8f28-462d-838d-dcb60663e389 on ashish:31010].
java.sql.SQLException: SYSTEM ERROR: NumberFormatException: A

Fragment 0:0

[Error Id: 72dec3db-8f28-462d-838d-dcb60663e389 on ashish:31010]
    at org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:247)
    at org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:290)
    at org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1359)
    at org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:74)
    at net.hydromatic.avatica.AvaticaConnection.executeQueryInternal(AvaticaConnection.java:404)
    at net.hydromatic.avatica.AvaticaStatement.executeQueryInternal(AvaticaStatement.java:351)
    at net.hydromatic.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:78)
    at org.apache.drill.jdbc.impl.DrillStatementImpl.executeQuery(DrillStatementImpl.java:97)
    at org.openskye.core.structured.DrillExample.getArchiveNumFieldSUM(DrillExample.java:30)
    at org.openskye.core.structured.DrillExample.main(DrillExample.java:61)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.intellij.rt.execution.CommandLineWrapper.main(CommandLineWrapper.java:130)
Caused by: org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: NumberFormatException: A

Fragment 0:0

[Error Id: 72dec3db-8f28-462d-838d-dcb60663e389 on ashish:31010]
    at org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:118)
    at org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:112)
    at org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:47)
    at org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:32)
    at org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:69)
    at org.apache.drill.exec.rpc.RpcBus$RequestEvent.run(RpcBus.java:400)
    at org.apache.drill.common.SerializedExecutor$RunnableProcessor.run(SerializedExecutor.java:105)
    at org.apache.drill.exec.rpc.RpcBus$SameExecutor.execute(RpcBus.java:264)
    at org.apache.drill.common.SerializedExecutor.execute(SerializedExecutor.java:142)
    at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:298)
    at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:269)
    at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
20:07:26.097 [USER-rpc-event-queue] INFO  o.a.d.j.i.DrillResultSetImpl$ResultsListener - [#1] Query failed:
org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: NumberFormatException: A

Fragment 0:0

[Error Id: 72dec3db-8f28-462d-838d-dcb60663e389 on ashish:31010]
    at org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:118) [drill-java-exec-1.3.0.jar:1.3.0]
    at org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:112) [drill-java-exec-1.3.0.jar:1.3.0]
    at org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:47) [drill-java-exec-1.3.0.jar:1.3.0]
    at org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:32) [drill-java-exec-1.3.0.jar:1.3.0]
    at org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:69) [drill-java-exec-1.3.0.jar:1.3.0]
    at org.apache.drill.exec.rpc.RpcBus$RequestEvent.run(RpcBus.java:400) [drill-java-exec-1.3.0.jar:1.3.0]
    at org.apache.drill.common.SerializedExecutor$RunnableProcessor.run(SerializedExecutor.java:105) [drill-common-1.3.0.jar:1.3.0]
    at org.apache.drill.exec.rpc.RpcBus$SameExecutor.execute(RpcBus.java:264) [drill-java-exec-1.3.0.jar:1.3.0]
    at org.apache.drill.common.SerializedExecutor.execute(SerializedExecutor.java:142) [drill-common-1.3.0.jar:1.3.0]
    at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:298) [drill-java-exec-1.3.0.jar:1.3.0]
    at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:269) [drill-java-exec-1.3.0.jar:1.3.0]
    at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89) [netty-codec-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254) [netty-handler-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) [netty-codec-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242) [netty-codec-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) [netty-common-4.0.27.Final.jar:4.0.27.Final]
    at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
    at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
    at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
    at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
    at java.lang.Thread.run(Thread.java:744)
20:07:26.105 [USER-rpc-event-queue] DEBUG o.a.drill.exec.rpc.user.UserClient - Sending response with Sender 1145498900
0

更新:

我找到了一个设置SkipFirstLine属性的类,但我无法理解必须将其传递给哪里?
TextParsingSettings textParsingSettings = new TextParsingSettings();
textParsingSettings.setSkipFirstLine(true);

最佳答案

现在可以使用Drill 1.4编写更改存储插件配置选项的查询(请检查Release Notes)

您可以修改查询:

String query = "SELECT sum(cast(A as int)) FROM TABLE(dfs.`"+filepath+"`(type => 'text', fieldDelimiter => '^', extractHeader => true))";

注意:我正在使用强制转换(以int表示)A 而不是强制转换(以int表示的columns [0])

提取 header 后,您需要使用 header 中提到的列名而不是column []来提取列名。

关于java - 使用Apache Drill 1.3.0读取CSV文件但不安装Apache Drill时如何省略标题?,我们在Stack Overflow上找到一个类似的问题:https://stackoverflow.com/questions/34337863/

10-10 03:14