HDP-2.4.2.0-258, installed via Ambari 2.2.2.0

I have to import several SQL Server schemas that should be accessible via Hive, Pig, MR, and any third-party tool (in the future). I decided to import into HCatalog.

Sqoop provides ways to import into either Hive or HCatalog. My understanding is that if I import into HCatalog, the same table becomes accessible from the Hive CLI, from MR, and from Pig (please evaluate my assumption).

Questions:

  • If I import directly into Hive, will the table be available to Pig and MR?
  • If I import into HCatalog, what needs to be done to access it via Hive? (See the sketch after this list.)
  • Do the tables need to be pre-created in Hive? If so, what is the benefit of importing into HCatalog over importing directly into Hive, or over importing into HDFS and then creating an external table?
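For context on the first two questions, this is how a table registered through HCatalog is typically reached from each engine. A minimal sketch, assuming a table default.settingattribute already exists (hypothetical at this point, since the import below did not succeed):

    # Hive: HCatalog uses the Hive metastore, so the table is visible
    # to Hive with no extra steps.
    hive -e "SELECT * FROM default.settingattribute LIMIT 10;"

    # Pig: read the same table via HCatLoader; -useHCatalog puts the
    # HCatalog jars on the classpath.
    pig -useHCatalog -e "A = LOAD 'default.settingattribute' USING org.apache.hive.hcatalog.pig.HCatLoader(); DUMP A;"

    # MR jobs would read it analogously through
    # org.apache.hive.hcatalog.mapreduce.HCatInputFormat.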

  • The problem:
    I would like to achieve all of the following in a single step:
  • Import the data (from the SQL Server tables)
  • Avoid "pre-creating" these tables or writing CREATE statements for them (there are 100 of them)
  • Store the tables in ORC format
  • Store the data at custom HDFS paths, e.g. /org/data/schema1, /org/data/schema2, etc. (this seems impossible, since Sqoop complains about --target-dir/--warehouse-dir when the --hcatalog options are used; but see the sketch after this list)
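One possible way to reconcile the last two goals (a sketch, not verified on this cluster; the table name, column list, and paths below are made up for illustration): pre-create each table in Hive with an explicit LOCATION and ORC storage, then let Sqoop load into the existing definition by omitting --create-hcatalog-table.

    # Pre-create the table at the custom path, stored as ORC
    # (the column list is illustrative and must match the source table).
    hive -e "CREATE TABLE default.settingattribute (
               settingattributeid INT,
               name STRING,
               value STRING
             ) STORED AS ORC
             LOCATION '/org/data/schema1/settingattribute';"

    # Import into the existing table; note the -- --schema extra
    # argument comes last (see the accepted answer below).
    sqoop import \
        --connect 'jdbc:sqlserver://<IP>;database=FleetManagement' \
        --username <user> --password <password> \
        --table SettingAttribute \
        --hcatalog-database default \
        --hcatalog-table settingattribute \
        -- --schema Administration

The trade-off is that this reintroduces a per-table CREATE statement, so for 100 tables the statements would have to be generated from the SQL Server catalog rather than written by hand.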

  • I executed the following command:
    -bash-4.2$ sqoop import \
        --connect 'jdbc:sqlserver://<IP>;database=FleetManagement' \
        --username --password \
        --table SettingAttribute \
        -- --schema Administration \
        --hcatalog-home /usr/hdp/current/hive-webhcat \
        --hcatalog-database default \
        --hcatalog-table SettingAttribute \
        --create-hcatalog-table \
        --hcatalog-storage-stanza "stored as orcfile"
    

    The source table contains 109 records, and they were fetched:
    16/08/10 15:02:27 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.2.0-258
    16/08/10 15:02:27 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
    16/08/10 15:02:28 INFO manager.SqlManager: Using default fetchSize of 1000
    16/08/10 15:02:28 INFO manager.SQLServerManager: We will use schema Administration
    16/08/10 15:02:28 INFO tool.CodeGenTool: Beginning code generation
    16/08/10 15:02:28 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM [Administration].[SettingAttribute] AS t WHERE 1=0
    16/08/10 15:02:28 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.4.2.0-258/hadoop-mapreduce
    Note: /tmp/sqoop-ojoqcu/compile/dfab14748c41a566ec286b7e4b11004d/SettingAttribute.java uses or overrides a deprecated API.
    Note: Recompile with -Xlint:deprecation for details.
    16/08/10 15:02:30 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-ojoqcu/compile/dfab14748c41a566ec286b7e4b11004d/SettingAttribute.jar
    16/08/10 15:02:30 INFO mapreduce.ImportJobBase: Beginning import of SettingAttribute
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/accumulo/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
    16/08/10 15:02:31 INFO impl.TimelineClientImpl: Timeline service address: http://l4373t.sss.com:8188/ws/v1/timeline/
    16/08/10 15:02:31 INFO client.RMProxy: Connecting to ResourceManager at l4283t.sss.com/138.106.9.80:8050
    16/08/10 15:02:33 INFO db.DBInputFormat: Using read commited transaction isolation
    16/08/10 15:02:33 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN([SettingAttributeId]), MAX([SettingAttributeId]) FROM [Administration].[SettingAttribute]
    16/08/10 15:02:33 INFO mapreduce.JobSubmitter: number of splits:4
    16/08/10 15:02:33 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1467787344827_0013
    16/08/10 15:02:34 INFO impl.YarnClientImpl: Submitted application application_1467787344827_0013
    16/08/10 15:02:34 INFO mapreduce.Job: The url to track the job: http://l4283t.sss.com:8088/proxy/application_1467787344827_0013/
    16/08/10 15:02:34 INFO mapreduce.Job: Running job: job_1467787344827_0013
    16/08/10 15:02:41 INFO mapreduce.Job: Job job_1467787344827_0013 running in uber mode : false
    16/08/10 15:02:41 INFO mapreduce.Job:  map 0% reduce 0%
    16/08/10 15:02:47 INFO mapreduce.Job:  map 100% reduce 0%
    16/08/10 15:02:48 INFO mapreduce.Job: Job job_1467787344827_0013 completed successfully
    16/08/10 15:02:48 INFO mapreduce.Job: Counters: 30
            File System Counters
                    FILE: Number of bytes read=0
                    FILE: Number of bytes written=616636
                    FILE: Number of read operations=0
                    FILE: Number of large read operations=0
                    FILE: Number of write operations=0
                    HDFS: Number of bytes read=540
                    HDFS: Number of bytes written=10079
                    HDFS: Number of read operations=16
                    HDFS: Number of large read operations=0
                    HDFS: Number of write operations=8
            Job Counters
                    Launched map tasks=4
                    Other local map tasks=4
                    Total time spent by all maps in occupied slots (ms)=16132
                    Total time spent by all reduces in occupied slots (ms)=0
                    Total time spent by all map tasks (ms)=16132
                    Total vcore-seconds taken by all map tasks=16132
                    Total megabyte-seconds taken by all map tasks=66076672
            Map-Reduce Framework
                    Map input records=109
                    Map output records=109
                    Input split bytes=540
                    Spilled Records=0
                    Failed Shuffles=0
                    Merged Map outputs=0
                    GC time elapsed (ms)=320
                    CPU time spent (ms)=6340
                    Physical memory (bytes) snapshot=999870464
                    Virtual memory (bytes) snapshot=21872697344
                    Total committed heap usage (bytes)=943194112
            File Input Format Counters
                    Bytes Read=0
            File Output Format Counters
                    Bytes Written=10079
    16/08/10 15:02:48 INFO mapreduce.ImportJobBase: Transferred 9.8428 KB in 17.2115 seconds (585.597 bytes/sec)
    16/08/10 15:02:48 INFO mapreduce.ImportJobBase: Retrieved 109 records.
    

    The files were created under my user:
    hdfs dfs -ls /user/ojoqcu/SettingAttribute
    Found 5 items
    -rw-------   3 ojoqcu hdfs          0 2016-08-10 15:02 /user/ojoqcu/SettingAttribute/_SUCCESS
    -rw-------   3 ojoqcu hdfs       8378 2016-08-10 15:02 /user/ojoqcu/SettingAttribute/part-m-00000
    -rw-------   3 ojoqcu hdfs        144 2016-08-10 15:02 /user/ojoqcu/SettingAttribute/part-m-00001
    -rw-------   3 ojoqcu hdfs       1123 2016-08-10 15:02 /user/ojoqcu/SettingAttribute/part-m-00002
    -rw-------   3 ojoqcu hdfs        434 2016-08-10 15:02 /user/ojoqcu/SettingAttribute/part-m-00003
    

    But I cannot see anything in HCatalog (nor in Hive):
    -bash-4.2$ /usr/hdp/2.4.2.0-258/hive-hcatalog/bin/hcat -e "show tables in default;"
    WARNING: Use "yarn jar" to launch YARN applications.
    16/08/10 15:07:12 WARN conf.HiveConf: HiveConf of name hive.server2.enable.impersonation does not exist
    OK
    Time taken: 2.007 seconds
    

    Is there some authorization problem?

    I checked /var/log, but there is nothing there for Sqoop, Hive-HCatalog, or Hive. How can I find the authorization problem and fix it?
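    (For what it's worth, Sqoop is a client-side tool and only logs to the console, so there is nothing for it under /var/log. A few places worth checking on a typical Ambari-managed HDP node; these are the Ambari default paths, an assumption that may not match this cluster:)

    # Hive metastore / HiveServer2 logs (Ambari default location)
    ls /var/log/hive/

    # WebHCat (HCatalog REST server) logs, if that service is installed
    ls /var/log/webhcat/

    # Check ownership/permissions of the imported files; the -rw-------
    # mode above means only the importing user can read them.
    hdfs dfs -ls /user/ojoqcu/SettingAttribute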

    Best answer

    Well, I'm not sure whether this was an authorization problem, just a parsing problem, or both. I did the following and it worked:

  • Did a su hive
  • Executed the following command (probably the key point: -- --schema has to be
    the last argument; Sqoop simply ignores/breaks on everything that comes after
    it! That would also explain the failed run above: every --hcatalog-* option
    placed after -- --schema Administration was never parsed, so Sqoop did a
    plain HDFS import into /user/ojoqcu/SettingAttribute and created no table.
    A verification sketch follows the command.)
    sqoop import \
        --hcatalog-home /usr/hdp/current/hive-webhcat \
        --hcatalog-database FleetManagement_Ape \
        --hcatalog-table DatabaseLog \
        --create-hcatalog-table \
        --hcatalog-storage-stanza "stored as orcfile" \
        --connect 'jdbc:sqlserver://<IP>;database=FleetManagement' \
        --username --password \
        --table DatabaseLog \
        -- --schema ape
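    To confirm the import actually registered the table this time, a quick check along these lines should work (a sketch; the database and table names follow the command above):

    # The table should now be visible through both HCatalog and Hive,
    # since they share the same metastore.
    /usr/hdp/2.4.2.0-258/hive-hcatalog/bin/hcat -e "show tables in FleetManagement_Ape;"
    hive -e "SELECT COUNT(*) FROM FleetManagement_Ape.DatabaseLog;"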