本文介绍了将Sqoop导入HCatalog / Hive - 表不可见的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

使用Ambari 2.2.2.0安装HDP-2.4.2.0-258

我必须导入几个SQL Server架构,这些架构应该可以通过Hive,Pig,MR和任何第三方(将来)。我决定导入HCatalog。



Sqoop提供了导入到Hive或HCatalog的方法,我想如果导入到HCatalog,可以从Hive CLI访问同一个表,到MR和猪(请评估我的假设)。



问题:


  • 如果直接导入到Hive,表格是否可用于Pig,MR

  • 如果导入到HCatalog,需要通过Hive访问什么?

  • 是否需要在Hive中预先创建表?如果是,在HCatalog中导入的优点是什么(我可以直接在Hive中导入)/(在HDFS中导入,然后创建外部表)



问题
我希望一步完成以下操作:


  • 导入数据(从SQL Server表)

  • 避免为这些表创建语句或创建语句(其中有100个语句)
  • 以ORC格式存储表格

  • 将数据存储在自定义HDFS路径say / org / data / schema1,/ org / data / schema2等等(这是不可能的,因为Sqoop (--target-dir / - warehouse-dir)


我执行了以下命令:



pre $ -bash-4.2 $ sqoop import --connect'jdbc:sqlserver://< IP> ;; database = FleetManagement'--username --password --table SettingAttribute - --schema管理--hcatalog-home / usr / hdp / current / hive-webhcat --hcatalog-database default --hcatalog-table Se ttingAttribute --create-hcatalog-table --hcatalog-storage-stanza存储为orcfile

源表包含109条记录,并获取这些记录:

  16/08/10 15:02:27信息sqoop.Sqoop:Running Sqoop版本:1.4.6.2.4.2.0-258 
16/08/10 15:02:27 WARN tool.BaseSqoopTool:在命令行上设置密码是不安全的。考虑使用-P来代替。
16/08/10 15:02:28信息manager.SqlManager:使用默认的fetchSize 1000
16/08/10 15:02:28信息manager.SQLServerManager:我们将使用模式管理
16/08/10 15:02:28信息tool.CodeGenTool:开始代码生成
16/08/10 15:02:28信息manager.SqlManager:执行SQL语句:SELECT t。* FROM [Administration ]。[SettingAttribute] AS t WHERE 1 = 0
16/08/10 15:02:28 INFO orm.CompilationManager:HADOOP_MAPRED_HOME是/usr/hdp/2.4.2.0-258/hadoop-mapreduce
注意:/tmp/sqoop-ojoqcu/compile/dfab14748c41a566ec286b7e4b11004d/SettingAttribute.java使用或覆盖弃用的API。
注意:使用-Xlint:deprecation重新编译以获取详细信息。
16/08/10 15:02:30 INFO orm.CompilationManager:编写jar文件:/tmp/sqoop-ojoqcu/compile/dfab14748c41a566ec286b7e4b11004d/SettingAttribute.jar
16/08/10 15:02: 30 INFO mapreduce.ImportJobBase:开始导入SettingAttribute
SLF4J:类路径包含多个SLF4J绑定。
SLF4J:在[jar:file:/usr/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]中找到绑定
SLF4J:在[jar:file:/usr/hdp/2.4.2.0-258/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]中找到绑定
SLF4J:在[jar:file:/usr/hdp/2.4.2.0-258/accumulo/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]中找到绑定
SLF4J:请参阅http://www.slf4j.org/codes.html#multiple_bindings以获取解释。
SLF4J:实际绑定类型为[org.slf4j.impl.Log4jLoggerFactory] ​​
16/08/10 15:02:31 INFO impl.TimelineClientImpl:时间轴服务地址:http://l4373t.sss .com:8188 / ws / v1 / timeline /
16/08/10 15:02:31信息client.RMProxy:连接到ResourceManager at l4283t.sss.com/138.106.9.80:8050
16 / 08/10 15:02:33 INFO db.DBInputFormat:使用读提交事务隔离
16/08/10 15:02:33 INFO db.DataDrivenDBInputFormat:BoundingValsQuery:SELECT MIN([SettingAttributeId]),MAX( [SettingAttributeId])FROM [Administration]。[SettingAttribute]
16/08/10 15:02:33信息mapreduce.JobSubmitter:分割数量:4
16/08/10 15:02:33 INFO mapreduce.JobSubmitter:提交作业的标记:job_1467787344827_0013
16/08/10 15:02:34 INFO impl.YarnClientImpl:已提交的应用程序application_1467787344827_0013
16/08/10 15:02:34信息mapreduce。工作:跟踪工作的网址:http://l4283t.sss.com:8088/proxy/application_1467787344827_0013/
16/08/10 15:02:3 4信息mapreduce.Job:正在运行的作业:job_1467787344827_0013
16/08/10 15:02:41信息mapreduce.Job:作业job_1467787344827_0013以超级模式运行:false
16/08/10 15:02: 41信息mapreduce.Job:地图0%减少0%
16/08/10 15:02:47信息mapreduce.Job:地图100%减少0%
16/08/10 15:02: 48信息mapreduce.Job:Job job_1467787344827_0013已成功完成
16/08/10 15:02:48信息mapreduce.Job:计数器:30
文件系统计数器
FILE:读取的字节数= 0
FILE:写入的字节数= 616636
FILE:读取操作次数= 0
FILE:大量读取操作数量= 0
FILE:写入操作次数= 0
HDFS:读取的字节数= 540
HDFS:写入的字节数= 10079
HDFS:读取操作数量= 16
HDFS:大量读取操作的数量= 0
HDFS:Num写操作的数量= 8
作业计数器
启动的地图任务= 4
其他本地地图任务= 4
所有地图在占用槽位上花费的总时间(ms)= 16132
所有地图任务花费的总时间(ms)= 0
所有地图任务花费的总时间(ms)= 16132
所有地图任务花费的总时间= 16132
所有地图任务占用的总兆字节秒数= 66076672
Map-Reduce Framework
地图输入记录= 109
地图输出记录= 109
输入分割字节数= 540
溢出记录= 0
失败Shuffles = 0
合并映射输出= 0
GC耗时(ms)= 320
CPU时间(ms)= 6340
物理内存(字节)快照= 999870464
虚拟内存(字节)快照= 21872697344
总承诺堆使用率(字节)= 943194112
文件输入格式计数器
字节读= 0
文件输出格式计数器
写入的字节数= 10079
16/08/10 15:02:48信息mapreduce.ImportJobBase:转移9.8428 KB在17.2115秒(585.597字节/秒)
16/08/10 15:02:48信息mapreduce.ImportJobBase:检索到109条记录。

这些文件是在我的用户下创建的:

  hdfs dfs -ls / user / ojoqcu / SettingAttribute 
找到5项
-rw ------- 3 ojoqcu hdfs 0 2016-08-10 15:02 / user / ojoqcu / SettingAttribute / _SUCCESS
-rw ------- 3 ojoqcu hdfs 8378 2016-08-10 15:02 / user / ojoqcu / SettingAttribute / part-m-00000
-rw ------- 3 ojoqcu hdfs 144 2016-08-10 15:02 / user / ojoqcu / SettingAttribute / part-m-00001
-rw ------- 3 ojoqcu hdfs 1123 2016-08-10 15:02 / user / ojoqcu / SettingAttribute / part-m-00002
-rw ------- 3 ojoqcu hdfs 434 2016-08-10 15:02 / user / ojoqcu / SettingAttribute / part-m-00003

我在HCatalog(Hive)中看不到任何东西

  -bash-4.2 $ /usr/hdp/2.4.2.0-258/hive-hcatalog/bin/hcat -eshow默认表格; 
警告:使用yarn jar启动YARN应用程序。
16/08/10 15:07:12 WARN conf.HiveConf:名称hive.server2.enable.impersonation的HiveConf不存在
OK
占用的时间:2.007秒

是否存在授权问题?



我检查了var / log但没有Sqoop,Hive-Hcatalog和Hive存在,我如何查看授权问题并修复它?

解决方案

嗯,我不确定这是授权问题还是仅仅解析问题,或者两者兼而有之。我做了以下工作,它的工作原理如下:


  1. 完成了 su hive

  2. 执行下面的命令(可能, - --schema 应该是
    最后一个参数,Sqoop只是忽略/ break)!


    sqoop import --hcatalog-home / usr / hdp / current / hive-webhcat --hcatalog-database FleetManagement_Ape --hcatalog-table DatabaseLog --create -hcatalog-table --hcatalog-storage-stanza以orcfile存储--connect'jdbc:sqlserver://< IP> ;; database = FleetManagement'--username --password --table DatabaseLog - --schema ape



HDP-2.4.2.0-258 installed using Ambari 2.2.2.0

I have to import several SQL Server schema which should be accessible via Hive, Pig, MR and any third party(in future). I decided to import in HCatalog.

Sqoop provides ways to import to Hive OR HCatalog, I guess if I import to HCatalog, the same table will be accessible from Hive CLI, to MR and to Pig(please evaluate my assumption).

Questions :

  • If imported to Hive directly, will the table be available to Pig, MR?
  • If imported to HCatalog, what needs to be done for accessing via Hive ?
  • Is pre-creation of tables in Hive necessary ? If yes, what is the advantage of the importing in HCatalog, (I can import in Hive directly)/(import in HDFS and then create external table) ?

Issue :I wish to achieve the following in one step :

  • Import of the data(from SQL Server tables)
  • Avoid 'pre-creating' or writing create statements for those tables(there are 100s of them)
  • Store the table in ORC format
  • Store this data at custom HDFS path say /org/data/schema1, /org/data/schema2 and so on(is this impossible because Sqoop says it (--target-dir/--warehouse-dir)

I executed the following command :

-bash-4.2$ sqoop import --connect 'jdbc:sqlserver://<IP>;database=FleetManagement' --username --password  --table SettingAttribute -- --schema Administration  --hcatalog-home /usr/hdp/current/hive-webhcat --hcatalog-database default --hcatalog-table SettingAttribute --create-hcatalog-table --hcatalog-storage-stanza "stored as orcfile"

The source table contains 109 records and those are fetched :

16/08/10 15:02:27 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.2.0-258
16/08/10 15:02:27 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/08/10 15:02:28 INFO manager.SqlManager: Using default fetchSize of 1000
16/08/10 15:02:28 INFO manager.SQLServerManager: We will use schema Administration
16/08/10 15:02:28 INFO tool.CodeGenTool: Beginning code generation
16/08/10 15:02:28 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM [Administration].[SettingAttribute] AS t WHERE 1=0
16/08/10 15:02:28 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.4.2.0-258/hadoop-mapreduce
Note: /tmp/sqoop-ojoqcu/compile/dfab14748c41a566ec286b7e4b11004d/SettingAttribute.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/08/10 15:02:30 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-ojoqcu/compile/dfab14748c41a566ec286b7e4b11004d/SettingAttribute.jar
16/08/10 15:02:30 INFO mapreduce.ImportJobBase: Beginning import of SettingAttribute
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/accumulo/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/08/10 15:02:31 INFO impl.TimelineClientImpl: Timeline service address: http://l4373t.sss.com:8188/ws/v1/timeline/
16/08/10 15:02:31 INFO client.RMProxy: Connecting to ResourceManager at l4283t.sss.com/138.106.9.80:8050
16/08/10 15:02:33 INFO db.DBInputFormat: Using read commited transaction isolation
16/08/10 15:02:33 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN([SettingAttributeId]), MAX([SettingAttributeId]) FROM [Administration].[SettingAttribute]
16/08/10 15:02:33 INFO mapreduce.JobSubmitter: number of splits:4
16/08/10 15:02:33 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1467787344827_0013
16/08/10 15:02:34 INFO impl.YarnClientImpl: Submitted application application_1467787344827_0013
16/08/10 15:02:34 INFO mapreduce.Job: The url to track the job: http://l4283t.sss.com:8088/proxy/application_1467787344827_0013/
16/08/10 15:02:34 INFO mapreduce.Job: Running job: job_1467787344827_0013
16/08/10 15:02:41 INFO mapreduce.Job: Job job_1467787344827_0013 running in uber mode : false
16/08/10 15:02:41 INFO mapreduce.Job:  map 0% reduce 0%
16/08/10 15:02:47 INFO mapreduce.Job:  map 100% reduce 0%
16/08/10 15:02:48 INFO mapreduce.Job: Job job_1467787344827_0013 completed successfully
16/08/10 15:02:48 INFO mapreduce.Job: Counters: 30
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=616636
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=540
                HDFS: Number of bytes written=10079
                HDFS: Number of read operations=16
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=8
        Job Counters
                Launched map tasks=4
                Other local map tasks=4
                Total time spent by all maps in occupied slots (ms)=16132
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=16132
                Total vcore-seconds taken by all map tasks=16132
                Total megabyte-seconds taken by all map tasks=66076672
        Map-Reduce Framework
                Map input records=109
                Map output records=109
                Input split bytes=540
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=320
                CPU time spent (ms)=6340
                Physical memory (bytes) snapshot=999870464
                Virtual memory (bytes) snapshot=21872697344
                Total committed heap usage (bytes)=943194112
        File Input Format Counters
                Bytes Read=0
        File Output Format Counters
                Bytes Written=10079
16/08/10 15:02:48 INFO mapreduce.ImportJobBase: Transferred 9.8428 KB in 17.2115 seconds (585.597 bytes/sec)
16/08/10 15:02:48 INFO mapreduce.ImportJobBase: Retrieved 109 records.

The files are created under my user :

hdfs dfs -ls /user/ojoqcu/SettingAttribute
Found 5 items
-rw-------   3 ojoqcu hdfs          0 2016-08-10 15:02 /user/ojoqcu/SettingAttribute/_SUCCESS
-rw-------   3 ojoqcu hdfs       8378 2016-08-10 15:02 /user/ojoqcu/SettingAttribute/part-m-00000
-rw-------   3 ojoqcu hdfs        144 2016-08-10 15:02 /user/ojoqcu/SettingAttribute/part-m-00001
-rw-------   3 ojoqcu hdfs       1123 2016-08-10 15:02 /user/ojoqcu/SettingAttribute/part-m-00002
-rw-------   3 ojoqcu hdfs        434 2016-08-10 15:02 /user/ojoqcu/SettingAttribute/part-m-00003

I cannot see anything in HCatalog(nor in Hive)

-bash-4.2$ /usr/hdp/2.4.2.0-258/hive-hcatalog/bin/hcat -e "show tables in default;"
WARNING: Use "yarn jar" to launch YARN applications.
16/08/10 15:07:12 WARN conf.HiveConf: HiveConf of name hive.server2.enable.impersonation does not exist
OK
Time taken: 2.007 seconds

Is there some authorization issue ?

I checked the var/log but none exist for Sqoop, Hive-Hcatalog and Hive, how can I view the authorization issue and fix it ?

解决方案

Well, I'm unsure whether it was an authorization issue or a mere parsing problem or both. I did the following and it worked :

  1. Did an su hive
  2. Executed the following command(probably, the -- --schema should bethe last arg, Sqoop simply ignores/breaks after that!)

    sqoop import --hcatalog-home /usr/hdp/current/hive-webhcat --hcatalog-database FleetManagement_Ape --hcatalog-table DatabaseLog --create-hcatalog-table --hcatalog-storage-stanza "stored as orcfile" --connect 'jdbc:sqlserver://<IP>;database=FleetManagement' --username --password --table DatabaseLog -- --schema ape

这篇关于将Sqoop导入HCatalog / Hive - 表不可见的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

05-19 09:37