To get hands-on practice with Hive, I uploaded census data ("income data for people from different countries working in the US") to an S3 bucket.

I can run other queries, but the simple query below does not give the result I expect.

I am trying to list the people from different countries whose income level is above 50K USD.

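I created the table in Hive and loaded the data from the AWS S3 bucket. The income column is defined as a string, and its possible values are '>50K' and '<=50K'.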

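The following query returns an empty result set. What could be the problem? The same SQL statement runs fine in a plain MySQL console. Why does Hive not show the expected result set?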

hive> select country, income from census_income_data where income = '>50K';
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201312281227_0011, Tracking URL = http://ip-172-31-44-80.us-west-2.compute.internal:9100/jobdetails.jsp?jobid=job_201312281227_0011
Kill Command = /home/hadoop/bin/hadoop job  -kill job_201312281227_0011
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2013-12-28 13:21:05,086 Stage-1 map = 0%,  reduce = 0%
2013-12-28 13:21:26,279 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 7.74 sec
2013-12-28 13:21:27,289 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 7.74 sec
2013-12-28 13:21:28,299 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 7.74 sec
2013-12-28 13:21:29,310 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 7.74 sec
2013-12-28 13:21:30,321 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 7.74 sec
2013-12-28 13:21:31,334 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 7.74 sec
2013-12-28 13:21:32,369 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 7.74 sec
MapReduce Total cumulative CPU time: 7 seconds 740 msec
Ended Job = job_201312281227_0011
Counters:
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 7.74 sec   HDFS Read: 219 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 7 seconds 740 msec
OK
Time taken: 56.559 seconds

Below is sample data from the dataset used in the query above:
30, State-gov, 141297, Bachelors, 13, Married-civ-spouse, Prof-specialty, Husband, Asian-Pac-Islander, Male, 0, 0, 40, India, >50K
23, Private, 122272, Bachelors, 13, Never-married, Adm-clerical, Own-child, White, Female, 0, 0, 30, United-States, <=50K
32, Private, 205019, Assoc-acdm, 12, Never-married, Sales, Not-in-family, Black, Male, 0, 0, 50, United-States, <=50K
40, Private, 121772, Assoc-voc, 11, Married-civ-spouse, Craft-repair, Husband, Asian-Pac-Islander, Male, 0, 0, 40, ?, >50K
34, Private, 245487, 7th-8th, 4, Married-civ-spouse, Transport-moving, Husband, Amer-Indian-Eskimo, Male, 0, 0, 45, Mexico, <=50K
25, Self-emp-not-inc, 176756, HS-grad, 9, Never-married, Farming-fishing, Own-child, White, Male, 0, 0, 35, United-States, <=50K
32, Private, 186824, HS-grad, 9, Never-married, Machine-op-inspct, Unmarried, White, Male, 0, 0, 40, United-States, <=50K
38, Private, 28887, 11th, 7, Married-civ-spouse, Sales, Husband, White, Male, 0, 0, 50, United-States, <=50K
43, Self-emp-not-inc, 292175, Masters, 14, Divorced, Exec-managerial, Unmarried, White, Female, 0, 0, 45, United-States, >50K
40, Private, 193524, Doctorate, 16, Married-civ-spouse, Prof-specialty, Husband, White, Male, 0, 0, 60, United-States, >50K
54, Private, 302146, HS-grad, 9, Separated, Other-service, Unmarried, Black, Female, 0, 0, 20, United-States, <=50K
35, Federal-gov, 76845, 9th, 5, Married-civ-spouse, Farming-fishing, Husband, Black, Male, 0, 0, 40, United-States, <=50K
43, Private, 117037, 11th, 7, Married-civ-spouse, Transport-moving, Husband, White, Male, 0, 2042, 40, United-States, <=50K
59, Private, 109015, HS-grad, 9, Divorced, Tech-support, Unmarried, White, Female, 0, 0, 40, United-States, <=50K
56, Local-gov, 216851, Bachelors, 13, Married-civ-spouse, Tech-support, Husband, White, Male, 0, 0, 40, United-States, >50K
19, Private, 168294, HS-grad, 9, Never-married, Craft-repair, Own-child, White, Male, 0, 0, 40, United-States, <=50K
54, ?, 180211, Some-college, 10, Married-civ-spouse, ?, Husband, Asian-Pac-Islander, Male, 0, 0, 60, South, >50K
39, Private, 367260, HS-grad, 9, Divorced, Exec-managerial, Not-in-family, White, Male, 0, 0, 80, United-States, <=50K
49, Private, 193366, HS-grad, 9, Married-civ-spouse, Craft-repair, Husband, White, Male, 0, 0, 40, United-States, <=50K
23, Local-gov, 190709, Assoc-acdm, 12, Never-married, Protective-serv, Not-in-family, White, Male, 0, 0, 52, United-States, <=50K

Best answer

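First run select * from census_income_data limit 20; on the table to verify that the expected values really are in the expected columns.
The values may also contain extra characters (such as spaces), which would make the equality comparison fail and the query return 0 rows.
Try this: select country, income from census_income_data where income like '%50%'; If that returns nothing, the data was probably placed in the wrong columns when the table was created.
If it works, try: select country, income from census_income_data where income like '%>50K%'; If that works too, the field most likely contains extra characters. Run: select concat('INCOME:',income,'.') from census_income_data where income like '%>50K%'; and check whether you get exactly the string INCOME:>50K. Anything that appears between the colon and >50K, or between >50K and the final period, is a stray character stored in the column.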
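
If the check above does reveal stray whitespace, a likely cause is visible in the sample rows: fields are separated by a comma followed by a space. Assuming the table was declared with a plain ',' field delimiter (the CREATE TABLE statement is not shown in the question, so this is an assumption), every value after the first would keep that leading space, and income would hold ' >50K' instead of '>50K'. A minimal sketch of a workaround using Hive's built-in trim() and length() functions:

```sql
-- Assumption: the stored values carry a leading space, e.g. ' >50K',
-- because the source file uses ', ' between fields.

-- Compare against the trimmed value instead of the raw column.
SELECT trim(country) AS country,
       trim(income)  AS income
FROM census_income_data
WHERE trim(income) = '>50K';

-- Optional check: a value that is exactly '>50K' has length 4,
-- so a larger number here points to extra characters.
SELECT income, length(income) AS income_length
FROM census_income_data
LIMIT 20;
```

Trimming in the query leaves the stored data untouched. If the stray spaces turn out to be the cause, cleaning the file before loading it, or trimming once into a new table, avoids repeating trim() in every query.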

This question (hadoop: a simple WHERE condition in Hive does not show the expected output) comes from Stack Overflow: https://stackoverflow.com/questions/20815153/
