Problem description
I'm trying to parse the input below (it contains 2 records) using the Elephant Bird JSON loader:
[{"node_disk_lnum_1":36,"node_disk_xfers_in_rate_sum":136.40000000000001,"node_disk_bytes_in_rate_22":187392.0,node_disk_lnum_7":13}]
[{"node_disk_lnum_1": 36, "node_disk_xfers_in_rate_sum":105.2,node_disk_bytes_in_rate_22":123084.8,node_disk_lnum_7":13}]
Here is my script:
register '/home/data/Desktop/elephant-bird-pig-4.1.jar';
a = LOAD '/pig/tc1.log' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') as (json:map[]);
b = FOREACH a GENERATE flatten(json#'node_disk_lnum_1') AS node_disk_lnum_1, flatten(json#'node_disk_xfers_in_rate_sum') AS node_disk_xfers_in_rate_sum, flatten(json#'node_disk_bytes_in_rate_22') AS node_disk_bytes_in_rate_22, flatten(json#'node_disk_lnum_7') AS node_disk_lnum_7;
DESCRIBE b;
Result of DESCRIBE b:
b: {node_disk_lnum_1: bytearray,node_disk_xfers_in_rate_sum: bytearray,node_disk_bytes_in_rate_22: bytearray,node_disk_lnum_7: bytearray}
c = FOREACH b GENERATE node_disk_lnum_1;
DESCRIBE c;
c: {node_disk_lnum_1: bytearray}
DUMP c;
Expected result:
36, 136.40000000000001, 187392.0, 13
36, 105.2, 123084.8, 13
Instead, it throws the following error:
2017-02-06 01:05:49,337 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2017-02-06 01:05:49,386 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2017-02-06 01:05:49,387 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2017-02-06 01:05:49,390 [main] INFO org.apache.pig.newplan.logical.rules.ColumnPruneVisitor - Map key required for a: $0->[node_disk_lnum_1, node_disk_xfers_in_rate_sum, node_disk_bytes_in_rate_22, node_disk_lnum_7]
2017-02-06 01:05:49,395 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2017-02-06 01:05:49,398 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2017-02-06 01:05:49,398 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2017-02-06 01:05:49,425 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2017-02-06 01:05:49,426 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2017-02-06 01:05:49,428 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. com/twitter/elephantbird/util/HadoopCompat
Please help: what am I missing?
Recommended answer
Your JSON does not contain any nested data, so remove the -nestedLoad option:
a = LOAD '/pig/tc1.log' USING com.twitter.elephantbird.pig.load.JsonLoader() as (json:map[]);
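For reference, here is a minimal sketch of the whole script with that one change applied. The projections are copied from the question unchanged, and DUMP b is used only to print all four fields. The commented-out REGISTER lines are an assumption rather than part of the answer: the error message names com/twitter/elephantbird/util/HadoopCompat, so if the error persists it may be worth checking that the other Elephant Bird jars are registered as well. The paths on those lines are hypothetical placeholders.

```pig
-- Jar registered in the question.
REGISTER '/home/data/Desktop/elephant-bird-pig-4.1.jar';

-- Assumption, not stated in the answer: the class named in the error may
-- come from a separate Elephant Bird jar. The paths below are hypothetical.
-- REGISTER '/path/to/elephant-bird-hadoop-compat.jar';
-- REGISTER '/path/to/elephant-bird-core.jar';

-- Load each line as a flat map; '-nestedLoad' is dropped as the answer suggests.
a = LOAD '/pig/tc1.log'
    USING com.twitter.elephantbird.pig.load.JsonLoader()
    AS (json:map[]);

-- Same projections as in the question, one field per line.
b = FOREACH a GENERATE
    flatten(json#'node_disk_lnum_1')            AS node_disk_lnum_1,
    flatten(json#'node_disk_xfers_in_rate_sum') AS node_disk_xfers_in_rate_sum,
    flatten(json#'node_disk_bytes_in_rate_22')  AS node_disk_bytes_in_rate_22,
    flatten(json#'node_disk_lnum_7')            AS node_disk_lnum_7;

DESCRIBE b;
DUMP b;
```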