This article covers pivoting a table with Apache Pig; it may be a useful reference for anyone facing the same problem.
Problem Description
I wonder if it's possible to pivot a table in one pass in Apache Pig.
Input:
Id Column1 Column2 Column3
1 Row11 Row12 Row13
2 Row21 Row22 Row23
Output:
Id Name Value
1 Column1 Row11
1 Column2 Row12
1 Column3 Row13
2 Column1 Row21
2 Column2 Row22
2 Column3 Row23
The real data has dozens of columns.
I can do that with awk in one pass and then run it with Hadoop Streaming, but the majority of my code is Apache Pig, so I wonder if it's possible to do it efficiently in Pig.
Recommended Answer
You can do it in two ways:
1. Write a UDF that returns a bag of tuples. This is the most flexible solution, but it requires Java code (a minimal sketch of such a UDF is shown after the results below).
2. Write a rigid script like the one that follows:
-- Load the wide input table (path and schema from the question).
inpt = load '/pig_fun/input/pivot.txt' as (Id, Column1, Column2, Column3);
-- Pair each column name with its value and collect the pairs into a bag per row.
bagged = foreach inpt generate Id, TOBAG(TOTUPLE('Column1', Column1), TOTUPLE('Column2', Column2), TOTUPLE('Column3', Column3)) as toPivot;
-- Flatten the bag so each (name, value) tuple becomes its own record...
pivoted_1 = foreach bagged generate Id, FLATTEN(toPivot) as t_value;
-- ...then flatten the inner tuple into separate Name and Value fields.
pivoted = foreach pivoted_1 generate Id, FLATTEN(t_value);
dump pivoted;
Running this script (against the answerer's own test data, hence values that differ from the question's sample) gave the following results:
(1,Column1,11)
(1,Column2,12)
(1,Column3,13)
(2,Column1,21)
(2,Column2,22)
(2,Column3,23)
(3,Column1,31)
(3,Column2,32)
(3,Column3,33)
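For the first option, here is a minimal sketch of what such a UDF could look like. It is only an illustration under assumptions, not code from the original answer: the class name PivotColumns is hypothetical, the first field of the input tuple is assumed to be the Id, and the column names are synthesized by position rather than read from the schema.

import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.BagFactory;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;

// Hypothetical sketch: returns a bag of (name, value) tuples, one per field after the Id.
public class PivotColumns extends EvalFunc<DataBag> {
    private static final TupleFactory TUPLES = TupleFactory.getInstance();
    private static final BagFactory BAGS = BagFactory.getInstance();

    @Override
    public DataBag exec(Tuple input) throws IOException {
        if (input == null || input.size() < 2) {
            return null;
        }
        DataBag out = BAGS.newDefaultBag();
        // Field 0 is assumed to be the Id; every remaining field is pivoted.
        for (int i = 1; i < input.size(); i++) {
            Tuple pair = TUPLES.newTuple(2);
            pair.set(0, "Column" + i);   // column name, synthesized by position here
            pair.set(1, input.get(i));   // column value
            out.add(pair);
        }
        return out;
    }
}

After REGISTERing the jar, the UDF would be used much like TOBAG in the script above: generate the Id together with FLATTEN of the returned bag to get one (Id, Name, Value) record per column.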