1. Setting the Pig job name:
At the very top of the Pig script, add this line:
set job.name 'This is my job';
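2. Passing parameters to Pig:
When running pig, pass the parameter on the command line with -p, then reference it in the script as $JOBNAME: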
pig -p JOBNAME="MyJob" test.pig
************test.pig**********
set job.name '$JOBNAME';
......
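If the script is sometimes run without -p, the substitution of $JOBNAME fails. A minimal sketch of one way around this, assuming Pig's %default preprocessor statement and reusing 'MyJob' purely as an illustrative fallback value:

```pig
-- test.pig (sketch): use 'MyJob' when no -p JOBNAME=... is supplied
%default JOBNAME 'MyJob';
set job.name '$JOBNAME';
```

A value given on the command line (pig -p JOBNAME="Other" test.pig) should still take precedence over the default.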
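3. Defining the Pig field delimiter:
Pig's default field delimiter is the tab character (\t). To use a different one, pass it to PigStorage, for example using PigStorage(','):
prices = load 'NYSE_daily' using PigStorage(',') as (exchange, symbol, date, open, high, low, close, volume, adj_close);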
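PigStorage can also be used when writing results, so the output delimiter does not have to match the input. A small sketch; the output path 'NYSE_daily_piped' and the '|' delimiter are made up for illustration:

```pig
-- Sketch: read comma-separated input, write pipe-separated output.
-- 'NYSE_daily_piped' is a hypothetical output path.
prices = load 'NYSE_daily' using PigStorage(',') as (exchange, symbol, date, open, high, low, close, volume, adj_close);
store prices into 'NYSE_daily_piped' using PigStorage('|');
```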
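4. Setting the number of reducers in Pig:
Parallel
The parallel clause sets the number of reduce tasks Pig uses for the statement it is attached to.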
--parallel.pig
daily = load 'NYSE_daily' as (exchange, symbol, date, open, high, low, close,
volume, adj_close);
bysymbl = group daily by symbol parallel 10;
parallel applies only to the single statement it is attached to. To have every statement in the script use 10 reduce tasks, use set default_parallel 10 instead:
--defaultparallel.pig
set default_parallel 10;
daily = load 'NYSE_daily' as (exchange, symbol, date, open, high, low, close,
volume, adj_close);
bysymbl = group daily by symbol;
average = foreach bysymbl generate group, AVG(daily.close) as avg;
sorted = order average by avg desc;
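The two settings can be combined: a parallel clause on an individual statement takes precedence over default_parallel for that statement. A sketch based on the script above, where the value 2 is an arbitrary choice for illustration:

```pig
-- Sketch: script-wide default of 10 reducers, overridden for one statement.
set default_parallel 10;
daily = load 'NYSE_daily' as (exchange, symbol, date, open, high, low, close, volume, adj_close);
bysymbl = group daily by symbol;                 -- uses the default of 10
average = foreach bysymbl generate group, AVG(daily.close) as avg;
sorted = order average by avg desc parallel 2;   -- illustrative override, this statement only
```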
For more, see:
http://www.cnblogs.com/siwei1988/archive/2012/08/06/2624912.html