I want to store some Pig variables in a Hadoop SequenceFile so that I can run external MapReduce jobs on them.
Suppose my data has the schema (chararray, int):
(hello,1)
(test,2)
(example,3)
I wrote this store function:
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.OutputFormat;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.pig.StoreFunc;
import org.apache.pig.data.Tuple;

public class StoreTest extends StoreFunc {

    private String storeLocation;
    private RecordWriter writer;
    private Job job;

    public StoreTest(){
    }

    @Override
    public OutputFormat getOutputFormat() throws IOException {
        //return new TextOutputFormat();
        return new SequenceFileOutputFormat();
    }

    @Override
    public void setStoreLocation(String location, Job job) throws IOException {
        this.storeLocation = location;
        this.job = job;
        System.out.println("Load location is " + storeLocation);
        FileOutputFormat.setOutputPath(job, new Path(location));
        System.out.println("Out path " + FileOutputFormat.getOutputPath(job));
    }

    @Override
    public void prepareToWrite(RecordWriter writer) throws IOException {
        this.writer = writer;
    }

    @Override
    public void putNext(Tuple tuple) throws IOException {
        try {
            Text k = new Text(((String)tuple.get(0)));
            IntWritable v = new IntWritable((Integer)tuple.get(1));
            writer.write(k, v);
        } catch (InterruptedException ex) {
            Logger.getLogger(StoreTest.class.getName()).log(Level.SEVERE, null, ex);
        }
    }
}
and this Pig script:
register MyUDFs.jar;
x = load '/user/pinoli/input' as (a:chararray,b:int);
store x into '/user/pinoli/output/' using StoreTest();
However, the store fails with this error:
ERROR org.apache.pig.tools.pigstats.PigStats - ERROR 0: java.io.IOException: wrong key class: org.apache.hadoop.io.Text is not class org.apache.hadoop.io.LongWritable
Is there any way to fix this?
Best answer
The problem is that you have not set the output key/value classes. You can do this in the setStoreLocation() method:
@Override
public void setStoreLocation(String location, Job job) throws IOException {
    this.storeLocation = location;
    this.job = job;
    this.job.setOutputKeyClass(Text.class); // !!!
    this.job.setOutputValueClass(IntWritable.class); // !!!
    ...
}
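To see why these two calls matter: a Job whose output key class has not been set reports LongWritable as the default, SequenceFileOutputFormat creates its writer with whatever key class the job reports, and the writer then rejects any key of a different class. The following is a minimal sketch of that check in plain Java. KeyClassCheck is a hypothetical stand-in and not Hadoop's actual code; Long plays the role of LongWritable and String the role of Text, so it runs without Hadoop on the classpath.

```java
import java.io.IOException;

/** Simplified stand-in for the key type check a SequenceFile writer performs (not Hadoop's code). */
final class KeyClassCheck {

    // The key class the writer was created with, i.e. what the job reported.
    private final Class<?> declaredKeyClass;

    KeyClassCheck(Class<?> declaredKeyClass) {
        this.declaredKeyClass = declaredKeyClass;
    }

    /** Rejects any key whose runtime class differs from the declared one. */
    void append(Object key) throws IOException {
        if (key.getClass() != declaredKeyClass) {
            throw new IOException("wrong key class: " + key.getClass().getName()
                    + " is not " + declaredKeyClass);
        }
    }

    public static void main(String[] args) {
        // Declared key class left at the "default" (Long here), but a String key is written.
        KeyClassCheck writer = new KeyClassCheck(Long.class);
        try {
            writer.append("hello");
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Declaring the key class that putNext() actually writes (Text in the question) is what makes this check pass.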
I guess you want to use your storer with different key/value types. In that case, you can pass the types to the constructor.
For example:
// Additional imports needed by this version:
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
...
private Class<? extends WritableComparable> keyClass;
private Class<? extends Writable> valueClass;
...
public StoreTest() {...}

@SuppressWarnings({ "unchecked", "rawtypes" })
public StoreTest(String keyClass, String valueClass) {
    try {
        // Resolve the fully qualified class names passed in from the Pig script
        this.keyClass = (Class<? extends WritableComparable>) Class.forName(keyClass);
        this.valueClass = (Class<? extends Writable>) Class.forName(valueClass);
    }
    catch (Exception e) {
        throw new RuntimeException("Invalid key/value type", e);
    }
}
...
@Override
public void setStoreLocation(String location, Job job) throws IOException {
    this.storeLocation = location;
    this.job = job;
    this.job.setOutputKeyClass(keyClass);
    this.job.setOutputValueClass(valueClass);
    ...
}
Then set the correct types in the Pig script:
register MyUDFs.jar;
DEFINE myStorer StoreTest('org.apache.hadoop.io.Text', 'org.apache.hadoop.io.IntWritable');
x = load '/user/pinoli/input' as (a:chararray,b:int);
store x into '/user/pinoli/output/' using myStorer();
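One caveat about the two-argument constructor: casting the result of Class.forName(...) to a parameterized Class type is unchecked, so a name such as 'java.lang.String' would not be rejected in the constructor and would only cause a failure later. A possible refinement, sketched below under that assumption, is to resolve the name with Class.asSubclass. WritableTypeResolver and resolve are hypothetical names, and Number stands in for WritableComparable/Writable so that the sketch runs without Hadoop on the classpath.

```java
/** Hypothetical helper: resolves a class name and verifies it against an expected supertype. */
final class WritableTypeResolver {

    private WritableTypeResolver() {
    }

    static <T> Class<? extends T> resolve(String className, Class<T> bound) {
        try {
            // asSubclass throws ClassCastException if the loaded class is not a subtype of bound
            return Class.forName(className).asSubclass(bound);
        } catch (ClassNotFoundException | ClassCastException e) {
            throw new IllegalArgumentException(
                    "Invalid key/value type: " + className
                            + " (expected a subtype of " + bound.getName() + ")", e);
        }
    }

    public static void main(String[] args) {
        // Stand-in types; in StoreTest the bounds would be WritableComparable.class and Writable.class.
        System.out.println(resolve("java.lang.Integer", Number.class));
    }
}
```

In the store function, the constructor would then assign, for example, this.keyClass = resolve(keyClass, WritableComparable.class), so a wrong type name in the DEFINE statement fails immediately with a descriptive message.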
Regarding "hadoop - Unable to write a SequenceFile with Pig", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/26611113/