How to convert a .txt file to Hadoop's sequence file format
Problem description
To use map-reduce jobs in Hadoop effectively, I need my data stored in Hadoop's sequence file format. However, the data is currently only in flat .txt format. Can anyone suggest a way to convert a .txt file to a sequence file?
Recommended answer
The simplest answer is an "identity" job that has a SequenceFile output.
In Java it looks like this:
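import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class ConvertText {
    public static void main(String[] args) throws IOException,
            InterruptedException, ClassNotFoundException {
        Configuration conf = new Configuration();
        // Job.getInstance replaces the deprecated "new Job(conf)" constructor
        Job job = Job.getInstance(conf);
        job.setJobName("Convert Text");
        job.setJarByClass(Mapper.class);
        // the base Mapper and Reducer classes pass records through unchanged
        job.setMapperClass(Mapper.class);
        job.setReducerClass(Reducer.class);
        // 0 means a map-only job; increase if you need sorting or a special number of files
        job.setNumReduceTasks(0);
        // TextInputFormat emits (byte offset, line) pairs, hence these types
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        job.setOutputFormatClass(SequenceFileOutputFormat.class);
        job.setInputFormatClass(TextInputFormat.class);
        TextInputFormat.addInputPath(job, new Path("/lol"));
        SequenceFileOutputFormat.setOutputPath(job, new Path("/lolz"));
        // submit, wait for completion, and report failure through the exit code
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}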
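The key and value types in the job above follow from how TextInputFormat reads a text file: each record is the byte offset at which a line starts (a LongWritable) paired with the text of that line (a Text), and the identity mapper writes those pairs unchanged into the sequence file. The following is only a rough plain-Java sketch of that pairing, with no Hadoop dependency. The class name `OffsetLinePairs`, the nested `OffsetLine` type, and the `split` method are hypothetical names made up for illustration, and the sketch assumes UTF-8 text with "\n" line endings only.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of Hadoop.
public class OffsetLinePairs {

    /** One record: the byte offset where the line starts, plus the line text. */
    public static final class OffsetLine {
        public final long offset;
        public final String line;

        OffsetLine(long offset, String line) {
            this.offset = offset;
            this.line = line;
        }
    }

    /** Splits content on '\n' and pairs each line with its starting byte offset. */
    public static List<OffsetLine> split(String content) {
        List<OffsetLine> records = new ArrayList<>();
        long offset = 0; // running byte offset in the UTF-8 encoding
        int start = 0;   // running char index into the string
        while (start < content.length()) {
            int end = content.indexOf('\n', start);
            boolean hasNewline = end >= 0;
            if (!hasNewline) {
                end = content.length();
            }
            String line = content.substring(start, end);
            records.add(new OffsetLine(offset, line));
            // advance by the encoded line length plus one byte for the newline, if any
            offset += line.getBytes(StandardCharsets.UTF_8).length + (hasNewline ? 1 : 0);
            start = end + (hasNewline ? 1 : 0);
        }
        return records;
    }

    public static void main(String[] args) {
        for (OffsetLine r : split("first line\nsecond line\nthird\n")) {
            System.out.println(r.offset + "\t" + r.line);
        }
    }
}
```

For the three-line sample in `main`, the keys are 0, 11 and 23. If the offsets are not useful as keys, a custom mapper that emits a different key would replace the identity Mapper in the job.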