Problem description
Currently I can change the output file name from part-00000 to a custom fileName in the mapper. I do this by reading the input file name from the inputSplit. I tried the same thing in the reducer to rename its file, but the reducer has no input split (its Context has no getInputSplit() method), so the FileSplit approach does not work there. Is there a good way to give the reducer's output the input file name? Below is how I achieved it in the mapper.
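// Fragment of a Mapper subclass (new org.apache.hadoop.mapreduce API);
// fileName, outputName and writer are fields declared elsewhere in the class.
@Override
public void setup(Context con) throws IOException, InterruptedException {
    // Name of the input file this map task is reading
    fileName = ((FileSplit) con.getInputSplit()).getPath().getName();
    // Keep at most the first 36 characters (no exception on shorter names)
    fileName = fileName.substring(0, Math.min(36, fileName.length()));
    outputName = new Text(fileName);
    final Path baseOutputPath = FileOutputFormat.getOutputPath(con);
    final Path outputFilePath = new Path(baseOutputPath, fileName);
    // Anonymous TextOutputFormat that returns the custom path instead of part-m-NNNNN
    TextOutputFormat<IntWritable, Text> write = new TextOutputFormat<IntWritable, Text>() {
        @Override
        public Path getDefaultWorkFile(TaskAttemptContext context, String extension) throws IOException {
            return outputFilePath;
        }
    };
    // Assumed continuation (the original snippet is cut off here); writer is hypothetical
    writer = write.getRecordWriter(con);
}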
Recommended answer
This is what the Hadoop wiki says:
You can subclass the OutputFormat.java class and write your own. You can locate and browse the code of TextOutputFormat, MultipleOutputFormat.java, etc. for reference. It might be the case that you only need to do minor changes to any of the existing Output Format classes. To do that you can just subclass that class and override the methods you need to change.
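A minimal sketch of that advice for the new org.apache.hadoop.mapreduce API, assuming the driver stores the desired name under a hypothetical property `example.output.filename` (the class name `NamedTextOutputFormat` is also hypothetical). It subclasses TextOutputFormat, asks the default getDefaultWorkFile for the committer's temporary path, and swaps only the file name. Unlike the mapper code in the question, which returns a path directly under the final output directory, this keeps the file in the task's work directory, so the output committer should still promote it on commit and discard it for failed attempts. The name comes from configuration, so this only fits cases where it is known up front (for example, one input file per job); per-record names need a per-key approach such as MultipleOutputFormat.

```java
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

/**
 * Sketch: a TextOutputFormat whose files are named from a job property
 * instead of "part-r-NNNNN". The property name is hypothetical.
 */
public class NamedTextOutputFormat<K, V> extends TextOutputFormat<K, V> {

    /** Hypothetical configuration key; the driver is assumed to set it. */
    public static final String OUTPUT_NAME_KEY = "example.output.filename";

    @Override
    public Path getDefaultWorkFile(TaskAttemptContext context, String extension)
            throws IOException {
        // Default path: <committer work dir>/part-r-NNNNN<extension>
        Path defaultPath = super.getDefaultWorkFile(context, extension);
        String name = context.getConfiguration().get(OUTPUT_NAME_KEY, "part");
        // The task number keeps files from different reducers from colliding
        int taskNumber = context.getTaskAttemptID().getTaskID().getId();
        // Same temporary directory as the default, only the file name changes
        return new Path(defaultPath.getParent(), name + "-" + taskNumber + extension);
    }
}
```

In the driver this would be wired up with `job.setOutputFormatClass(NamedTextOutputFormat.class)` plus `job.getConfiguration().set(NamedTextOutputFormat.OUTPUT_NAME_KEY, "myfile")`; reducer 0 would then write `myfile-0`.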
If the output file name needs to depend on the key or on the input file name, you can create a subclass of MultipleOutputFormat to control the output file name.
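As a sketch, assuming the old org.apache.hadoop.mapred API (where MultipleOutputFormat and its text variant MultipleTextOutputFormat live) and a mapper that emits the input file name as the key, for example read from its input split as in the question. The class name `InputFileNameOutputFormat` is hypothetical. Because the reducer cannot see the input split, the file name has to travel with the data; the output format then turns each key into a file name.

```java
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

/**
 * Sketch for the old org.apache.hadoop.mapred API. Assumes the mapper emits
 * the input file name as the key, so every record reaching the reducer
 * carries the name of the file it came from.
 */
public class InputFileNameOutputFormat extends MultipleTextOutputFormat<Text, Text> {

    // Use the key (the input file name) as the output file name instead of
    // the default "part-00000" that is passed in as `name`.
    @Override
    protected String generateFileNameForKeyValue(Text key, Text value, String name) {
        return key.toString();
    }

    // Returning null makes the underlying TextOutputFormat write only the
    // value on each line, so the file name is not repeated in the contents.
    @Override
    protected Text generateActualKey(Text key, Text value) {
        return null;
    }
}
```

It would be enabled with `jobConf.setOutputFormat(InputFileNameOutputFormat.class)`. Since the key is the file name, all records for one input file are sent to the same reducer, so two reduce tasks should never try to create the same output file.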