Question
I want to write a MapReduce application that can process both text and ZIP files. For this I want to use two different input formats, one for text and another for ZIP. Is it possible to do so?
Recommended answer
Extending a bit from @ChrisWhite's answer, what you need is a custom InputFormat and RecordReader that work with ZIP files. You can find a sample ZipFileInputFormat here and a sample ZipFileRecordReader here.
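To make the RecordReader part more concrete, here is a minimal sketch of the entry-by-entry reading that a ZIP-aware reader would typically do internally, written with only java.util.zip so it runs without Hadoop. It is not the linked ZipFileRecordReader: the names ZipEntrySketch, readEntries and sampleZip are hypothetical, and treating each ZIP entry as one record (entry name as key, UTF-8 content as value) is an assumption made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration class; not part of Hadoop or of the linked samples.
class ZipEntrySketch {

    /** Reads every file entry of a ZIP stream into (entry name -> UTF-8 content). */
    static Map<String, String> readEntries(InputStream in) {
        Map<String, String> records = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            // getNextEntry() returns null once the archive is exhausted.
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // directories carry no content
                }
                ByteArrayOutputStream content = new ByteArrayOutputStream();
                byte[] buffer = new byte[8192];
                int n;
                // read() returns -1 at the end of the current entry.
                while ((n = zip.read(buffer)) != -1) {
                    content.write(buffer, 0, n);
                }
                records.put(entry.getName(),
                        new String(content.toByteArray(), StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    /** Builds a small in-memory ZIP so the sketch needs no files on disk. */
    static byte[] sampleZip() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("a.txt"));
            zip.write("hello".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("b.txt"));
            zip.write("world".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        readEntries(new ByteArrayInputStream(sampleZip()))
                .forEach((name, text) -> System.out.println(name + " -> " + text));
    }
}
```

In a real Hadoop RecordReader the same loop would be spread across nextKeyValue() calls, and the InputFormat would usually mark ZIP files as non-splittable, since a ZIP archive generally cannot be read from an arbitrary offset.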
Given this, as Chris suggested, you should use MultipleInputs. Here is how I would do it if you don't need a custom mapper for each type of file:
MultipleInputs.addInputPath(job, new Path("/path/to/zip"), ZipFileInputFormat.class);
MultipleInputs.addInputPath(job, new Path("/path/to/txt"), TextInputFormat.class);