Question: Which files are ignored by the mapper as input?
I'm chaining multiple MapReduce jobs and want to pass along or store some meta information (e.g. the configuration or the name of the original input) with the results. At least the file "_SUCCESS" and anything in the "_logs" directory seem to be ignored.
Are there any filename patterns that are ignored by the InputReader by default? Or is this just a fixed, limited list?
Answer
FileInputFormat uses the following hiddenFileFilter by default:
private static final PathFilter hiddenFileFilter = new PathFilter() {
    // Reject any path whose final name component starts with "_" or "."
    public boolean accept(Path p) {
        String name = p.getName();
        return !name.startsWith("_") && !name.startsWith(".");
    }
};
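To see the effect on a typical job output directory, the sketch below re-implements the same name check on plain file-name strings so it runs without Hadoop on the classpath. The class name HiddenNameCheck and the sample listing are illustrative assumptions; the real filter receives a Hadoop Path and, via getName(), checks only its final component. Because each entry of the input directory is tested, a rejected directory such as _logs is skipped as a whole.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical standalone mirror of FileInputFormat's hiddenFileFilter,
// operating on file names instead of org.apache.hadoop.fs.Path objects.
class HiddenNameCheck {

    // Same rule as hiddenFileFilter.accept: reject names starting with "_" or ".".
    static boolean accept(String name) {
        return !name.startsWith("_") && !name.startsWith(".");
    }

    // Keeps only the names the default filter would let through, in order.
    static List<String> visible(List<String> names) {
        return names.stream()
                .filter(HiddenNameCheck::accept)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Illustrative listing of a finished job's output directory.
        List<String> listing = List.of(
                "part-r-00000", "part-r-00001", "_SUCCESS", "_logs", ".part-r-00000.crc");
        System.out.println(visible(listing)); // [part-r-00000, part-r-00001]
    }
}
```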
So if you use any FileInputFormat (such as TextInputFormat, KeyValueTextInputFormat, or SequenceFileInputFormat), hidden files (files whose names start with "_" or ".") will be ignored.
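For the original use case (carrying metadata from one job to the next), one option this suggests is to give the metadata file a name starting with "_" so the next job's FileInputFormat skips it. The sketch below is a minimal illustration assuming a local directory: it uses java.nio and java.util.Properties in place of HDFS, and the file name _job_meta.properties and the property key original.input are hypothetical. On HDFS you would write the same content through Hadoop's FileSystem API (for example FileSystem.create(Path)) instead of java.nio.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Properties;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class HiddenMetadata {

    // Hypothetical file name; the leading "_" is what makes the default
    // hidden-file filter of FileInputFormat skip it.
    static final String META_FILE = "_job_meta.properties";

    // Writes metadata next to the job output (local file system stand-in for HDFS).
    static void writeMeta(Path outputDir, Properties meta) throws IOException {
        try (Writer w = Files.newBufferedWriter(outputDir.resolve(META_FILE))) {
            meta.store(w, "metadata for downstream jobs");
        }
    }

    // Reads the metadata back, e.g. in the driver of the next job.
    static Properties readMeta(Path outputDir) throws IOException {
        Properties meta = new Properties();
        try (Reader r = Files.newBufferedReader(outputDir.resolve(META_FILE))) {
            meta.load(r);
        }
        return meta;
    }

    // Names a FileInputFormat-style listing would treat as input:
    // everything not starting with "_" or ".", sorted for stable output.
    static List<String> inputCandidates(Path outputDir) throws IOException {
        try (Stream<Path> entries = Files.list(outputDir)) {
            return entries.map(p -> p.getFileName().toString())
                    .filter(n -> !n.startsWith("_") && !n.startsWith("."))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("job1-output");
        Files.writeString(out.resolve("part-r-00000"), "a\t1\n");
        Files.createFile(out.resolve("_SUCCESS"));

        Properties meta = new Properties();
        meta.setProperty("original.input", "/path/to/original/input"); // hypothetical value
        writeMeta(out, meta);

        System.out.println(inputCandidates(out)); // [part-r-00000]
        System.out.println(readMeta(out).getProperty("original.input"));
    }
}
```

Properties is used here only because it ships with the JDK; any format would do, since a file named this way never reaches a mapper of the next job.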
You can use FileInputFormat.setInputPathFilter to set your own custom PathFilter. Keep in mind that the hiddenFileFilter is always active: your filter is applied in addition to it, not instead of it.
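The "always active" behaviour can be modelled as a logical AND of the two filters. In this sketch, Predicate<String> stands in for Hadoop's PathFilter, and the class name CombinedFilter and the *.tmp rule are hypothetical; in a real job you would implement org.apache.hadoop.fs.PathFilter and register the class with FileInputFormat.setInputPathFilter(job, YourFilter.class).

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Models how the built-in hidden-file check and an optional user filter
// combine: a name is selected only if every active filter accepts it.
class CombinedFilter {

    // Same rule as FileInputFormat's hiddenFileFilter, on plain names.
    static final Predicate<String> HIDDEN_FILE_FILTER =
            name -> !name.startsWith("_") && !name.startsWith(".");

    // A null user filter means only the hidden-file check applies.
    static Predicate<String> combine(Predicate<String> userFilter) {
        return userFilter == null ? HIDDEN_FILE_FILTER : HIDDEN_FILE_FILTER.and(userFilter);
    }

    static List<String> select(List<String> names, Predicate<String> userFilter) {
        Predicate<String> f = combine(userFilter);
        return names.stream().filter(f).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = List.of("part-r-00000", "_SUCCESS", "_meta.txt", "skip-me.tmp");

        // Hypothetical user filter: reject *.tmp files.
        Predicate<String> noTmp = n -> !n.endsWith(".tmp");
        System.out.println(select(names, noTmp)); // [part-r-00000]

        // Even a filter that accepts everything cannot re-admit "_" files.
        System.out.println(select(names, n -> true)); // [part-r-00000, skip-me.tmp]
    }
}
```

The second call shows the practical consequence: a custom filter can only narrow the input further, so a file starting with "_" or "." would have to be renamed before a FileInputFormat-based job reads it.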