eclipse - configure() is not called when running on a Hadoop cluster, but is called in Eclipse; DistributedCache FileNotFoundException

My program uses DistributedCache to cache files:

    // Register two files with the distributed cache so each task gets a local copy
    JobConf conf = new JobConf(new Configuration(), ItemMining.class);
    DistributedCache.addCacheFile(new URI("output1/FList.txt"), conf);
    DistributedCache.addCacheFile(new URI("output1/GList.txt"), conf);

and I read the files in configure():

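    @Override
    public void configure(JobConf job) {
        try {
            // ...
            // Paths of the cached copies on this task node
            Path[] localFiles = DistributedCache.getLocalCacheFiles(job);
            // FileSystem.get(job) returns the job's default file system
            FileSystem fs = FileSystem.get(job);
            FSDataInputStream inF = fs.open(localFiles[0]);
            // ...
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }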

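The whole program runs in Eclipse and produces correct results. However, when I run it on the Hadoop cluster, this part never seems to be called!
Why does this happen?
Do I need to set something in the configuration?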

Best answer

Problem solved. It turned out I had made two mistakes:

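1) I had added a System.out.println() at the start of configure(), but its output never appeared, so configure() was in fact being called.
It turns out that System.out.println() output from map and reduce tasks does not show up on the console that submitted the job; it is written to each task attempt's stdout log. To see it, we need to check the task logs.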

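2) My real mistake was related to DistributedCache. I had added a file and wanted to read it into memory. The paths returned by getLocalCacheFiles() point to copies on the task node's local disk, but FileSystem.get(job) returns the default file system, which is HDFS on the cluster, so opening those paths there fails with FileNotFoundException. (In Eclipse the job runs locally, where the default file system is the local one, which is why it worked there.) To open the path, we need FileSystem.getLocal(), as follows: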

    localFiles = DistributedCache.getLocalCacheFiles(job);
    FileSystem fs = FileSystem.getLocal(job);
    FSDataInputStream inF = fs.open(localFiles[0]);
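
To illustrate why the choice of file system matters, here is a plain-Java analogy using java.net.URI rather than Hadoop's own code. It assumes, as the FileNotFoundException suggests, that the paths from getLocalCacheFiles() carry no scheme, so their meaning depends on which file system's root they are resolved against. The cache path and the namenode:8020 address below are hypothetical.

```java
import java.net.URI;

public class DefaultFsResolution {

    // Resolve a scheme-less absolute path against a file-system root URI.
    // This is only an analogy for how an unqualified path picks up the
    // scheme and authority of whatever file system opens it.
    static URI qualify(String fsRoot, String path) {
        return URI.create(fsRoot).resolve(path);
    }

    public static void main(String[] args) {
        // Hypothetical local cache path with no scheme
        String cached = "/tmp/mapred/local/archive/FList.txt";

        // Analogy for FileSystem.get(job) on a cluster: HDFS (made-up namenode address)
        URI onCluster = qualify("hdfs://namenode:8020/", cached);
        // Analogy for FileSystem.getLocal(job), or the default in Eclipse's local mode
        URI onLocal = qualify("file:///", cached);

        // Points into HDFS, where the locally cached file does not exist
        System.out.println(onCluster); // hdfs://namenode:8020/tmp/mapred/local/archive/FList.txt
        // Points at the task node's local disk
        System.out.println(onLocal.getScheme() + " " + onLocal.getPath());
    }
}
```

The same path string names two different files depending on the root, which matches the symptom: it works in Eclipse (local default) and fails on the cluster (HDFS default) until the local file system is used explicitly.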

Thanks to "Where does hadoop mapreduce framework send my System.out.print() statements ? (stdout)"

A similar question about "eclipse - configure() is not called when running on a Hadoop cluster, but is called in Eclipse; DistributedCache FileNotFoundException" can be found on Stack Overflow: https://stackoverflow.com/questions/16008518/