Question
When files are transferred to nodes using the distributed cache mechanism in a Hadoop streaming job, does the system delete these files after the job completes? If they are deleted, which I presume they are, is there a way to make the cache remain for multiple jobs? Does this work the same way on Amazon's Elastic MapReduce?
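For context, here is a minimal sketch of how a job typically registers a file with the distributed cache, assuming the classic (pre-YARN) org.apache.hadoop.filecache.DistributedCache API; the namenode address, path, and #lookup symlink name are made up for illustration. Streaming options such as -cacheFile feed the same mechanism under the hood.

```java
import java.net.URI;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.mapred.JobConf;

public class CacheSubmitExample {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(CacheSubmitExample.class);

        // Register an HDFS file with the distributed cache; the framework
        // copies it onto each task node before the job's tasks run.
        // The #lookup fragment becomes a symlink name in the task's
        // working directory (path and namenode address are hypothetical).
        DistributedCache.addCacheFile(
                new URI("hdfs://namenode:9000/cache/lookup.dat#lookup"), conf);
        DistributedCache.createSymlink(conf);

        // ... configure mapper/reducer, input/output paths, and submit as usual.
    }
}
```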
Answer
I was digging around in the source code, and it looks like files are deleted by TrackerDistributedCacheManager about once a minute when their reference count drops to zero. The TaskRunner explicitly releases all its files at the end of a task. Maybe you should edit TaskRunner to not do this, and control the cache through more explicit means yourself?
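To make the lifecycle concrete, here is a rough sketch of the task side under the same classic mapred API; CacheReadingMapper and the cached lookup file are hypothetical. The localized copy read in configure() is what TaskRunner releases when the task ends, after which the cache manager is free to delete it once nothing else references it.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class CacheReadingMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

    private String firstLine;

    @Override
    public void configure(JobConf conf) {
        try {
            // The localized copies live on the task node's local disk.
            // TaskRunner releases them when the task finishes, and the
            // cache manager can delete them once nothing references them.
            Path[] cached = DistributedCache.getLocalCacheFiles(conf);
            BufferedReader reader =
                    new BufferedReader(new FileReader(cached[0].toString()));
            firstLine = reader.readLine();
            reader.close();
        } catch (IOException e) {
            throw new RuntimeException("Could not read cached file", e);
        }
    }

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        // Trivial use of the cached data: tag every record with the
        // first line of the cached file.
        output.collect(new Text(firstLine), value);
    }
}
```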