Lifetime of the distributed cache in Hadoop

Problem description

When files are transferred to nodes using the distributed cache mechanism in a Hadoop streaming job, does the system delete these files from the nodes after the job completes? If they are deleted, which I presume they are, is there a way to make the cache persist across multiple jobs? Does this work the same way on Amazon's Elastic MapReduce?
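
For context, here is a minimal driver sketch showing one way a file ends up in the distributed cache through Hadoop's classic Java API; a streaming job reaches the same mechanism through its command-line options. The job name and file path below are made up for illustration.

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.mapreduce.Job;

public class CacheDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "distributed-cache-example"); // hypothetical job name

        // Register an HDFS file with the distributed cache; the framework
        // copies ("localizes") it onto each node before the tasks start.
        DistributedCache.addCacheFile(
                new URI("hdfs:///shared/lookup.dat#lookup"),   // hypothetical path
                job.getConfiguration());

        // ... set the mapper, reducer, input and output paths, then:
        // job.waitForCompletion(true);
    }
}

The question above is about what happens to those localized copies on the worker nodes once the job finishes.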

Recommended answer

I was digging around in the source code, and it looks like files are deleted by TrackerDistributedCacheManager about once a minute, when their reference count drops to zero. The TaskRunner explicitly releases all of its files at the end of a task. Maybe you should edit TaskRunner not to do this, and control the cache through more explicit means yourself?
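
To make the cleanup described above a bit more concrete, below is a heavily simplified, hypothetical sketch of reference-counted cache cleanup. The class and method names are illustrative only; this is not Hadoop's actual TrackerDistributedCacheManager code.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration of reference-counted cleanup, loosely modeled on
// the behaviour described in the answer; NOT Hadoop's real implementation.
public class RefCountedCache {
    private final Map<String, Integer> refCounts = new ConcurrentHashMap<>();

    // A task acquires a cached file when the file is localized for that task.
    public void acquire(String path) {
        refCounts.merge(path, 1, Integer::sum);
    }

    // The TaskRunner-style release at the end of a task.
    public void release(String path) {
        refCounts.computeIfPresent(path, (p, count) -> count - 1);
    }

    // Periodic sweep (roughly once a minute, per the answer): local copies
    // whose reference count has dropped to zero are eligible for deletion.
    public void sweep() {
        refCounts.entrySet().removeIf(entry -> {
            if (entry.getValue() <= 0) {
                System.out.println("would delete local copy of " + entry.getKey());
                return true;
            }
            return false;
        });
    }
}

In this picture, keeping a file's reference count above zero (or suppressing the release step, as the answer suggests) is what would let a local copy survive across jobs.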
