Cleaning up Dask workers

Problem description

I am running multiple parallel tasks on a multi-node distributed Dask cluster. However, once the tasks have finished, the workers still hold on to a large amount of memory, and the cluster soon fills up.

I have tried calling client.restart() after every task, and also client.cancel(df). The first kills the workers and sends a CancelledError to other running tasks, which is troublesome. The second did not help much, because we use a lot of custom objects and functions inside Dask's map functions. Adding del for known variables and calling gc.collect() does not help much either.

I am sure most of the retained memory is due to the custom Python functions and objects called with client.map(..).

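My questions are:

  1. Is there a way, from the command line or otherwise, to trigger a worker restart when no tasks are currently running?
  2. If not, what are the possible solutions to this problem? It is impossible for me to avoid custom objects and pure Python functions inside Dask tasks.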

Recommended answer

If there are no references to futures, then Dask should delete any references to the Python objects that you've created with it. See https://www.youtube.com/watch?v=MsnzpzFZAoQ for more information on how to investigate this.
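As a rough illustration of the point about references, the sketch below gathers results and then drops every future before returning, and adds a small helper for checking what workers still store. The helper names run_batch and keys_still_held are hypothetical and not part of Dask; the only Dask calls assumed are Client.map, Client.gather and Client.has_what, and the scheduler address in the comments is a placeholder.

```python
def run_batch(client, func, items):
    """Map func over items, return plain results, and keep no futures."""
    futures = client.map(func, items)   # one future per input item
    results = client.gather(futures)    # copy the results back to the caller
    # Drop our references. Once nothing refers to a key, the scheduler
    # is free to tell the workers to forget the data behind it.
    del futures
    return results


def keys_still_held(client):
    """Count the keys each worker still stores, using Client.has_what()."""
    return {worker: len(keys) for worker, keys in client.has_what().items()}


# Possible usage against a real cluster (placeholder address):
# from dask.distributed import Client
# client = Client("tcp://scheduler-host:8786")
# results = run_batch(client, my_function, my_inputs)
# print(keys_still_held(client))
```

If keys_still_held keeps reporting keys after all futures are gone, something else is probably still referencing them; if it reports none and memory stays high, the leak is more likely inside the custom code itself.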

If your custom Python code does have a memory leak of its own then yes, you can ask Dask workers to periodically restart themselves. See the output of dask-worker --help and look for options that start with --lifetime.
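A minimal sketch of what such an invocation might look like, assembled in Python so the pieces are easy to see. The helper build_worker_command, the scheduler address and the durations are assumptions for illustration; the flags shown (--lifetime, --lifetime-stagger, --lifetime-restart) should be checked against dask-worker --help for your installed version.

```python
import shlex


def build_worker_command(scheduler_address, lifetime="1h", stagger="5m"):
    """Return an argv list that starts a worker with a limited lifetime."""
    return [
        "dask-worker", scheduler_address,
        "--lifetime", lifetime,          # shut the worker down after this long
        "--lifetime-stagger", stagger,   # random offset so workers do not all stop together
        "--lifetime-restart",            # start a fresh worker once the lifetime expires
    ]


command = build_worker_command("tcp://scheduler-host:8786")  # placeholder address
print(shlex.join(command))

# To actually launch the worker (requires dask to be installed):
# import subprocess
# subprocess.run(command, check=True)
```

The stagger value matters on a multi-node cluster: without it, workers started at the same time would also restart at the same time.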


10-28 05:34