Dask: only one worker allowed for a job on multiple nodes, and only one node runs it

This post covers a Dask problem: a PBS allocation that only permits one job across multiple nodes ends up running the computation on a single node. The question and the recommended answer follow, and should be a useful reference for anyone hitting the same issue.

Problem description

I am trying to process some files using a Python function and would like to parallelize the task on a PBS cluster using Dask. On the cluster I can only launch one job, but I have access to 10 nodes with 24 cores each.

So my Dask PBSCluster looks like this:

import dask
from dask_jobqueue import PBSCluster
from dask.distributed import Client

cluster = PBSCluster(cores=240,
                     memory="1GB",
                     project='X',
                     queue='normal',
                     local_directory='$TMPDIR',
                     walltime='12:00:00',
                     resource_spec='select=10:ncpus=24:mem=1GB',
                     )
cluster.scale(1)  # one worker
client = Client(cluster)
client

After this, the cluster in Dask shows 1 worker with 240 cores (not sure if that makes sense). When I run

result = dask.compute(*foo, scheduler='distributed')

and access the allocated nodes, only one of them is actually running the computation. I am not sure if I am using the right PBS configuration.

Recommended answer

cluster = PBSCluster(cores=240,
                     memory="1GB",

The values you give to the Dask-Jobqueue constructor are the values for a single job on a single node. So here you are asking for a single node with 240 cores, which probably doesn't make sense today.
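
For reference, if your scheduler allowed several jobs, the usual Dask-Jobqueue pattern would describe one node per job and then scale out to ten jobs. A minimal sketch (parameter values carried over from the question, not a definitive configuration):

from dask_jobqueue import PBSCluster
from dask.distributed import Client

cluster = PBSCluster(cores=24,      # cores for ONE job on ONE node
                     memory="1GB",  # memory for ONE job
                     project='X',
                     queue='normal',
                     local_directory='$TMPDIR',
                     walltime='12:00:00',
                     resource_spec='select=1:ncpus=24:mem=1GB',
                     )
cluster.scale(jobs=10)  # ten separate PBS jobs, one worker per node
client = Client(cluster)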

If you can only launch one job, then dask-jobqueue's model probably won't work for you. I recommend looking at dask-mpi as an alternative.
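
With dask-mpi, you submit one PBS job that spans all 10 nodes and let MPI start the scheduler and workers inside it. A minimal sketch of that pattern (the script name and the one-process-per-node layout are assumptions; `foo` is the task collection from the question):

# my_dask_script.py
from dask_mpi import initialize

# MPI rank 0 becomes the Dask scheduler, rank 1 runs this client code,
# and every remaining rank becomes a worker.
initialize(nthreads=24)  # assumed: one worker process per node, 24 threads each

import dask
from dask.distributed import Client

client = Client()  # connects to the scheduler started by initialize()
# ... build the task collection `foo` as before ...
result = dask.compute(*foo, scheduler='distributed')

Inside the single PBS job script you would then launch something like mpirun -np 12 python my_dask_script.py (1 scheduler + 1 client + 10 workers), with MPI placing one process per node.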

This concludes the post on the Dask job that spans multiple nodes with only one worker yet runs on a single node; hopefully the recommended answer helps.
