Problem description
I am new to dask, and it is great to have a module that makes parallelization this easy. I am working on a project where I was able to parallelize a loop on a single machine, as you can see here. However, I would like to move over to dask.distributed. I applied the following changes to the class above:
diff --git a/mlchem/fingerprints/gaussian.py b/mlchem/fingerprints/gaussian.py
index ce6a72b..89f8638 100644
--- a/mlchem/fingerprints/gaussian.py
+++ b/mlchem/fingerprints/gaussian.py
@@ -6,7 +6,7 @@ from sklearn.externals import joblib
 from .cutoff import Cosine
 from collections import OrderedDict
 import dask
-import dask.multiprocessing
+from dask.distributed import Client
 import time
@@ -141,13 +141,14 @@ class Gaussian(object):
         for image in images.items():
             computations.append(self.fingerprints_per_image(image))
+        client = Client()
         if self.scaler is None:
-            feature_space = dask.compute(*computations, scheduler='processes',
+            feature_space = dask.compute(*computations, scheduler='distributed',
                                          num_workers=self.cores)
             feature_space = OrderedDict(feature_space)
         else:
             stacked_features = dask.compute(*computations,
-                                            scheduler='processes',
+                                            scheduler='distributed',
                                             num_workers=self.cores)
             stacked_features = numpy.array(stacked_features)
Doing this produces the following error:
  File "/usr/local/Cellar/python/3.7.2_2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.
        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:
            if __name__ == '__main__':
                freeze_support()
                ...
I have tried different ways of adding if __name__ == '__main__': without any success. This can be reproduced by running this example. I would appreciate it if anyone could help me figure this out. I have no idea how I should change my code to make it work.
Thanks.
The example is cu_training.py.
Recommended answer
The Client command starts up new processes, so it has to be inside the if __name__ == '__main__': block, as described in this SO question or this GitHub issue.
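A minimal sketch of that layout, assuming dask and distributed are installed. The names fingerprint and compute_features are hypothetical stand-ins for the fingerprints_per_image method and the loop in the diff, and the worker counts are arbitrary. Once a Client exists it registers itself as the default scheduler, so dask.compute is called here without a scheduler argument.

```python
import dask
from dask.distributed import Client


def fingerprint(image):
    """Hypothetical stand-in for Gaussian.fingerprints_per_image()."""
    key, value = image
    return key, value * 2


def compute_features(images):
    """Build one delayed task per image and compute them all at once."""
    computations = [dask.delayed(fingerprint)(image) for image in images.items()]
    # With an active Client, dask.compute() runs on the distributed scheduler.
    return dict(dask.compute(*computations))


if __name__ == '__main__':
    # Client() starts worker processes, so it is created only when the
    # file is run as a script, never when a child process re-imports it.
    client = Client(n_workers=2, threads_per_worker=1)
    try:
        print(compute_features({'a': 1, 'b': 2}))
    finally:
        client.close()
```

In the project from the question, this would probably mean creating the Client in the guarded part of the example script (cu_training.py) rather than inside the class method, though that placement is a suggestion and not something the answer spells out.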
This is the same as with the multiprocessing module.
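For comparison, here is the same guard using only the standard library, again as a sketch. The names square and run_in_pool are made up for illustration. When child processes are spawned rather than forked, each child re-imports the main module, so any process creation left at module level would trigger the RuntimeError quoted in the question.

```python
import multiprocessing


def square(x):
    """Toy stand-in for the real per-item work (hypothetical)."""
    return x * x


def run_in_pool(values, workers=2):
    """Map square() over values using a pool of worker processes."""
    with multiprocessing.Pool(processes=workers) as pool:
        return pool.map(square, values)


if __name__ == '__main__':
    # The pool is created only when the file is run as a script.
    print(run_in_pool([1, 2, 3]))
```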