Problem description
I'm attempting to train multiple Keras models with different parameter values using multiple threads (and the TensorFlow backend). I've seen a few examples of using the same model within multiple threads, but in this particular case, I run into various errors regarding conflicting graphs, etc. Here's a simple example of what I'd like to be able to do:
from concurrent.futures import ThreadPoolExecutor
import numpy as np
import tensorflow as tf
from keras import backend as K
from keras.layers import Dense
from keras.models import Sequential

sess = tf.Session()

def example_model(size):
    model = Sequential()
    model.add(Dense(size, input_shape=(5,)))
    model.add(Dense(1))
    model.compile(optimizer='sgd', loss='mse')
    return model

if __name__ == '__main__':
    K.set_session(sess)
    X = np.random.random((10, 5))
    y = np.random.random((10, 1))
    models = [example_model(i) for i in range(5, 10)]
    e = ThreadPoolExecutor(4)
    res_list = [e.submit(model.fit, X, y) for model in models]
    for res in res_list:
        print(res.result())
The resulting error is ValueError: Tensor("Variable:0", shape=(5, 5), dtype=float32_ref) must be from the same graph as Tensor("Variable_2/read:0", shape=(), dtype=float32). I've also tried initializing the models within the threads, which gives a similar failure.
Any thoughts on the best way to go about this? I'm not at all attached to this exact structure, but I'd prefer to be able to use multiple threads rather than processes so all the models are trained within the same GPU memory allocation.
Answer
TensorFlow graphs are not thread-safe (see https://www.tensorflow.org/api_docs/python/tf/Graph), and a newly created TensorFlow session uses the default graph unless you pass it a graph explicitly.
You can get around this by creating a new session with a new graph inside your parallelized function and constructing your Keras model there.
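The essence of the fix, as a minimal sketch using the TF1-style graph/session API from the rest of this answer (the train_model name and build_fn parameter are illustrative, not part of the original answer):

import tensorflow as tf
from keras import backend as K

def train_model(build_fn, X, y):
    # Entering the session also makes its fresh graph the default graph,
    # so everything Keras constructs here is invisible to other threads.
    with tf.Session(graph=tf.Graph()) as sess:
        K.set_session(sess)
        model = build_fn()  # build and compile the Keras model in this graph
        model.fit(X, y, verbose=0)
        return model.evaluate(X, y, verbose=0)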
Here is some code that creates and fits a model on each available GPU in parallel:
import concurrent.futures
import numpy as np
import keras.backend as K
from keras.layers import Dense
from keras.models import Sequential
import tensorflow as tf
from tensorflow.python.client import device_lib

def get_available_gpus():
    local_device_protos = device_lib.list_local_devices()
    return [x.name for x in local_device_protos if x.device_type == 'GPU']

xdata = np.random.randn(100, 8)
ytrue = np.random.randint(0, 2, 100)

def fit(gpu):
    # A fresh graph and session per call keeps this thread's model
    # completely isolated from the models built in other threads.
    with tf.Session(graph=tf.Graph()) as sess:
        K.set_session(sess)
        with tf.device(gpu):
            model = Sequential()
            model.add(Dense(12, input_dim=8, activation='relu'))
            model.add(Dense(8, activation='relu'))
            model.add(Dense(1, activation='sigmoid'))
            model.compile(loss='binary_crossentropy', optimizer='adam')
            model.fit(xdata, ytrue, verbose=0)
            return model.evaluate(xdata, ytrue, verbose=0)

gpus = get_available_gpus()
with concurrent.futures.ThreadPoolExecutor(len(gpus)) as executor:
    results = [x for x in executor.map(fit, gpus)]
print('results: ', results)
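Adapting the same pattern back to the question's goal of sweeping a parameter value (here the hidden-layer size) rather than devices is straightforward. The sketch below is an adaptation, not part of the original answer: fit_size is a hypothetical helper, it reuses xdata, ytrue, and the imports from the example above, and it enables allow_growth so the concurrent sessions don't each try to claim all of the GPU's memory:

def fit_size(size):
    # One fresh graph and session per hyperparameter value being tried.
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True  # let threads share one GPU
    with tf.Session(graph=tf.Graph(), config=config) as sess:
        K.set_session(sess)
        model = Sequential()
        model.add(Dense(size, input_dim=8, activation='relu'))
        model.add(Dense(1, activation='sigmoid'))
        model.compile(loss='binary_crossentropy', optimizer='adam')
        model.fit(xdata, ytrue, verbose=0)
        return model.evaluate(xdata, ytrue, verbose=0)

with concurrent.futures.ThreadPoolExecutor(4) as executor:
    sweep_results = list(executor.map(fit_size, range(5, 10)))

Because no tf.device is pinned here, all tasks run on TensorFlow's default device, which matches the question's preference for keeping every model within a single GPU memory allocation.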