Problem description
I built the following architecture:
Layer (type)                 Output Shape              Param #
=================================================================
embedding_7 (Embedding)      (None, 50, 64)            512000
_________________________________________________________________
bidirectional_5 (Bidirection (None, 200)               132000
_________________________________________________________________
dense_9 (Dense)              (None, 1)                 201
=================================================================
Total params: 644,201
Trainable params: 644,201
Non-trainable params: 0
using the following code:
with tpu_strategy.scope():
    model = Sequential()
    model.add(Embedding(MAX_NB_WORDS, EMBEDDING_DIM, input_length=X.shape[1]))
    model.add(Bidirectional(LSTM(HIDDEN_DIM)))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy', f1_m])
print(model.summary())
history = model.fit(X_train, y_train, epochs=EPOCHS, validation_data=(X_val, y_val),
                    callbacks=[EarlyStopping(monitor='val_f1_m', patience=5, min_delta=0.001, mode='max')],
                    class_weight=class_weight)
I can train the model and call model.evaluate(X_test, y_test) with no errors. However, when I call model.predict(X_test), the resulting array has shape (24256, 1) even though X_test has shape (24255, 50). Why does this happen? Why am I getting one extra prediction? Shouldn't the resulting array of predictions have shape (24255, 1)?
I am using Google Colab for this. I wrote this small snippet to reproduce the problem:
import os  # needed for os.environ below

import numpy as np
import tensorflow as tf

# Fake data: six rows of ones (label 1) and six rows of zeros (label 0)
X_fake = np.array([[1]*50]*6 + [[0]*50]*6)
y_fake = np.array([1]*6 + [0]*6)

def create_tpu_strategy():
    try:
        tpu = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='grpc://' + os.environ['COLAB_TPU_ADDR'])
        print('Running on TPU ', tpu.cluster_spec().as_dict()['worker'])
    except ValueError:
        raise BaseException('ERROR: Not connected to a TPU runtime; please see the previous cell in this notebook for instructions!')
    tf.config.experimental_connect_to_cluster(tpu)
    tf.tpu.experimental.initialize_tpu_system(tpu)
    tpu_strategy = tf.distribute.experimental.TPUStrategy(tpu)
    return tpu_strategy

tpu_strategy = create_tpu_strategy()

# Build and compile the model inside the TPU strategy scope
with tpu_strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(8000, 64, input_length=X_fake.shape[1]),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])
    model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True), optimizer=tf.keras.optimizers.Adam(1e-4), metrics=['accuracy'])
print(model.summary())
model.fit(X_fake, y_fake, epochs=1)
preds = model.predict_classes(X_fake)
print(preds.shape, X_fake.shape)
This is the output of the shapes:
(16, 1) (12, 50)
When I stop using the TPU, the output is what I expected from the beginning:
(12, 1) (12, 50)
For now I am not using the TPU for my original code, and it works fine. But why does this happen? Am I initializing my TPU strategy incorrectly?
Recommended answer
I believe model.predict and model.predict_classes expect the input size to be a multiple of the number of TPU cores (8 in this case). Both shapes you observed fit this: 12 rows become 16, and 24255 rows become 24256, the next multiples of 8. Try making your input size a multiple of 8 and it should work as expected.
- For small inputs, you can call the model directly: preds = model(X_fake).
- For large inputs, make sure the number of rows is a multiple of 8.
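One way to apply the multiple-of-8 suggestion without changing the dataset is to pad the input yourself and then drop the padded rows from the result. The sketch below is only an illustration under assumptions: round_up and predict_trimmed are hypothetical helper names (not part of Keras), predict_fn stands for something like model.predict, and it assumes that call returns one row per input row, in order, when the row count is already a multiple of 8.

```python
import numpy as np

def round_up(n, multiple=8):
    """Smallest multiple of `multiple` that is >= n."""
    return -(-n // multiple) * multiple

def predict_trimmed(predict_fn, X, multiple=8):
    """Pad X to a multiple of `multiple` rows, predict, then drop the padding.

    predict_fn is any callable mapping an (n, ...) array to an (n, ...) array,
    e.g. model.predict (assumption: one output row per input row, in order).
    """
    n = X.shape[0]
    padded_n = round_up(n, multiple)
    if padded_n != n:
        # Repeat the last row as filler; its predictions are discarded below.
        filler = np.repeat(X[-1:], padded_n - n, axis=0)
        X = np.concatenate([X, filler], axis=0)
    preds = np.asarray(predict_fn(X))
    return preds[:n]

# The row counts from the question round up to the shapes that were observed.
print(round_up(12), round_up(24255))  # 16 24256
```

With the model from the question, the call would look like preds = predict_trimmed(model.predict, X_fake), which should give 12 rows if the assumption above holds.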
This issue is already resolved in tf-nightly. If you install TensorFlow nightly and switch the TPU to the matching tf-nightly version, it works:
!pip install cloud-tpu-client
!pip install tf-nightly

import os  # needed for os.environ below

import numpy as np
import tensorflow as tf
from cloud_tpu_client import Client

# Change the TPU software version to match the Colab TensorFlow version
c = Client()
c.configure_tpu_version(tf.__version__, restart_type='ifNeeded')

# Fake data: six rows of ones (label 1) and six rows of zeros (label 0)
X_fake = np.array([[1]*50]*6 + [[0]*50]*6)
y_fake = np.array([1]*6 + [0]*6)

def create_tpu_strategy():
    try:
        tpu = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='grpc://' + os.environ['COLAB_TPU_ADDR'])
        print('Running on TPU ', tpu.cluster_spec().as_dict()['worker'])
    except ValueError:
        raise BaseException('ERROR: Not connected to a TPU runtime; please see the previous cell in this notebook for instructions!')
    tf.config.experimental_connect_to_cluster(tpu)
    tf.tpu.experimental.initialize_tpu_system(tpu)
    tpu_strategy = tf.distribute.experimental.TPUStrategy(tpu)
    return tpu_strategy

tpu_strategy = create_tpu_strategy()

# Build and compile the model inside the TPU strategy scope
with tpu_strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(8000, 64, input_length=X_fake.shape[1]),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])
    model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True), optimizer=tf.keras.optimizers.Adam(1e-4), metrics=['accuracy'])
print(model.summary())
model.fit(X_fake, y_fake, epochs=1)
preds = model.predict_classes(X_fake)
print(preds.shape, X_fake.shape)
The output shape is then (12, 1) (12, 50).
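A side note on the snippets above: in more recent TensorFlow releases, Sequential.predict_classes is no longer available, so the last lines may fail for a reason unrelated to the TPU. For a single sigmoid output, thresholding the probabilities from model.predict gives the same class labels. The sketch below shows only the thresholding step on a NumPy array; to_classes is a hypothetical helper name, and the 0.5 cutoff is the usual default for binary classification rather than anything stated in the question.

```python
import numpy as np

def to_classes(probs, threshold=0.5):
    """Turn sigmoid probabilities of shape (n, 1) into 0/1 class labels."""
    probs = np.asarray(probs)
    return (probs > threshold).astype("int32")

# Stand-in for the array that model.predict(X_fake) would return.
probs = np.array([[0.1], [0.7], [0.5]])
print(to_classes(probs).ravel())  # [0 1 0]
```

In the snippets above this would replace the predict_classes call with preds = to_classes(model.predict(X_fake)).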