Exact same model converges with keras-tf but not with keras

Problem description

I am working on predicting the EWMA (exponentially weighted moving average) formula on a time series using a simple RNN. I have already posted about it here.

While the model converges beautifully using keras-tf (from tensorflow import keras), the exact same code does not converge using native keras (import keras).

Converging model code (keras-tf):

from tensorflow import keras
import numpy as np

np.random.seed(1337)  # for reproducibility

def run_avg(signal, alpha=0.2):
    avg_signal = []
    avg = np.mean(signal)
    for i, sample in enumerate(signal):
        if np.isnan(sample) or sample == 0:
            sample = avg
        avg = (1 - alpha) * avg + alpha * sample
        avg_signal.append(avg)
    return np.array(avg_signal)

def train():
    x = np.random.rand(3000)
    y = run_avg(x)
    x = np.reshape(x, (-1, 1, 1))
    y = np.reshape(y, (-1, 1))

    input_layer = keras.layers.Input(batch_shape=(1, 1, 1), dtype='float32')
    rnn_layer = keras.layers.SimpleRNN(1, stateful=True, activation=None, name='rnn_layer_1')(input_layer)
    model = keras.Model(inputs=input_layer, outputs=rnn_layer)

    model.compile(optimizer=keras.optimizers.SGD(lr=0.1), loss='mse')
    model.summary()

    print(model.get_layer('rnn_layer_1').get_weights())
    model.fit(x=x, y=y, batch_size=1, epochs=10, shuffle=False)
    print(model.get_layer('rnn_layer_1').get_weights())

train()

Non-converging model code (native keras):

from keras import Model
from keras.layers import SimpleRNN, Input
from keras.optimizers import SGD

import numpy as np

np.random.seed(1337)  # for reproducibility

def run_avg(signal, alpha=0.2):
    avg_signal = []
    avg = np.mean(signal)
    for i, sample in enumerate(signal):
        if np.isnan(sample) or sample == 0:
            sample = avg
        avg = (1 - alpha) * avg + alpha * sample
        avg_signal.append(avg)
    return np.array(avg_signal)

def train():
    x = np.random.rand(3000)
    y = run_avg(x)
    x = np.reshape(x, (-1, 1, 1))
    y = np.reshape(y, (-1, 1))

    input_layer = Input(batch_shape=(1, 1, 1), dtype='float32')
    rnn_layer = SimpleRNN(1, stateful=True, activation=None, name='rnn_layer_1')(input_layer)
    model = Model(inputs=input_layer, outputs=rnn_layer)


    model.compile(optimizer=SGD(lr=0.1), loss='mse')
    model.summary()

    print(model.get_layer('rnn_layer_1').get_weights())
    model.fit(x=x, y=y, batch_size=1, epochs=10, shuffle=False)
    print(model.get_layer('rnn_layer_1').get_weights())

train()

In the converging tf-keras model, the loss decreases and the weights closely approximate the EWMA formula. In the non-converging model, the loss explodes to nan. As far as I can tell, the only difference is the way I import the classes.
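To make "the weights approximate the EWMA formula" concrete, here is a minimal NumPy-only sketch. It assumes that a one-unit SimpleRNN with activation=None computes the linear recurrence h_t = w_in * x_t + w_rec * h_(t-1) + bias, so the EWMA with alpha = 0.2 corresponds to w_in = 0.2, w_rec = 0.8 and bias = 0. The helper names ewma and linear_rnn are illustrative and not part of the code above. The nan/zero handling of run_avg is left out for brevity, and the initial state h0 is set to the signal mean to match run_avg, whereas a stateful Keras layer would normally start from a zero state.

```python
import numpy as np

def ewma(signal, alpha=0.2):
    """Reference EWMA, seeded with the signal mean like run_avg."""
    avg = np.mean(signal)
    out = []
    for sample in signal:
        avg = (1 - alpha) * avg + alpha * sample
        out.append(avg)
    return np.array(out)

def linear_rnn(signal, w_in, w_rec, bias, h0):
    """Hand-rolled linear recurrence: h_t = w_in * x_t + w_rec * h_(t-1) + bias."""
    h = h0
    out = []
    for sample in signal:
        h = w_in * sample + w_rec * h + bias
        out.append(h)
    return np.array(out)

rng = np.random.RandomState(1337)  # local generator, same seed as the question
x = rng.rand(3000)
alpha = 0.2

target = ewma(x, alpha)
# Weights a perfectly trained layer would be expected to approach (assumption).
ideal = linear_rnn(x, w_in=alpha, w_rec=1 - alpha, bias=0.0, h0=np.mean(x))

max_abs_diff = np.max(np.abs(target - ideal))
print(max_abs_diff)
```

Comparing the weights printed by get_weights() after training against 0.2 (input kernel), 0.8 (recurrent kernel) and 0 (bias) gives a quick check of how close a run got to the target.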

I used the same random seed for both implementations. I am working on a Windows PC in an Anaconda environment with keras 2.2.4 and tensorflow 1.13.1 (which bundles keras 2.2.4-tf).

Any insights on this?

Recommended answer

This might be because of a one-line difference in the implementation of SimpleRNN between TF Keras and native Keras.

The line below is present in the TF Keras implementation but not in native Keras.

self.input_spec = [InputSpec(ndim=3)]

The behavior you describe above is one consequence of this difference.

I want to demonstrate a similar case using the Sequential class of Keras.

The code below works fine with TF Keras:

from tensorflow import keras
import numpy as np
from tensorflow.keras.models import Sequential

np.random.seed(1337)  # for reproducibility

def run_avg(signal, alpha=0.2):
    avg_signal = []
    avg = np.mean(signal)
    for i, sample in enumerate(signal):
        if np.isnan(sample) or sample == 0:
            sample = avg
        avg = (1 - alpha) * avg + alpha * sample
        avg_signal.append(avg)
    return np.array(avg_signal)

def train():
    x = np.random.rand(3000)
    y = run_avg(x)
    x = np.reshape(x, (-1, 1, 1))
    y = np.reshape(y, (-1, 1))

    # SimpleRNN model
    model = Sequential()
    model.add(keras.layers.Input(batch_shape=(1, 1, 1), dtype='float32'))
    model.add(keras.layers.SimpleRNN(1, stateful=True, activation=None, name='rnn_layer_1'))
    model.compile(optimizer=keras.optimizers.SGD(lr=0.1), loss='mse')
    model.summary()

    print(model.get_layer('rnn_layer_1').get_weights())
    model.fit(x=x, y=y, batch_size=1, epochs=10, shuffle=False)
    print(model.get_layer('rnn_layer_1').get_weights())

train()

But if we run the same code using native Keras, we get the error shown below:

TypeError: The added layer must be an instance of class Layer. Found: Tensor("input_1_1:0", shape=(1, 1, 1), dtype=float32)

If we replace the line below

model.add(Input(batch_shape=(1, 1, 1), dtype='float32'))

with the following line,

model.add(Dense(32, batch_input_shape=(1,1,1), dtype='float32'))

then even the model built with native Keras converges, almost as well as the TF Keras implementation.

To see how the two implementations differ at the code level, refer to the links below:

https://github.com/tensorflow/tensorflow/blob/r1.14/tensorflow/python/keras/layers/recurrent.py#L1364-L1375

https://github.com/keras-team/keras/blob/master/keras/layers/recurrent.py#L1082-L1091


07-27 19:29