This article looks at how to deal with a memory leak during Keras model training; it should serve as a useful reference for anyone running into the same problem.

Problem Description

I'm new to Keras, TensorFlow, and Python, and I'm trying to build a model for personal use/future learning. I've just started with Python and came up with this code (with the help of videos and tutorials). My problem is that Python's memory usage slowly creeps up with each epoch, and even after constructing a new model. Once memory is at 100%, training just stops with no error or warning. I don't know too much, but the issue should be somewhere within the loop (if I'm not mistaken). I know about a previously suggested fix, but either the issue was not removed or I don't know how to integrate it into my code.

My setup: Python 3.6.4, TensorFlow 2.0.0rc1 (CPU version), Keras 2.3.0.
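To see the leak as it happens, here is a minimal sketch of a Keras callback that prints the process's resident memory after every epoch (this assumes the third-party psutil package is installed; the MemoryLogger name is just for illustration):

import os
import psutil
from tensorflow.keras.callbacks import Callback

class MemoryLogger(Callback):
    """Print this process's resident set size (RSS) after each epoch."""
    def on_epoch_end(self, epoch, logs=None):
        rss_mb = psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2
        print("epoch {:02d}: RSS = {:.1f} MB".format(epoch, rss_mb))

Passing an instance of this callback to model.fit alongside the others makes the per-epoch growth visible in the console.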

Here is my code:

import pandas as pd
import os
import time
import tensorflow as tf
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, LSTM, BatchNormalization
from tensorflow.keras.callbacks import TensorBoard, ModelCheckpoint

EPOCHS = 25
BATCH_SIZE = 32

df = pd.read_csv("EntryData.csv", names=['1SH5', '1SHA', '1SA5', '1SAA', '1WH5', '1WHA',
                                         '2SA5', '2SAA', '2SH5', '2SHA', '2WA5', '2WAA',
                                         '3R1', '3R2', '3R3', '3R4', '3R5', '3R6',
                                         'Target'])

df_val = 14554  # last row index of the training set; rows after this become validation data

validation_df = df[df.index > df_val]
df = df[df.index <= df_val]

train_x = df.drop(columns=['Target'])
train_y = df[['Target']]
validation_x = validation_df.drop(columns=['Target'])
validation_y = validation_df[['Target']]

train_x = np.asarray(train_x)
train_y = np.asarray(train_y)
validation_x = np.asarray(validation_x)
validation_y = np.asarray(validation_y)

# reshape to (samples, timesteps=1, features) as expected by the LSTM layers
train_x = train_x.reshape(train_x.shape[0], 1, train_x.shape[1])
validation_x = validation_x.reshape(validation_x.shape[0], 1, validation_x.shape[1])

# hyperparameter grid ("conv" is a misnomer here -- the layers added below are LSTMs)
dense_layers = [0, 1, 2]
layer_sizes = [32, 64, 128]
conv_layers = [1, 2, 3]

for dense_layer in dense_layers:
    for layer_size in layer_sizes:
        for conv_layer in conv_layers:
            NAME = "{}-conv-{}-nodes-{}-dense-{}".format(conv_layer, layer_size,
                    dense_layer, int(time.time()))
            tensorboard = TensorBoard(log_dir=os.path.join("logs", NAME))
            print(NAME)

            model = Sequential()
            model.add(LSTM(layer_size, input_shape=train_x.shape[1:],
                           return_sequences=True))
            model.add(Dropout(0.2))
            model.add(BatchNormalization())

            for _ in range(conv_layer - 1):
                model.add(LSTM(layer_size, return_sequences=True))
                model.add(Dropout(0.1))
                model.add(BatchNormalization())

            for _ in range(dense_layer):
                model.add(Dense(layer_size, activation='relu'))
                model.add(Dropout(0.2))

            model.add(Dense(2, activation='softmax'))

            opt = tf.keras.optimizers.Adam(learning_rate=0.001, decay=1e-6)

            # Compile model
            model.compile(loss='sparse_categorical_crossentropy',
                          optimizer=opt,
                          metrics=['accuracy'])

            # unique file name that will include the epoch
            # and the validation accuracy for that epoch
            filepath = "RNN_Final.{epoch:02d}-{val_accuracy:.3f}"
            # NB: the closing parenthesis of .format() is misplaced here, so the
            # monitor/save_best_only arguments never reach ModelCheckpoint
            # (see the second question below)
            checkpoint = ModelCheckpoint("models\\{}.model".format(filepath,
                         monitor='val_acc', verbose=0, save_best_only=True,
                         mode='max'))  # intended to save only the best ones

            # Train model
            history = model.fit(
                train_x, train_y,
                batch_size=BATCH_SIZE,
                epochs=EPOCHS,
                validation_data=(validation_x, validation_y),
                callbacks=[tensorboard, checkpoint])

# Score model (note: this only evaluates the last model built by the loop)
score = model.evaluate(validation_x, validation_y, verbose=2)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
# Save model
model.save(os.path.join("models", NAME))

Also, I don't know if it's acceptable to ask two problems within one question (I don't want to spam the site with problems that anyone with some Python experience could resolve within a minute), but I also have a problem with checkpoint saving. I want to save only the best-performing model (one model per NN specification, i.e. per number of nodes/layers), but currently one is saved after every epoch. If this is inappropriate to ask, I can create another question for it.
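As an aside, the every-epoch saving follows directly from a misplaced parenthesis in the code above: monitor, verbose, save_best_only, and mode are passed to str.format() (which silently ignores unused keyword arguments) instead of to ModelCheckpoint, so the callback runs with its defaults. A corrected call, assuming the val_accuracy metric name that this Keras version logs, would be:

checkpoint = ModelCheckpoint(
    os.path.join("models", "{}.model".format(filepath)),
    monitor='val_accuracy',  # must match a metric name logged by fit()
    verbose=0,
    save_best_only=True,     # keep only the best epoch per model spec
    mode='max')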

Thank you very much for any help.

Recommended Answer

This is a known bug. Updating to TensorFlow 2.1 should fix the issue.
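For anyone stuck on TensorFlow 2.0 for the moment, a commonly cited workaround (an assumption on our part, not something stated in the original answer) is to clear the global Keras graph state before building each new model, since every Sequential model constructed in the loop otherwise keeps adding nodes to the same underlying graph. A minimal sketch:

import gc
import tensorflow as tf

for dense_layer in dense_layers:
    for layer_size in layer_sizes:
        for conv_layer in conv_layers:
            tf.keras.backend.clear_session()  # drop the previous model's graph state
            gc.collect()                      # release the Python-side objects as well
            # ... build, compile, and fit the model exactly as above ...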

That concludes this article on the Keras model-training memory leak; hopefully the recommended answer proves helpful.
