Results of a tensorflow model differ from the same model in skflow (optimizer)

This article describes how to deal with the results of a tensorflow model differing from the same model in skflow (optimizer). It should be a useful reference for anyone facing the same problem.

Problem description

I'm using tensorflow to replicate a neural network for the MNIST dataset that I previously programmed in skflow. Here is the model in skflow:

import tensorflow.contrib.learn as skflow
from sklearn import metrics
from sklearn.datasets import fetch_mldata
from sklearn.cross_validation import train_test_split

mnist = fetch_mldata('MNIST original')

# Hold out 10000 examples for testing.
train_dataset, test_dataset, train_labels, test_labels = train_test_split(mnist.data, mnist.target, test_size=10000, random_state=42)

# Two hidden layers of 1200 units each, trained with plain SGD.
classifier = skflow.TensorFlowDNNClassifier(hidden_units=[1200, 1200], n_classes=10, optimizer="SGD", learning_rate=0.01, batch_size=128, steps=1000)
classifier.fit(train_dataset, train_labels)
score = metrics.accuracy_score(test_labels, classifier.predict(test_dataset))
print("Accuracy: %f" % score)

This model gets an accuracy of 0.950600.

But the model replicated in tensorflow gets NaN in the loss function and fails to improve (I don't think it's related to Tensorflow NaN bug?, since I'm using tf.nn.softmax_cross_entropy_with_logits).

I can't figure out why, since the setup of the model in tensorflow is the same as in the skflow model. The only thing I'm not sure is identical is how skflow initializes the weights of the network; I searched for that part in the skflow code but haven't found it.

Here is the code in tensorflow:

import numpy as np
import tensorflow as tf
from sklearn.cross_validation import train_test_split
from sklearn.datasets import fetch_mldata

mnist = fetch_mldata('MNIST original')

num_labels = len(np.unique(mnist.target))
num_pixels = mnist.data.shape[1]

#reshape labels to one hot encoding
labels = (np.arange(num_labels) == mnist.target[:, None]).astype(np.float32)

#create train_dataset of 60000 and test_dataset of 10000 elem
train_dataset, test_dataset, train_labels, test_labels = train_test_split(mnist.data, labels, test_size=10000, random_state=42)


def accuracy(predictions, labels):
    return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1)) / predictions.shape[0])


batch_size = 128
graph = tf.Graph()
with graph.as_default():

    # Input data.
    tf_train_dataset = tf.placeholder(tf.float32,
                                  shape=(batch_size, num_pixels))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_test_dataset = tf.cast(tf.constant(test_dataset), tf.float32)

    # Hidden layer 1: num_pixels -> 1200, weights from a truncated normal (stddev 1.0 by default).
    w_hidden = tf.Variable(tf.truncated_normal([num_pixels, 1200]))
    b_hidden = tf.Variable(tf.zeros([1200]))
    hidden = tf.nn.relu(tf.matmul(tf_train_dataset, w_hidden) + b_hidden)

    # Hidden layer 2: 1200 -> 1200.
    w_hidden_2 = tf.Variable(tf.truncated_normal([1200, 1200]))
    b_hidden_2 = tf.Variable(tf.zeros([1200]))
    hidden2 = tf.nn.relu(tf.matmul(hidden, w_hidden_2) + b_hidden_2)

    # Output layer: 1200 -> num_labels.
    w = tf.Variable(tf.truncated_normal([1200, num_labels]))
    b = tf.Variable(tf.zeros([num_labels]))
    logits = tf.matmul(hidden2, w) + b

    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
        logits, tf_train_labels))

    # Optimizer.
    optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

    # Predictions for the training, and test data.
    train_prediction = tf.nn.softmax(logits)
    test_prediction = tf.nn.softmax(tf.matmul(tf.nn.relu(tf.matmul(tf.nn.relu(tf.matmul(tf_test_dataset, w_hidden) + b_hidden), w_hidden_2) + b_hidden_2), w) + b)

num_steps = 1001

with tf.Session(graph=graph) as session:
    tf.initialize_all_variables().run()
    print("Initialized")
    for step in range(num_steps):
        # Pick an offset within the training data, which has been randomized.
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)

        # Generate a minibatch.
        batch_data = train_dataset[offset:(offset + batch_size), :]
        batch_labels = train_labels[offset:(offset + batch_size), :]

        # Prepare a dictionary telling the session where to feed the minibatch.
        feed_dict = {tf_train_dataset: batch_data, tf_train_labels: batch_labels}
        _, l, predictions = session.run( [optimizer, loss, train_prediction], feed_dict=feed_dict)
        if (step % 100 == 0):
            print("Minibatch loss at step %d: %f" % (step, l))
            print("Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels))
    print("Test accuracy: %.1f%%" % accuracy(test_prediction.eval(), test_labels))

I'm clueless about what the issue might be. Any suggestions?

Edited 1: As was suggested, I tried replacing the tf.Variable calls with tf.get_variable("w_hidden", [num_pixels, 1200]), but I still got NaNs.

I also used the skflow.ops.dnn op to build the layers while keeping my own loss and so on, and still got NaNs.

Edited 2: It turns out it is not a problem of weight initialization. It seems the gradients are too unstable (in the tensorflow model), which causes the loss to become NaN. As in Adding multiple layers to TensorFlow causes loss function to become Nan, I lowered the learning rate by an order of magnitude, and it worked out.

Now what I don't understand is what differs between the SGD optimizer in skflow and the one above. Or, if they really are equivalent, what explains why they need different learning rates?

Recommended answer

Initialization in skflow relies on the tf.get_variable default initializer, uniform_unit_scaling_initializer (see this for a detailed description).
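
As a rough illustration of why the initializer matters (an editorial sketch, not part of the original answer; the inputs and sizes below are stand-ins for the MNIST setup):

import numpy as np

# Compare pre-activation scales for unit-stddev weights vs. a unit-scaling-style init.
num_pixels = 784
x = np.random.rand(128, num_pixels).astype(np.float32)          # stand-in minibatch with values in [0, 1)
w_unit = np.random.randn(num_pixels, 1200).astype(np.float32)   # stddev 1.0, like the default truncated_normal
w_scaled = w_unit / np.sqrt(num_pixels)                         # roughly what a unit-scaling initializer does

print(np.abs(x.dot(w_unit)).mean())    # pre-activations in the tens: softmax saturates and gradients blow up easily
print(np.abs(x.dot(w_scaled)).mean())  # pre-activations around 1 or less: much friendlier at the same learning rate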

You can try replacing your tf.Variable calls with something like tf.get_variable("w_hidden", [num_pixels, 1200]).
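
A minimal sketch of that replacement for the first layer, following the suggestion above (variable names follow the question's code; the bias line just makes the zero initialization explicit):

    # tf.get_variable picks up the default uniform_unit_scaling_initializer for the weights.
    w_hidden = tf.get_variable("w_hidden", [num_pixels, 1200])
    b_hidden = tf.get_variable("b_hidden", [1200],
                               initializer=tf.constant_initializer(0.0))
    hidden = tf.nn.relu(tf.matmul(tf_train_dataset, w_hidden) + b_hidden)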

Alternatively, you can start by using the skflow.ops.dnn op, which will build the layers for you while you still provide your own loss and so on.
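
For illustration, a fragment of what that could look like inside the graph block of the question's code, assuming `import tensorflow.contrib.learn as skflow` at the top and that this skflow version's dnn op takes an input tensor, a list of hidden-layer sizes, and an activation (check the signature for your version):

    # Hypothetical: let skflow build the two 1200-unit ReLU layers,
    # then keep the question's output layer, loss and optimizer.
    hidden2 = skflow.ops.dnn(tf_train_dataset, [1200, 1200], activation=tf.nn.relu)
    logits = tf.matmul(hidden2, w) + b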

Also, please let me know if there is a clear use case that forced you to rewrite things in pure TensorFlow instead of using skflow; I would love to address it. You can always write a custom model by passing a model_fn into TensorFlowEstimator and still use the training / batching / saving functionality.

This concludes the article on the results of a tensorflow model differing from the same model in skflow (optimizer); we hope the recommended answer is helpful.
