Problem Description
I'm using tensorflow to replicate a neural network for the MNIST dataset, previously programmed in skflow. Here is the model in skflow:
import tensorflow.contrib.learn as skflow
from sklearn import metrics
from sklearn.datasets import fetch_mldata
from sklearn.cross_validation import train_test_split
mnist = fetch_mldata('MNIST original')
train_dataset, test_dataset, train_labels, test_labels = train_test_split( mnist.data, mnist.target, test_size=10000, random_state=42)
classifier = skflow.TensorFlowDNNClassifier(hidden_units=[1200, 1200], n_classes=10, optimizer="SGD", learning_rate=0.01, batch_size=128, steps=1000)
classifier.fit(train_dataset, train_labels)
score = metrics.accuracy_score(test_labels, classifier.predict(test_dataset))
print("Accuracy: %f" % score)
This model gets an accuracy of 0.950600.
But the model replicated in tensorflow gets NaN in the loss function and fails to improve (I think it's not related to Tensorflow NaN bug?, since I'm using tf.nn.softmax_cross_entropy_with_logits).
I can't figure out why, since the setup of the model in tensorflow is the same as the one in skflow. The only thing I'm not sure is identical is how skflow initializes the weights of the network; I searched for that part in the skflow code but have not found it.
Here is the code in tensorflow:
import numpy as np
import tensorflow as tf
from sklearn.cross_validation import train_test_split
from sklearn.datasets import fetch_mldata
mnist = fetch_mldata('MNIST original')
num_labels = len(np.unique(mnist.target))
num_pixels = mnist.data.shape[1]
#reshape labels to one hot encoding
labels = (np.arange(num_labels) == mnist.target[:, None]).astype(np.float32)
#create train_dataset of 60000 and test_dataset of 10000 elem
train_dataset, test_dataset, train_labels, test_labels = train_test_split(mnist.data, labels, test_size=10000, random_state=42)
def accuracy(predictions, labels):
    return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1)) / predictions.shape[0])
batch_size = 128
graph = tf.Graph()
with graph.as_default():
    # Input data.
    tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, num_pixels))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_test_dataset = tf.cast(tf.constant(test_dataset), tf.float32)
    w_hidden = tf.Variable(tf.truncated_normal([num_pixels, 1200]))
    b_hidden = tf.Variable(tf.zeros([1200]))
    hidden = tf.nn.relu(tf.matmul(tf_train_dataset, w_hidden) + b_hidden)
    w_hidden_2 = tf.Variable(tf.truncated_normal([1200, 1200]))
    b_hidden_2 = tf.Variable(tf.zeros([1200]))
    hidden2 = tf.nn.relu(tf.matmul(hidden, w_hidden_2) + b_hidden_2)
    w = tf.Variable(tf.truncated_normal([1200, num_labels]))
    b = tf.Variable(tf.zeros([num_labels]))
    logits = tf.matmul(hidden2, w) + b
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits, tf_train_labels))
    # Optimizer.
    optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
    # Predictions for the training, and test data.
    train_prediction = tf.nn.softmax(logits)
    test_prediction = tf.nn.softmax(tf.matmul(tf.nn.relu(tf.matmul(tf.nn.relu(tf.matmul(tf_test_dataset, w_hidden) + b_hidden), w_hidden_2) + b_hidden_2), w) + b)
num_steps = 1001
with tf.Session(graph=graph) as session:
    tf.initialize_all_variables().run()
    print("Initialized")
    for step in range(num_steps):
        # Pick an offset within the training data, which has been randomized.
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        # Generate a minibatch.
        batch_data = train_dataset[offset:(offset + batch_size), :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        # Prepare a dictionary telling the session where to feed the minibatch.
        feed_dict = {tf_train_dataset: batch_data, tf_train_labels: batch_labels}
        _, l, predictions = session.run([optimizer, loss, train_prediction], feed_dict=feed_dict)
        if (step % 100 == 0):
            print("Minibatch loss at step %d: %f" % (step, l))
            print("Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels))
            print("Test accuracy: %.1f%%" % accuracy(test_prediction.eval(), test_labels))
I'm clueless on what might be the issue. Any suggestions?
Edit 1: As suggested, I tried replacing the tf.Variable calls with tf.get_variable("w_hidden", [num_pixels, 1200]), but I still got NaNs.
I also used the skflow.ops.dnn op to build the layers while keeping my own loss and so on, and still got NaNs.
Edit 2: It turns out it is not a problem of weight initialization. It seems that the gradients are too unstable (in the tensorflow model), which drives the loss to NaN. As in Adding multiple layers to TensorFlow causes loss function to become Nan, I slowed the learning rate by an order of magnitude, and it worked out.
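For reference, that fix is the single line below; the value 0.001 is my reading of "an order of magnitude" below the original 0.01, not a number taken from skflow itself:

# Learning rate lowered from 0.01 to 0.001 (one order of magnitude);
# this was enough to stop the gradients from pushing the loss to NaN.
optimizer = tf.train.GradientDescentOptimizer(0.001).minimize(loss)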
Now what I don't understand is what differs between skflow's SGD optimizer and the one above. Or, if they really are "equal", why do they need different learning rates?
Recommended Answer
Initialization in skflow relies on tf.get_variable's default initialization, uniform_unit_scaling_initializer (see this for a detailed description).
You can try replacing your tf.Variable calls with something like tf.get_variable("w_hidden", [num_pixels, 1200]).
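A minimal sketch of that replacement for the first layer, assuming the TF-0.x API of the time; writing the initializer out explicitly only makes the then-default uniform_unit_scaling_initializer visible, and the bias initializer is my own choice:

# Weights obtained via tf.get_variable use uniform unit scaling by default;
# spelling out the initializer here just makes that explicit.
w_hidden = tf.get_variable("w_hidden", [num_pixels, 1200],
                           initializer=tf.uniform_unit_scaling_initializer())
b_hidden = tf.get_variable("b_hidden", [1200],
                           initializer=tf.constant_initializer(0.0))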
Alternatively, you can start with the skflow.ops.dnn op, which builds the layers for you, while you still define your own loss and so on.
Also, please let me know if there is a clear use case that forced you to rewrite things in pure TensorFlow instead of using skflow - I would love to address it. You can always write a custom model by passing model_fn into TensorFlowEstimator and still use the training / batching / saving functionality.
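A minimal sketch of that pattern, assuming the skflow custom-model API of the time (the layer sizes mirror the question; skflow.ops.dnn and skflow.models.logistic_regression are the helpers I believe shipped with skflow, but treat the exact signatures as an assumption):

def my_model(X, y):
    # Stack of fully connected ReLU layers built by skflow.
    layers = skflow.ops.dnn(X, [1200, 1200])
    # Returns (predictions, loss) for a softmax classifier on top of the hidden layers.
    return skflow.models.logistic_regression(layers, y)

# Training, batching and saving are still handled by the estimator.
classifier = skflow.TensorFlowEstimator(model_fn=my_model, n_classes=10,
                                        batch_size=128, steps=1000,
                                        learning_rate=0.01)
classifier.fit(train_dataset, train_labels)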