Question
I've built an MLP with Google's TensorFlow library. The network is working but somehow it refuses to learn properly. It always converges to an output of nearly 1.0 no matter what the input actually is.
The complete code can be seen here.
Any ideas?
The input and output (batch size 4) are as follows:
input_data = [[0., 0.], [0., 1.], [1., 0.], [1., 1.]] # XOR input
output_data = [[0.], [1.], [1.], [0.]] # XOR output
n_input = tf.placeholder(tf.float32, shape=[None, 2], name="n_input")
n_output = tf.placeholder(tf.float32, shape=[None, 1], name="n_output")
Hidden layer configuration:
# hidden layer's bias neuron
b_hidden = tf.Variable(0.1, name="hidden_bias")
# hidden layer's weight matrix initialized with a uniform distribution
W_hidden = tf.Variable(tf.random_uniform([2, hidden_nodes], -1.0, 1.0), name="hidden_weights")
# calc hidden layer's activation
hidden = tf.sigmoid(tf.matmul(n_input, W_hidden) + b_hidden)
Output layer configuration:
W_output = tf.Variable(tf.random_uniform([hidden_nodes, 1], -1.0, 1.0), name="output_weights") # output layer's weight matrix
output = tf.sigmoid(tf.matmul(hidden, W_output)) # calc output layer's activation
My learning methods look like this:
loss = tf.reduce_mean(cross_entropy) # mean the cross_entropy
optimizer = tf.train.GradientDescentOptimizer(0.01) # take a gradient descent for optimizing
train = optimizer.minimize(loss) # let the optimizer train
I tried two setups for the cross entropy:
cross_entropy = -tf.reduce_sum(n_output * tf.log(output))
and
cross_entropy = tf.nn.sigmoid_cross_entropy_with_logits(n_output, output)
where n_output is the original output as described in output_data, and output is the value predicted/calculated by my network.
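Note that tf.nn.sigmoid_cross_entropy_with_logits applies the sigmoid internally and therefore expects the raw, pre-sigmoid logits rather than the already-squashed output. Below is a minimal sketch of that wiring (this is my assumption of the intended usage, not what my code above does; the keyword form is for newer TensorFlow releases, older ones took (logits, targets) positionally):
logits = tf.matmul(hidden, W_output)      # raw pre-sigmoid values
output = tf.sigmoid(logits)               # only used for prediction/printing
cross_entropy = tf.nn.sigmoid_cross_entropy_with_logits(labels=n_output, logits=logits)
loss = tf.reduce_mean(cross_entropy)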
The training inside the for-loop (for n epochs) goes like this:
cvalues = sess.run([train, loss, W_hidden, b_hidden, W_output],
                   feed_dict={n_input: input_data, n_output: output_data})
I am saving the outcome to cvalues for debug printing of loss, W_hidden, ...
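(Since sess.run returns the fetched values in the same order as the fetch list, cvalues[1] is the loss, cvalues[2] is W_hidden, and so on. The unpacking below is just an illustration of that indexing, not part of my original code:)
_, loss_val, W_h, b_h, W_o = cvalues   # train op returns None, hence the underscore
print("loss: {}".format(loss_val))     # same value as cvalues[1]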
No matter what I've tried, when I test my network and try to validate the output, it always produces something like this:
(...)
step: 2000
loss: 0.0137040186673
b_hidden: 1.3272010088
W_hidden: [[ 0.23195425 0.53248233 -0.21644847 -0.54775208 0.52298909]
[ 0.73933059 0.51440752 -0.08397482 -0.62724304 -0.53347367]]
W_output: [[ 1.65939867]
[ 0.78912479]
[ 1.4831928 ]
[ 1.28612828]
[ 1.12486529]]
(--- finished with 2000 epochs ---)
(Test input for validation:)
input: [0.0, 0.0] | output: [[ 0.99339396]]
input: [0.0, 1.0] | output: [[ 0.99289012]]
input: [1.0, 0.0] | output: [[ 0.99346077]]
input: [1.0, 1.0] | output: [[ 0.99261558]]
So it is not learning properly but always converging to nearly 1.0 no matter which input is fed.
Answer
In the meantime, with the help of a colleague, I was able to fix my solution and wanted to post it for completeness. My solution works with cross entropy and without altering the training data. Additionally, it has the desired input shape of (1, 2) and the output is a scalar.
It makes use of an AdamOptimizer, which decreases the error much faster than a GradientDescentOptimizer. See this post for more information (& questions^^) about the optimizer.
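For reference, the swap is a single line in the learning section (using the same 0.01 learning rate as in the full listing below):
# optimizer = tf.train.GradientDescentOptimizer(0.01)  # original choice, converges slowly here
optimizer = tf.train.AdamOptimizer(0.01)               # adaptive per-parameter learning rates
train = optimizer.minimize(loss)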
In fact, my network produces reasonably good results in only 400-800 learning steps.
After 2000 learning steps the output is nearly "perfect":
step: 2000
loss: 0.00103311243281
input: [0.0, 0.0] | output: [[ 0.00019799]]
input: [0.0, 1.0] | output: [[ 0.99979786]]
input: [1.0, 0.0] | output: [[ 0.99996307]]
input: [1.0, 1.0] | output: [[ 0.00033751]]
import tensorflow as tf
#####################
# preparation stuff #
#####################
# define input and output data
input_data = [[0., 0.], [0., 1.], [1., 0.], [1., 1.]] # XOR input
output_data = [[0.], [1.], [1.], [0.]] # XOR output
# create a placeholder for the input
# None indicates a variable batch size for the input
# one input's dimension is [1, 2] and output's [1, 1]
n_input = tf.placeholder(tf.float32, shape=[None, 2], name="n_input")
n_output = tf.placeholder(tf.float32, shape=[None, 1], name="n_output")
# number of neurons in the hidden layer
hidden_nodes = 5
################
# hidden layer #
################
# hidden layer's bias vector
b_hidden = tf.Variable(tf.random_normal([hidden_nodes]), name="hidden_bias")
# hidden layer's weight matrix initialized with a normal distribution
W_hidden = tf.Variable(tf.random_normal([2, hidden_nodes]), name="hidden_weights")
# calc hidden layer's activation
hidden = tf.sigmoid(tf.matmul(n_input, W_hidden) + b_hidden)
################
# output layer #
################
W_output = tf.Variable(tf.random_normal([hidden_nodes, 1]), name="output_weights") # output layer's weight matrix
output = tf.sigmoid(tf.matmul(hidden, W_output)) # calc output layer's activation
############
# learning #
############
cross_entropy = -(n_output * tf.log(output) + (1 - n_output) * tf.log(1 - output))
# cross_entropy = tf.square(n_output - output) # simpler, but also works
loss = tf.reduce_mean(cross_entropy) # mean the cross_entropy
optimizer = tf.train.AdamOptimizer(0.01) # use the Adam optimizer with a learning rate of 0.01
train = optimizer.minimize(loss) # let the optimizer train
####################
# initialize graph #
####################
init = tf.initialize_all_variables()
sess = tf.Session() # create the session and therefore the graph
sess.run(init) # initialize all variables
#####################
# train the network #
#####################
for epoch in xrange(0, 2001):
    # run the training operation
    cvalues = sess.run([train, loss, W_hidden, b_hidden, W_output],
                       feed_dict={n_input: input_data, n_output: output_data})
    # print some debug stuff
    if epoch % 200 == 0:
        print("")
        print("step: {:>3}".format(epoch))
        print("loss: {}".format(cvalues[1]))
        # print("b_hidden: {}".format(cvalues[3]))
        # print("W_hidden: {}".format(cvalues[2]))
        # print("W_output: {}".format(cvalues[4]))
print("")
print("input: {} | output: {}".format(input_data[0], sess.run(output, feed_dict={n_input: [input_data[0]]})))
print("input: {} | output: {}".format(input_data[1], sess.run(output, feed_dict={n_input: [input_data[1]]})))
print("input: {} | output: {}".format(input_data[2], sess.run(output, feed_dict={n_input: [input_data[2]]})))
print("input: {} | output: {}".format(input_data[3], sess.run(output, feed_dict={n_input: [input_data[3]]})))