Question
I understand that biases are required in small networks to shift the activation function. But in the case of a deep network that has multiple CNN layers, pooling, dropout and other non-linear activations, does bias really make a difference? Each convolutional filter learns local features, and the same bias is shared across a given conv output channel.
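For concreteness, the per-channel bias mentioned above is one scalar per output channel, broadcast over the batch and every spatial position. A minimal numpy sketch (the shapes here are arbitrary, not taken from the experiment):

```python
import numpy as np

# A conv layer's raw output: batch of 2, an 8x8 spatial map, 16 output channels
conv_out = np.zeros((2, 8, 8, 16))
bias = np.arange(16, dtype=float)   # one scalar bias per output channel

activated = conv_out + bias         # broadcasts over batch and spatial dims

# Every spatial position of channel c receives the same bias value
print(activated[0, 3, 5, 7], activated[1, 0, 0, 7])  # both 7.0
```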
This is not a dupe of this link. That link only explains the role of bias in a small neural network and does not attempt to explain the role of bias in deep networks containing multiple CNN layers, dropout, pooling and non-linear activation functions.
I ran a simple experiment, and the results indicated that removing the bias from the conv layers made no difference to the final test accuracy. Two models were trained, and their test accuracies are almost the same (slightly better in the one without bias):
- model_with_bias
- model_without_bias (no bias added in the conv layers)
Are they being used only for historical reasons?
If using biases provides no gain in accuracy, shouldn't we omit them? There would be fewer parameters to learn.
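The savings are tiny, though. A back-of-the-envelope count for the two conv layers in the experiment below (assuming a single grayscale input channel, i.e. `num_channels = 1`, which is not stated in the snippet):

```python
# Rough parameter count for the two conv layers in the experiment
# (assuming num_channels = 1, i.e. grayscale input)
patch_size, num_channels, depth = 5, 1, 16

conv1_weights = patch_size * patch_size * num_channels * depth  # 400
conv2_weights = patch_size * patch_size * depth * depth         # 6400
conv_biases = depth + depth                                     # 16 + 16 = 32

ratio = conv_biases / (conv1_weights + conv2_weights)
print(f"conv biases are {ratio:.2%} of the conv parameters")  # ~0.47%
```

So dropping the conv biases removes well under 1% of the conv parameters, which is consistent with the negligible accuracy difference observed.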
I would be thankful if someone with deeper knowledge than me could explain the significance (if any) of these biases in deep networks.
Here is the complete code and the experiment result: bias-VS-no_bias experiment
batch_size = 16
patch_size = 5
depth = 16
num_hidden = 64

graph = tf.Graph()

with graph.as_default():

  # Input data.
  tf_train_dataset = tf.placeholder(
      tf.float32, shape=(batch_size, image_size, image_size, num_channels))
  tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
  tf_valid_dataset = tf.constant(valid_dataset)
  tf_test_dataset = tf.constant(test_dataset)

  # Variables.
  layer1_weights = tf.Variable(tf.truncated_normal(
      [patch_size, patch_size, num_channels, depth], stddev=0.1))
  layer1_biases = tf.Variable(tf.zeros([depth]))
  layer2_weights = tf.Variable(tf.truncated_normal(
      [patch_size, patch_size, depth, depth], stddev=0.1))
  layer2_biases = tf.Variable(tf.constant(1.0, shape=[depth]))
  layer3_weights = tf.Variable(tf.truncated_normal(
      [image_size // 4 * image_size // 4 * depth, num_hidden], stddev=0.1))
  layer3_biases = tf.Variable(tf.constant(1.0, shape=[num_hidden]))
  layer4_weights = tf.Variable(tf.truncated_normal(
      [num_hidden, num_labels], stddev=0.1))
  layer4_biases = tf.Variable(tf.constant(1.0, shape=[num_labels]))

  # Define a model with bias.
  def model_with_bias(data):
    conv = tf.nn.conv2d(data, layer1_weights, [1, 2, 2, 1], padding='SAME')
    hidden = tf.nn.relu(conv + layer1_biases)
    conv = tf.nn.conv2d(hidden, layer2_weights, [1, 2, 2, 1], padding='SAME')
    hidden = tf.nn.relu(conv + layer2_biases)
    shape = hidden.get_shape().as_list()
    reshape = tf.reshape(hidden, [shape[0], shape[1] * shape[2] * shape[3]])
    hidden = tf.nn.relu(tf.matmul(reshape, layer3_weights) + layer3_biases)
    return tf.matmul(hidden, layer4_weights) + layer4_biases

  # Define a model without bias in the convolutional layers.
  def model_without_bias(data):
    conv = tf.nn.conv2d(data, layer1_weights, [1, 2, 2, 1], padding='SAME')
    hidden = tf.nn.relu(conv)  # layer1_biases is not added
    conv = tf.nn.conv2d(hidden, layer2_weights, [1, 2, 2, 1], padding='SAME')
    hidden = tf.nn.relu(conv)  # layer2_biases is not added
    shape = hidden.get_shape().as_list()
    reshape = tf.reshape(hidden, [shape[0], shape[1] * shape[2] * shape[3]])
    # Biases are added only in the fully connected layers (layer 3 and layer 4).
    hidden = tf.nn.relu(tf.matmul(reshape, layer3_weights) + layer3_biases)
    return tf.matmul(hidden, layer4_weights) + layer4_biases

  # Training computation.
  logits_with_bias = model_with_bias(tf_train_dataset)
  loss_with_bias = tf.reduce_mean(
      tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits_with_bias))

  logits_without_bias = model_without_bias(tf_train_dataset)
  loss_without_bias = tf.reduce_mean(
      tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits_without_bias))

  # Optimizer.
  optimizer_with_bias = tf.train.GradientDescentOptimizer(0.05).minimize(loss_with_bias)
  optimizer_without_bias = tf.train.GradientDescentOptimizer(0.05).minimize(loss_without_bias)

  # Predictions for the training, validation, and test data.
  train_prediction_with_bias = tf.nn.softmax(logits_with_bias)
  valid_prediction_with_bias = tf.nn.softmax(model_with_bias(tf_valid_dataset))
  test_prediction_with_bias = tf.nn.softmax(model_with_bias(tf_test_dataset))

  # Predictions for the model without bias.
  train_prediction_without_bias = tf.nn.softmax(logits_without_bias)
  valid_prediction_without_bias = tf.nn.softmax(model_without_bias(tf_valid_dataset))
  test_prediction_without_bias = tf.nn.softmax(model_without_bias(tf_test_dataset))

num_steps = 1001

with tf.Session(graph=graph) as session:
  tf.global_variables_initializer().run()
  print('Initialized')
  for step in range(num_steps):
    offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
    batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
    batch_labels = train_labels[offset:(offset + batch_size), :]
    feed_dict = {tf_train_dataset: batch_data, tf_train_labels: batch_labels}
    session.run(optimizer_with_bias, feed_dict=feed_dict)
    session.run(optimizer_without_bias, feed_dict=feed_dict)
  print('Test accuracy (with bias): %.1f%%' % accuracy(test_prediction_with_bias.eval(), test_labels))
  print('Test accuracy (without bias): %.1f%%' % accuracy(test_prediction_without_bias.eval(), test_labels))
Output:

Initialized
Test accuracy (with bias): 90.5%
Test accuracy (without bias): 90.6%
Accepted answer
In a large model, removing the bias inputs makes very little difference because each node can make a bias node out of the average activation of all of its inputs, which by the law of large numbers will be roughly normal. At the first layer, the ability for this to happen depends on your input distribution. For MNIST, for example, the input's average activation is roughly constant. On a small network you certainly need a bias input, but on a large network removing it makes almost no difference.
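One piece of this argument can be made concrete: whenever the inputs contain a (roughly) constant component, an explicit bias can be folded into an ordinary weight on that component. A minimal numpy sketch with arbitrary shapes (not the experiment's model):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 50))           # 100 samples, 50 features
const = np.full((100, 1), 1.0)           # a constant input component
x_aug = np.hstack([x, const])            # inputs augmented with the constant

w = rng.normal(size=50)
b = 0.7                                  # the bias we want to absorb

with_bias = x @ w + b                    # explicit bias term
w_aug = np.concatenate([w, [b]])         # bias folded into an extra weight
without_bias = x_aug @ w_aug             # no explicit bias term

print(np.allclose(with_bias, without_bias))  # True
```

In deep layers the "constant component" is only approximate (the average activation of many inputs), so the compensation is approximate too, which matches the observed ~0.1% accuracy difference.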
Although it makes little difference in a large network, it still depends on the network architecture. For instance in LSTMs, the forget-gate bias matters a great deal: the empirical study linked below found that initializing it to 1 substantially improves performance, because the gate then starts mostly open and the cell state can survive many timesteps.
See also:

- The role of bias in Neural Networks
- What is bias in Neural network
- An Empirical Exploration of Recurrent Network Architectures