
Question

I understand that a bias is required in small networks, to shift the activation function. But in the case of a deep network that has multiple layers of CNN, pooling, dropout and other non-linear activations, does the bias really make a difference? The convolutional filter is learning local features, and for a given conv output channel the same bias is used.
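To make that per-channel sharing concrete, here is a minimal TF 1.x sketch (the shapes are made up for illustration and are not taken from the experiment below): the bias of a conv layer is a single vector with one entry per output channel, broadcast over every spatial position.

import tensorflow as tf

x = tf.placeholder(tf.float32, shape=(1, 28, 28, 1))             # NHWC input
w = tf.Variable(tf.truncated_normal([5, 5, 1, 16], stddev=0.1))  # 5x5 filters, 16 output channels
b = tf.Variable(tf.zeros([16]))                                  # one scalar bias per output channel
y = tf.nn.bias_add(tf.nn.conv2d(x, w, [1, 1, 1, 1], 'SAME'), b)  # b is reused at every spatial position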

This is not a dupe of this link. The above link only explains the role of bias in a small neural network and does not attempt to explain the role of bias in deep networks containing multiple CNN layers, drop-outs, pooling and non-linear activation functions.

I ran a simple experiment and the results indicated that removing the bias from the conv layers made no difference in the final test accuracy. Two models were trained, and the test accuracy was almost the same (slightly better in the one without bias):

  • model_with_bias,
  • model_without_bias (no bias added in the conv layers)

Are they being used only for historical reasons?

If using a bias does not improve the accuracy, shouldn't we omit it? There would be fewer parameters to learn.
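For a sense of scale, a rough parameter count for the model below shows the two conv biases contribute only 32 of roughly 58k parameters. The count assumes the usual notMNIST shapes from the linked notebook (image_size=28, num_channels=1, num_labels=10), which do not appear in this excerpt.

patch_size, depth, num_hidden = 5, 16, 64
image_size, num_channels, num_labels = 28, 1, 10   # assumed values, not defined in this excerpt

conv_weights = (patch_size * patch_size * num_channels * depth
                + patch_size * patch_size * depth * depth)               # 400 + 6400
fc_weights = ((image_size // 4) * (image_size // 4) * depth * num_hidden
              + num_hidden * num_labels)                                 # 50176 + 640
conv_biases = depth + depth                                              # 32
fc_biases = num_hidden + num_labels                                      # 74

total = conv_weights + fc_weights + conv_biases + fc_biases
print(total, 100.0 * conv_biases / total)   # ~57722 parameters; the conv biases are ~0.06% of them

So dropping the conv biases barely shrinks the model; the saving in parameters is negligible.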

I would be thankful if someone with deeper knowledge than me could explain the significance (if any) of these biases in deep networks.

Here is the complete code and the experiment result: bias-VS-no_bias experiment

import tensorflow as tf

# image_size, num_channels, num_labels and the train/valid/test datasets are
# assumed to be defined in earlier cells of the linked notebook.
batch_size = 16
patch_size = 5
depth = 16
num_hidden = 64

graph = tf.Graph()

with graph.as_default():

  # Input data.
  tf_train_dataset = tf.placeholder(
    tf.float32, shape=(batch_size, image_size, image_size, num_channels))
  tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
  tf_valid_dataset = tf.constant(valid_dataset)
  tf_test_dataset = tf.constant(test_dataset)

  # Variables.
  layer1_weights = tf.Variable(tf.truncated_normal(
      [patch_size, patch_size, num_channels, depth], stddev=0.1))
  layer1_biases = tf.Variable(tf.zeros([depth]))
  layer2_weights = tf.Variable(tf.truncated_normal(
      [patch_size, patch_size, depth, depth], stddev=0.1))
  layer2_biases = tf.Variable(tf.constant(1.0, shape=[depth]))
  layer3_weights = tf.Variable(tf.truncated_normal(
      [image_size // 4 * image_size // 4 * depth, num_hidden], stddev=0.1))
  layer3_biases = tf.Variable(tf.constant(1.0, shape=[num_hidden]))
  layer4_weights = tf.Variable(tf.truncated_normal(
      [num_hidden, num_labels], stddev=0.1))
  layer4_biases = tf.Variable(tf.constant(1.0, shape=[num_labels]))

  # define a model with bias.
  def model_with_bias(data):
    conv = tf.nn.conv2d(data, layer1_weights, [1, 2, 2, 1], padding='SAME')
    hidden = tf.nn.relu(conv + layer1_biases)
    conv = tf.nn.conv2d(hidden, layer2_weights, [1, 2, 2, 1], padding='SAME')
    hidden = tf.nn.relu(conv + layer2_biases)
    shape = hidden.get_shape().as_list()
    reshape = tf.reshape(hidden, [shape[0], shape[1] * shape[2] * shape[3]])
    hidden = tf.nn.relu(tf.matmul(reshape, layer3_weights) + layer3_biases)
    return tf.matmul(hidden, layer4_weights) + layer4_biases

  # define a model without bias added in the convolutional layers.
  def model_without_bias(data):
    conv = tf.nn.conv2d(data, layer1_weights, [1, 2, 2, 1], padding='SAME')
    hidden = tf.nn.relu(conv)  # layer1_biases is not added
    conv = tf.nn.conv2d(hidden, layer2_weights, [1, 2, 2, 1], padding='SAME')
    hidden = tf.nn.relu(conv)  # layer2_biases is not added
    shape = hidden.get_shape().as_list()
    reshape = tf.reshape(hidden, [shape[0], shape[1] * shape[2] * shape[3]])
    # biases are added only in the fully connected layers (layer 3 and layer 4)
    hidden = tf.nn.relu(tf.matmul(reshape, layer3_weights) + layer3_biases)
    return tf.matmul(hidden, layer4_weights) + layer4_biases

  # Training computation.
  logits_with_bias = model_with_bias(tf_train_dataset)
  loss_with_bias = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits_with_bias))

  logits_without_bias = model_without_bias(tf_train_dataset)
  loss_without_bias = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits_without_bias))

  # Optimizer.
  optimizer_with_bias = tf.train.GradientDescentOptimizer(0.05).minimize(loss_with_bias)
  optimizer_without_bias = tf.train.GradientDescentOptimizer(0.05).minimize(loss_without_bias)

  # Predictions for the training, validation, and test data.
  train_prediction_with_bias = tf.nn.softmax(logits_with_bias)
  valid_prediction_with_bias = tf.nn.softmax(model_with_bias(tf_valid_dataset))
  test_prediction_with_bias = tf.nn.softmax(model_with_bias(tf_test_dataset))

  # Predictions for without
  train_prediction_without_bias = tf.nn.softmax(logits_without_bias)
  valid_prediction_without_bias = tf.nn.softmax(model_without_bias(tf_valid_dataset))
  test_prediction_without_bias = tf.nn.softmax(model_without_bias(tf_test_dataset))

num_steps = 1001

with tf.Session(graph=graph) as session:
  tf.global_variables_initializer().run()
  print('Initialized')
  for step in range(num_steps):
    offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
    batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
    batch_labels = train_labels[offset:(offset + batch_size), :]
    feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
    session.run(optimizer_with_bias, feed_dict=feed_dict)
    session.run(optimizer_without_bias, feed_dict = feed_dict)
  print('Test accuracy(with bias): %.1f%%' % accuracy(test_prediction_with_bias.eval(), test_labels))
  print('Test accuracy(without bias): %.1f%%' % accuracy(test_prediction_without_bias.eval(), test_labels))
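
The excerpt above calls an accuracy helper that is defined in an earlier cell of the linked notebook; a typical definition (an assumption here, not copied from the link) is:

import numpy as np

def accuracy(predictions, labels):
  # Percentage of rows where the arg-max of the prediction matches the
  # arg-max of the one-hot label.
  return 100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1)) / predictions.shape[0]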

Output:

Initialized

Test accuracy(with bias): 90.5%

Test accuracy(without bias): 90.6%

Answer

In a large model, removing the bias inputs makes very little difference because each node can make a bias node out of the average activation of all of its inputs, which by the law of large numbers will be roughly normal. At the first layer, the ability for this to happen depends on your input distribution. On a small network you certainly need a bias input, but on a large network, removing it makes almost no difference.
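A toy numeric restatement of that claim (made-up shapes and values): when the inputs to a unit have a stable non-zero mean, the constant term w·mean(x) already acts as an offset the weights can tune, even with no explicit bias variable.

import numpy as np

rng = np.random.RandomState(0)
w = rng.randn(8)                        # weights of one bias-free unit
x = rng.randn(1000, 8) + 3.0            # inputs with a non-zero mean
pre_activation = x.dot(w)               # what the unit computes before the non-linearity
effective_bias = w.dot(x.mean(axis=0))  # constant offset baked into the pre-activation,
print(effective_bias)                   # i.e. an implicit bias the weights control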

Although it makes little difference in a large network, it still depends on the network architecture. For instance, in LSTMs:

Most applications of LSTMs simply initialize the LSTMs with small random weights, which works well on many problems. But this initialization effectively sets the forget gate to 0.5. This introduces a vanishing gradient with a factor of 0.5 per timestep, which can cause problems whenever the long-term dependencies are particularly severe. This problem is addressed by simply initializing the forget gate bias to a large value such as 1 or 2. By doing so, the forget gate will be initialized to a value that is close to 1, enabling gradient flow.
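In TF 1.x this particular bias is even exposed as its own argument on the cell; a minimal sketch (num_units=64 is an arbitrary example value):

import tensorflow as tf

# forget_bias defaults to 1.0, keeping the forget gate close to open at the
# start of training, as the quote above recommends.
lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units=64, forget_bias=1.0)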


