Neural network: sigmoid activation function for a continuous output variable

Question

I am partway through Andrew Ng's machine learning course on Coursera and would like to adapt the neural network that I completed as part of assignment 4.

In particular, the neural network that I completed correctly for the assignment was as follows:

Sigmoid activation function: g(z) = 1/(1+e^(-z))
10 output units, each of which can take the value 0 or 1
1 hidden layer
Back-propagation used to minimize the cost function
Cost function:

where L = number of layers, s_l = number of units in layer l, m = number of training examples, and K = number of output units.
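The cost-function formula itself did not survive in this copy of the post. Assuming it is the standard regularized cross-entropy cost from that course exercise (an assumption, based on the symbols L, s_l, m and K defined above), it would read:

```latex
J(\Theta) = \frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K}
  \left[ -y_k^{(i)} \log\!\left( (h_\Theta(x^{(i)}))_k \right)
         - \left(1 - y_k^{(i)}\right) \log\!\left( 1 - (h_\Theta(x^{(i)}))_k \right) \right]
  + \frac{\lambda}{2m} \sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}}
    \left( \Theta_{ji}^{(l)} \right)^2
```

Here h_Θ(x) denotes the network output and λ the regularization strength; neither symbol is defined in the surviving text, so both are part of the assumption.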

Now I want to adjust the exercise so that there is a single continuous output unit that takes any value in [0,1], and I am trying to work out what needs to change. So far I have:

Replaced the data with my own, such that the output is a continuous variable between 0 and 1
Updated references to the number of output units
Updated the cost function in the back-propagation algorithm to J = 1/(2m) * sum((a_3 - y)^2), as in the code below, where a_3 is the value of the output unit determined from forward propagation

I am certain that something else must change, because gradient checking shows that the gradient computed by back-propagation and the numerical approximation no longer match. I did not change the sigmoid gradient; it is still f(z)*(1-f(z)), where f(z) is the sigmoid function 1/(1+e^(-z)). Nor did I change the numerical approximation of the derivative, which is simply (J(theta+e) - J(theta-e))/(2e).
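For reference, the central-difference formula mentioned above can be sketched as follows. This is a minimal illustration on a toy cost with a known gradient, not the course's own checking routine; the names J, theta, numgrad and relDiff are chosen here for illustration only.

```matlab
% Toy cost with a known gradient: J(theta) = sum(theta.^2), so dJ/dtheta = 2*theta.
J = @(theta) sum(theta .^ 2);
theta = [0.5; -1.0; 2.0];
e = 1e-4;

% Perturb one parameter at a time and apply (J(theta+e) - J(theta-e)) / (2e).
numgrad = zeros(size(theta));
for k = 1:numel(theta)
    step = zeros(size(theta));
    step(k) = e;
    numgrad(k) = (J(theta + step) - J(theta - step)) / (2 * e);
end

% Compare against the analytic gradient with a relative difference.
analytic = 2 * theta;
relDiff = norm(numgrad - analytic) / norm(numgrad + analytic);
disp(relDiff)
```

Because the numerical estimate depends only on J, it needs no change when the cost changes; any mismatch therefore points at the analytic gradient.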

Can anyone advise what other steps are required?

The MATLAB code is as follows:

% FORWARD PROPAGATION
% input layer
a1 = [ones(m,1),X];
% hidden layer
z2 = a1*Theta1';
a2 = sigmoid(z2);
a2 = [ones(m,1),a2];
% output layer
z3 = a2*Theta2';
a3 = sigmoid(z3);

% BACKWARD PROPAGATION
delta3 = a3 - y;
delta2 = delta3*Theta2(:,2:end).*sigmoidGradient(z2);
Theta1_grad = (delta2'*a1)/m;
Theta2_grad = (delta3'*a2)/m;

% COST FUNCTION
J = 1/(2 * m) * sum( (a3-y).^2 );

% Implement regularization with the cost function and gradients.
Theta1_grad(:,2:end) = Theta1_grad(:,2:end) + Theta1(:,2:end)*lambda/m;
Theta2_grad(:,2:end) = Theta2_grad(:,2:end) + Theta2(:,2:end)*lambda/m;
J = J + lambda/(2*m)*( sum(sum(Theta1(:,2:end).^2)) + sum(sum(Theta2(:,2:end).^2)));

I have since realised that this question is similar to one asked by @Mikhail Erofeev on StackOverflow; in my case, however, I want the continuous variable to lie between 0 and 1 and therefore use a sigmoid function.

Recommended answer

First, your cost function should be:

J = 1/m * sum( (a3-y).^2 );

I think your Theta2_grad = (delta3'*a2)/m; should match the numerical approximation once delta3 is changed to delta3 = 1/2 * (a3 - y);.

Check this slide for more details.
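One possible reading of the factor 1/2, offered as an assumption and not as the answerer's stated reasoning, comes from applying the chain rule to the cost J = 1/m * sum((a3-y).^2) with a_3 = g(z_3):

```latex
\frac{\partial J}{\partial z_3^{(i)}}
  = \frac{2}{m} \left( a_3^{(i)} - y^{(i)} \right) g'\!\left( z_3^{(i)} \right)
  = \frac{2}{m} \left( a_3^{(i)} - y^{(i)} \right) a_3^{(i)} \left( 1 - a_3^{(i)} \right),
\qquad
a_3 \left( 1 - a_3 \right) \le \tfrac{1}{4}
\ \text{with equality at}\ a_3 = \tfrac{1}{2}.
```

When the outputs sit near 1/2, the right-hand side is close to (1/m) * (1/2) * (a_3 - y), which is what delta3 = 1/2 * (a3 - y) followed by division by m produces. Away from a_3 = 1/2 the two expressions differ, which may be why an approximate rather than a tight match is observed in gradient checking.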

In case there are minor discrepancies between our code, I have pasted mine below for your reference. It has been compared against the numerical approximation with checkNNGradients(lambda); the relative difference is less than 1e-4 (although this does not meet Dr. Andrew Ng's 1e-11 requirement).

function [J grad] = nnCostFunctionRegression(nn_params, ...
                                   input_layer_size, ...
                                   hidden_layer_size, ...
                                   num_labels, ...
                                   X, y, lambda)

Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));

Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                 num_labels, (hidden_layer_size + 1));

m = size(X, 1);
J = 0;
Theta1_grad = zeros(size(Theta1));
Theta2_grad = zeros(size(Theta2));


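% Forward propagation (z1 holds the hidden-layer activations, ht the output)
X = [ones(m, 1) X];
z1 = sigmoid(X * Theta1');
zs = z1;
z1 = [ones(m, 1) z1];
z2 = z1 * Theta2';
ht = sigmoid(z2);


% Recode y as one-hot rows; this assumes integer labels in 1..num_labels
y_recode = zeros(length(y),num_labels);
for i=1:length(y)
    y_recode(i,y(i))=1;
end
y = y_recode;


% Squared-error cost plus regularization (bias columns excluded)
regularization=lambda/2/m*(sum(sum(Theta1(:,2:end).^2))+sum(sum(Theta2(:,2:end).^2)));
J=1/(m)*sum(sum((ht - y).^2))+regularization;
% Output-layer and hidden-layer deltas
delta_3 = 1/2*(ht - y);
delta_2 = delta_3 * Theta2(:,2:end) .* sigmoidGradient(X * Theta1');

% Accumulate the gradients over all training examples
delta_cap2 = delta_3' * z1;
delta_cap1 = delta_2' * X;

% Average and regularize every weight ...
Theta1_grad = ((1/m) * delta_cap1)+ ((lambda/m) * (Theta1));
Theta2_grad = ((1/m) * delta_cap2)+ ((lambda/m) * (Theta2));

% ... then undo the regularization on the bias columns
Theta1_grad(:,1) = Theta1_grad(:,1)-((lambda/m) * (Theta1(:,1)));
Theta2_grad(:,1) = Theta2_grad(:,1)-((lambda/m) * (Theta2(:,1)));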


grad = [Theta1_grad(:) ; Theta2_grad(:)];

end
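As a cross-check, the following is a minimal sketch (not the answerer's code) of a variant in which the output delta carries the sigmoid derivative of the output unit, a3 .* (1 - a3), as the chain rule gives for a squared-error cost of the form 1/(2*m) * sum((a3-y).^2). The toy data, the 2-3-1 layer sizes and all variable names are assumptions made for illustration, and regularization is left out to keep it short.

```matlab
% Sketch only: tiny 2-3-1 network on made-up, deterministic data.
sigmoid = @(z) 1 ./ (1 + exp(-z));
m = 5; nIn = 2; nHid = 3;
X = reshape(sin(1:(m*nIn)), m, nIn);
y = (1 + cos((1:m)')) / 2;                       % continuous targets in [0, 1]
Theta1 = reshape(sin(1:(nHid*(nIn+1))), nHid, nIn+1);
Theta2 = reshape(cos(1:(nHid+1)), 1, nHid+1);

% Forward propagation, same structure as in the question
a1 = [ones(m,1), X];
z2 = a1 * Theta1';
a2 = [ones(m,1), sigmoid(z2)];
z3 = a2 * Theta2';
a3 = sigmoid(z3);

% Unregularized squared-error cost as a function of the unrolled parameters
n1 = numel(Theta1);
fwd = @(T1, T2) sigmoid([ones(m,1), sigmoid(a1 * T1')] * T2');
cost = @(p) sum((fwd(reshape(p(1:n1), size(Theta1)), ...
                     reshape(p(n1+1:end), size(Theta2))) - y) .^ 2) / (2 * m);

% Central-difference numerical gradient
theta = [Theta1(:); Theta2(:)];
e = 1e-4;
numgrad = zeros(size(theta));
for k = 1:numel(theta)
    step = zeros(size(theta));
    step(k) = e;
    numgrad(k) = (cost(theta + step) - cost(theta - step)) / (2 * e);
end

% Back-propagation with two candidate output deltas:
%   1) (a3 - y) .* a3 .* (1 - a3): chain rule applied to the squared-error cost
%   2) a3 - y: the form used in the question
candidates = {(a3 - y) .* a3 .* (1 - a3), a3 - y};
relDiff = zeros(1, 2);
for c = 1:2
    delta3 = candidates{c};
    delta2 = (delta3 * Theta2(:,2:end)) .* sigmoid(z2) .* (1 - sigmoid(z2));
    Theta1_grad = (delta2' * a1) / m;
    Theta2_grad = (delta3' * a2) / m;
    grad = [Theta1_grad(:); Theta2_grad(:)];
    relDiff(c) = norm(numgrad - grad) / norm(numgrad + grad);
end
disp(relDiff)
```

If the sketch is right, the first relative difference should be very small and the second noticeably larger, which would suggest that delta3 = a3 - y suits the cross-entropy cost of the original assignment but not the squared-error cost.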
