I'm trying to implement a regression NN with 3 layers (1 input, 1 hidden and 1 output layer with a continuous result). As a basis I took a classification NN from a coursera.org class, but changed the cost function and gradient calculation to fit a regression problem rather than a classification one.
My nnCostFunction is now:
function [J grad] = nnCostFunctionLinear(nn_params, ...
input_layer_size, ...
hidden_layer_size, ...
num_labels, ...
X, y, lambda)
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
num_labels, (hidden_layer_size + 1));
m = size(X, 1);
a1 = X;
a1 = [ones(m, 1) a1];
a2 = a1 * Theta1';
a2 = [ones(m, 1) a2];
a3 = a2 * Theta2';
Y = y;
J = 1/(2*m)*sum(sum((a3 - Y).^2))
th1 = Theta1;
th1(:,1) = 0; %set bias = 0 in reg. formula
th2 = Theta2;
th2(:,1) = 0;
t1 = th1.^2;
t2 = th2.^2;
th = sum(sum(t1)) + sum(sum(t2));
th = lambda * th / (2*m);
J = J + th; %regularization
del_3 = a3 - Y;
t1 = del_3'*a2;
Theta2_grad = 2*(t1)/m + lambda*th2/m;
t1 = del_3 * Theta2;
del_2 = t1 .* a2;
del_2 = del_2(:,2:end);
t1 = del_2'*a1;
Theta1_grad = 2*(t1)/m + lambda*th1/m;
grad = [Theta1_grad(:) ; Theta2_grad(:)];
I then pass this function to the fmincg algorithm, but fmincg stops within the first few iterations. I think my gradient is wrong, but I can't find the error.
Mikhaill, I've been playing with a NN for continuous regression as well, and had a similar issue at some point. The best thing to do here is to test the gradient computation against a numerical calculation before running the model. If the two don't agree, fmincg won't be able to train the model. (By the way, I'd discourage you from using the numerical gradient for training itself, as it takes much longer to compute.)
Given that you took this idea from Ng's Coursera class, I'll implement a possible solution for you to try, using the same notation for Octave.
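% Cost function without regularization.
J = 1/(2*m) * sum(sum((a3 - Y).^2));
% In case it's needed, regularization term is added (i.e. for training).
if (reg == true)
  J = J + lambda/(2*m) * (sum(sum(Theta1(:,2:end).^2)) + sum(sum(Theta2(:,2:end).^2)));
end
% Derivatives are computed for layers 2 and 3.
d3 = a3 - Y;
d2 = d3 * Theta2(:,2:end);
% Theta grad is computed without regularization.
Theta1_grad = d2' * a1 / m;
Theta2_grad = d3' * a2 / m;
% Regularization is added to grad computation (bias column excluded).
Theta1_grad(:,2:end) = Theta1_grad(:,2:end) + lambda/m * Theta1(:,2:end);
Theta2_grad(:,2:end) = Theta2_grad(:,2:end) + lambda/m * Theta2(:,2:end);
% Unroll gradients.
grad = [Theta1_grad(:) ; Theta2_grad(:)];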
Note that, since you have removed all the sigmoid activations, the derivative calculation is quite simple and the original code simplifies considerably.
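To make that simplification concrete, here is a short derivation in the question's notation. It assumes the cost is J = 1/(2m) times the sum of squared errors and leaves regularization out; the symbol for Theta2 without its bias column is introduced here only for illustration.

```latex
\begin{aligned}
a_2 &= [\,\mathbf{1} \;\; a_1 \Theta_1^{\top}\,], \qquad
a_3 = a_2 \Theta_2^{\top}, \qquad
J = \frac{1}{2m} \sum_{i,k} \bigl(a_3 - Y\bigr)_{ik}^{2} \\
\delta_3 &= a_3 - Y \\
\frac{\partial J}{\partial \Theta_2} &= \frac{1}{m}\, \delta_3^{\top} a_2 \\
\delta_2 &= \delta_3\, \tilde{\Theta}_2
  \qquad (\tilde{\Theta}_2 = \Theta_2 \text{ without its bias column}) \\
\frac{\partial J}{\partial \Theta_1} &= \frac{1}{m}\, \delta_2^{\top} a_1
\end{aligned}
```

With regularization, each gradient gains an extra (lambda/m) * Theta term with the bias column set to zero. Compared with the code in the question, two things differ: the factor 2 in Theta1_grad and Theta2_grad does not match a cost that already carries 1/(2m), and del_2 = t1 .* a2 multiplies by the activations, whereas the derivative of an identity activation is simply 1.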
Next steps:
1. Check this code to see whether it makes sense for your problem.
2. Use gradient checking to test the gradient calculation.
3. Finally, use fmincg and check whether you get different results.
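As a rough illustration of step 2, the sketch below compares an analytic gradient with a central-difference estimate on a tiny made-up problem. The helper names linearCost and numericalGradient, the layer sizes and the data are all assumptions for this example, and regularization is left out for brevity. It is written as an Octave script; MATLAB expects local functions at the end of a script file instead.

```octave
1;  % not a function file: lets Octave accept the definitions below in a script

% Hypothetical helper: cost and gradient of the linear 3-layer network,
% assuming J = 1/(2m) * sum of squared errors and no regularization.
function [J, grad] = linearCost(nn_params, n_in, n_hid, n_out, X, Y)
  Theta1 = reshape(nn_params(1:n_hid * (n_in + 1)), n_hid, n_in + 1);
  Theta2 = reshape(nn_params(n_hid * (n_in + 1) + 1:end), n_out, n_hid + 1);
  m = size(X, 1);
  a1 = [ones(m, 1) X];
  a2 = [ones(m, 1) a1 * Theta1'];
  a3 = a2 * Theta2';
  J = 1 / (2 * m) * sum(sum((a3 - Y) .^ 2));
  d3 = a3 - Y;
  d2 = d3 * Theta2(:, 2:end);   % identity activation: no extra factor
  Theta1_grad = d2' * a1 / m;
  Theta2_grad = d3' * a2 / m;
  grad = [Theta1_grad(:); Theta2_grad(:)];
end

% Hypothetical helper: central-difference estimate of the gradient of costFn.
function numgrad = numericalGradient(costFn, theta)
  numgrad = zeros(size(theta));
  e = 1e-4;
  for i = 1:numel(theta)
    step = zeros(size(theta));
    step(i) = e;
    numgrad(i) = (costFn(theta + step) - costFn(theta - step)) / (2 * e);
  end
end

% Tiny deterministic problem: 5 examples, 3 inputs, 4 hidden units, 1 output.
X = reshape(sin(1:15), 5, 3);
Y = cos(1:5)';
nn_params = 0.5 * cos(1:21)';   % 4*(3+1) + 1*(4+1) = 21 parameters

costFn = @(p) linearCost(p, 3, 4, 1, X, Y);
[J, grad] = linearCost(nn_params, 3, 4, 1, X, Y);
numgrad = numericalGradient(costFn, nn_params);

% Relative difference between the two gradients; it should be very small
% when the analytic gradient matches the cost function.
relDiff = norm(numgrad - grad) / norm(numgrad + grad);
disp(relDiff);
```

If the relative difference is not tiny, the analytic gradient and the cost disagree, and fmincg is likely to stop early as described in the question. Once the check passes, drop the numerical gradient and train with the analytic one only.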