Problem description
I'm trying to implement a regression neural network with 3 layers (one input, one hidden, and one output layer with a continuous result). As a basis I took the classification NN from the coursera.org class, but changed the cost function and gradient calculation to fit a regression problem rather than a classification one:
My nnCostFunction now is:
function [J grad] = nnCostFunctionLinear(nn_params, ...
                                         input_layer_size, ...
                                         hidden_layer_size, ...
                                         num_labels, ...
                                         X, y, lambda)
  % Reshape the unrolled parameter vector back into the two weight matrices.
  Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                   hidden_layer_size, (input_layer_size + 1));
  Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                   num_labels, (hidden_layer_size + 1));

  m = size(X, 1);

  % Forward pass (linear activations, no sigmoid).
  a1 = [ones(m, 1) X];
  a2 = a1 * Theta1';
  a2 = [ones(m, 1) a2];
  a3 = a2 * Theta2';

  Y = y;

  % Squared-error cost.
  J = 1/(2*m) * sum(sum((a3 - Y).^2));

  % Regularization term (bias columns excluded).
  th1 = Theta1;
  th1(:,1) = 0;  % set bias = 0 in reg. formula
  th2 = Theta2;
  th2(:,1) = 0;
  th = sum(sum(th1.^2)) + sum(sum(th2.^2));
  J = J + lambda * th / (2*m);

  % Backpropagation.
  del_3 = a3 - Y;
  Theta2_grad = 2 * (del_3' * a2) / m + lambda * th2 / m;

  del_2 = (del_3 * Theta2) .* a2;
  del_2 = del_2(:, 2:end);
  Theta1_grad = 2 * (del_2' * a1) / m + lambda * th1 / m;

  % Unroll gradients.
  grad = [Theta1_grad(:) ; Theta2_grad(:)];
end
Then I use this function in the fmincg algorithm, but fmincg stops after the first few iterations. I think my gradient is wrong, but I can't find the error.
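For context, here is a minimal sketch of how such a cost function is usually handed to fmincg in the Coursera exercises; the optimset call and the fmincg signature follow the class materials, and the variable names are illustrative assumptions on my part:

% Illustrative call, following the Coursera exercise conventions.
initial_nn_params = [Theta1(:) ; Theta2(:)];   % unrolled initial weights (assumed already initialized)
options = optimset('MaxIter', 100);
costFunc = @(p) nnCostFunctionLinear(p, input_layer_size, ...
                                     hidden_layer_size, num_labels, X, y, lambda);
[nn_params, cost] = fmincg(costFunc, initial_nn_params, options);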
Can anyone help?
Recommended answer
Mikhaill, I've been playing with a NN for continuous regression as well, and had similar issues at some point. The best thing to do here is to test the gradient computation against a numerical calculation before running the model. If it isn't correct, fmincg won't be able to train the model. (By the way, I'd discourage you from training with the numerical gradient itself, as it takes much longer to compute.)
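A minimal sketch of such a numerical check, modeled on the computeNumericalGradient helper from the Coursera exercises (the function name, the epsilon value, and the comparison metric are my assumptions, not part of the original answer):

function numgrad = computeNumericalGradient(J, theta)
  % Central-difference approximation of the gradient of J at theta.
  numgrad = zeros(size(theta));
  perturb = zeros(size(theta));
  e = 1e-4;
  for p = 1:numel(theta)
    perturb(p) = e;
    numgrad(p) = (J(theta + perturb) - J(theta - perturb)) / (2*e);
    perturb(p) = 0;
  end
end

% Compare the analytic gradient with the numerical one on a small instance:
[J_val, grad] = costFunc(nn_params);
numgrad = computeNumericalGradient(costFunc, nn_params);
disp(norm(numgrad - grad) / norm(numgrad + grad));   % should be on the order of 1e-9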
Taking into account that you took this idea from Ng's Coursera class, I'll implement a possible solution for you to try, using the same Octave notation.
% Cost function without regularization (the 1/(2*m) factor keeps the cost
% consistent with the gradients below, which divide by m).
J = 1/(2*m) * sum(sum((a3 - Y).^2));

% In case it's needed, the regularization term is added (i.e. for training);
% reg is assumed to be a boolean flag passed into the function.
if (reg == true)
  J = J + lambda/(2*m) * (sum(sum(Theta1(:,2:end).^2)) + sum(sum(Theta2(:,2:end).^2)));
endif

% Derivatives are computed for layers 2 and 3.
d3 = a3 - Y;
d2 = d3 * Theta2(:,2:end);

% Theta gradients are computed without regularization.
Theta1_grad = (d2' * a1) ./ m;
Theta2_grad = (d3' * a2) ./ m;

% Regularization is added to the gradient computation (bias columns excluded).
Theta1_grad(:,2:end) = Theta1_grad(:,2:end) + (lambda/m) .* Theta1(:,2:end);
Theta2_grad(:,2:end) = Theta2_grad(:,2:end) + (lambda/m) .* Theta2(:,2:end);

% Unroll gradients.
grad = [Theta1_grad(:) ; Theta2_grad(:)];
Note that, since you have taken out all the sigmoid activations, the derivative calculation is quite simple and results in a simplification of the original code.
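This also points at the most likely slip in the question's code (my reading, consistent with the d2 line above): del_2 = t1 .* a2 still multiplies by the activations, a leftover of the sigmoid derivative a2 .* (1 - a2) from the classification version. With linear units g'(z) = 1, so the backpropagated error reduces to:

del_2 = del_3 * Theta2;     % no element-wise activation-derivative factor
del_2 = del_2(:, 2:end);    % drop the bias column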
Next steps:
1. Check this code to understand whether it makes sense for your problem.
2. Use gradient checking to test the gradient calculation.
3. Finally, use fmincg and check that you get different results.