I'm trying to implement a feed-forward backpropagating autoencoder (training with gradient descent) and wanted to verify that I'm calculating the gradient correctly. This tutorial suggests calculating the derivative of each parameter one at a time: grad_i(theta) = (J(theta_i+epsilon) - J(theta_i-epsilon)) / (2*epsilon).
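To make the tutorial's formula concrete, here is a minimal sketch of such a central-difference gradient checker. It is a NumPy translation rather than the question's MATLAB, and `numerical_gradient` and the quadratic test objective are my own illustrative names, not from the tutorial:

```python
import numpy as np

def numerical_gradient(J, theta, eps=1e-4):
    """Central-difference estimate of dJ/dtheta, one coordinate at a time."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        tp = theta.copy(); tp[i] += eps   # J(theta_i + epsilon)
        tm = theta.copy(); tm[i] -= eps   # J(theta_i - epsilon)
        grad[i] = (J(tp) - J(tm)) / (2 * eps)
    return grad

# Sanity check on J(theta) = 0.5*||theta||^2, whose exact gradient is theta.
theta = np.array([1.0, -2.0, 3.0])
est = numerical_gradient(lambda t: 0.5 * np.sum(t ** 2), theta)
print(np.max(np.abs(est - theta)))  # should be vanishingly small
```

For a quadratic objective the central difference is exact up to floating-point rounding, so the estimate and the true gradient should agree to many significant figures; for a well-implemented analytic gradient of a smooth loss you would similarly expect agreement to roughly 4 or more significant figures.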
I've written a sample piece of code in Matlab to do just this, but without much luck -- the differences between the gradient calculated from the derivative and the gradient found numerically tend to be large (>> 4 significant figures).

If anyone can offer any suggestions, I would greatly appreciate the help (either with my calculation of the gradient or with how I perform the check). Because I've simplified the code greatly to make it more readable, I haven't included any biases, and am no longer tying the weight matrices.

First, I initialize the variables:

```matlab
numHidden = 200;
numVisible = 784;
low  = -4*sqrt(6./(numHidden + numVisible));
high =  4*sqrt(6./(numHidden + numVisible));
encoder = low + (high-low)*rand(numVisible, numHidden);
decoder = low + (high-low)*rand(numHidden, numVisible);
```

Next, given some input image `x`, do feed-forward propagation:

```matlab
a = sigmoid(x*encoder);
z = sigmoid(a*decoder); % (reconstruction of x)
```

The loss function I'm using is the standard sum(0.5*(z - x).^2):

```matlab
% first calculate the error by finding the derivative of sum(0.5*(z-x).^2),
% which is (f(h)-x)*f'(h), where z = f(h), h = a*decoder, and
% f = sigmoid. However, since the derivative of the sigmoid is
% sigmoid*(1 - sigmoid), we get:
error_0 = (z - x).*z.*(1-z);

% The gradient \Delta w_{ji} = error_j*a_i
gDecoder = error_0'*a;

% not important, but included for completeness:
% do back-propagation one layer down
error_1 = (error_0*encoder).*a.*(1-a);
gEncoder = error_1'*x;
```

And finally, check that the gradient is correct (in this case, just do it for the decoder):

```matlab
epsilon = 10e-5;
check = gDecoder(:); % the values we obtained above

for i = 1:size(decoder(:), 1)
    % calculate J+
    theta = decoder(:); % unroll
    theta(i) = theta(i) + epsilon;
    decoderp = reshape(theta, size(decoder)); % re-roll
    a = sigmoid(x*encoder);
    z = sigmoid(a*decoderp);
    Jp = sum(0.5*(z - x).^2);

    % calculate J-
    theta = decoder(:);
    theta(i) = theta(i) - epsilon;
    decoderp = reshape(theta, size(decoder));
    a = sigmoid(x*encoder);
    z = sigmoid(a*decoderp);
    Jm = sum(0.5*(z - x).^2);

    grad_i = (Jp - Jm) / (2*epsilon);
    diff = abs(grad_i - check(i));
    fprintf('%d: %f <=> %f: %f\n', i, grad_i, check(i), diff);
end
```

Running this on the MNIST dataset (for the first entry) gives results such as:

```
2: 0.093885 <=> 0.028398: 0.065487
3: 0.066285 <=> 0.031096: 0.035189
5: 0.053074 <=> 0.019839: 0.033235
6: 0.108249 <=> 0.042407: 0.065843
7: 0.091576 <=> 0.009014: 0.082562
```

Solution

Do not sigmoid on both a and z. Just use it on z:

```matlab
a = x*encoder;
z = sigmoid(a*decoderp);
```
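For reference, a small NumPy sketch of the setup the answer describes (linear hidden layer, sigmoid only on the output) shows the analytic gradient agreeing with the central-difference check. This is my own translation with made-up tiny dimensions and random data instead of MNIST; note that the gradient is accumulated as `a' * error_0` so that it has the same shape as `decoder` itself:

```python
import numpy as np

def sigmoid(h):
    return 1.0 / (1.0 + np.exp(-h))

rng = np.random.default_rng(0)
num_visible, num_hidden = 8, 5           # tiny sizes so the check runs fast
x = rng.random((1, num_visible))
encoder = rng.normal(0.0, 0.1, (num_visible, num_hidden))
decoder = rng.normal(0.0, 0.1, (num_hidden, num_visible))

# Forward pass per the answer: linear hidden layer, sigmoid on the output only.
a = x @ encoder
z = sigmoid(a @ decoder)                 # reconstruction of x

# Analytic gradient of J = 0.5*sum((z-x)^2) with respect to decoder.
error_0 = (z - x) * z * (1 - z)          # dJ/dh, where h = a @ decoder
g_decoder = a.T @ error_0                # same shape as decoder

# Central-difference check, one decoder entry at a time.
eps = 1e-5
num = np.zeros(decoder.size)
for i in range(decoder.size):
    d = decoder.ravel().copy()
    d[i] += eps
    Jp = 0.5 * np.sum((sigmoid(a @ d.reshape(decoder.shape)) - x) ** 2)
    d[i] -= 2 * eps
    Jm = 0.5 * np.sum((sigmoid(a @ d.reshape(decoder.shape)) - x) ** 2)
    num[i] = (Jp - Jm) / (2 * eps)
num = num.reshape(decoder.shape)

print(np.max(np.abs(num - g_decoder)))   # max discrepancy; should be tiny
```

One detail worth checking in the original MATLAB: `gDecoder = error_0'*a` produces a 784x200 matrix while `decoder` is 200x784, so `gDecoder(:)` and the unrolled `decoder(:)` index the entries in different orders, which alone can make the comparison look wrong.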