I have these backprop updates; please help me figure out what is wrong with the dx part. In my computational graph I am using X, sample_mean, and sample_var. Thanks for your help.

(x, norm, sample_mean, sample_var, gamma, eps) = cache
dbeta = np.sum(dout, axis=0)
dgamma = np.sum(dout * norm, axis=0)
dxminus = dout * gamma / np.sqrt(sample_var + eps)
dmean = -np.sum(dxminus, axis=0)
dxmean = np.full(x.shape, 1.0 / x.shape[0]) * dmean
dvar = np.sum(dout * gamma * (x - sample_mean), axis=0)
dxvar = dvar * (-1 / x.shape[0]) * np.power(x, -1.5) * (x - sample_mean)
dx = dxminus + dxmean + dxvar


[Image: the BatchNorm computational graph I used for the derivation]
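For context, a minimal forward pass that would produce a cache with this layout might look like the sketch below (assuming training-mode batchnorm on an (N, D) batch; the function name batchnorm_forward is mine, not the asker's):

import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    # Per-feature statistics over the batch dimension (axis 0).
    sample_mean = x.mean(axis=0)
    sample_var = x.var(axis=0)
    # Normalize, then scale and shift.
    norm = (x - sample_mean) / np.sqrt(sample_var + eps)
    out = gamma * norm + beta
    cache = (x, norm, sample_mean, sample_var, gamma, eps)
    return out, cache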

Best Answer

Your dx formula looks incorrect: in the graph, the x node receives backward messages from two other nodes (one from the subtraction and one from the mean), but it looks like you are computing only one of these components:

[Image: computational graph from the linked post]
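A sketch of the chain rule the graph encodes (with $\mu$ the batch mean and $N$ the batch size):

$$\frac{\partial L}{\partial x_i} = \underbrace{\frac{\partial L}{\partial (x_i - \mu)}}_{\text{subtraction path}} + \underbrace{\frac{1}{N}\,\frac{\partial L}{\partial \mu}}_{\text{mean path}}, \qquad \frac{\partial L}{\partial \mu} = -\sum_{j=1}^{N} \frac{\partial L}{\partial (x_j - \mu)}$$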

So it should look like this:

# dxmu1 and dxmu2 are the two gradients flowing into (x - mu)
# in the post's staged derivation; N, D = x.shape
dx1 = dxmu1 + dxmu2
dmu = -1 * np.sum(dxmu1 + dxmu2, axis=0)
dx2 = 1. / N * np.ones((N, D)) * dmu
dx = dx1 + dx2


The image comes from this wonderful post, where you can also find the complete code.
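Putting the two paths together, a complete backward pass could look like the following sketch. It is written against the cache layout from the question and follows the staged style of the linked post, but it is not the post's code verbatim:

def batchnorm_backward(dout, cache):
    x, norm, sample_mean, sample_var, gamma, eps = cache
    N = x.shape[0]
    xmu = x - sample_mean
    inv_std = 1.0 / np.sqrt(sample_var + eps)

    dbeta = np.sum(dout, axis=0)
    dgamma = np.sum(dout * norm, axis=0)

    # Gradient w.r.t. the normalized activations.
    dnorm = dout * gamma

    # Path through the variance: var = mean(xmu**2), and
    # d(1/sqrt(var + eps))/d(var) = -0.5 * (var + eps)**-1.5.
    dvar = np.sum(dnorm * xmu, axis=0) * -0.5 * inv_std**3

    # Two messages into xmu: one from the division by the std,
    # one from the variance (d(var)/d(xmu) = 2 * xmu / N).
    dxmu = dnorm * inv_std + dvar * 2.0 * xmu / N

    # The mean node collects -1 times every message into xmu,
    # then spreads its gradient back to each x with weight 1/N.
    dmean = -np.sum(dxmu, axis=0)

    # Two messages into x: the subtraction path and the mean path.
    dx = dxmu + dmean / N
    return dx, dgamma, dbeta

Any candidate dx can then be sanity-checked against a centered-difference numerical gradient; the helper below is a hypothetical sketch that reuses batchnorm_forward and batchnorm_backward from above:

def num_grad(f, x, dout, h=1e-5):
    # d(sum(f(x) * dout))/dx via centered differences.
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'])
    while not it.finished:
        i = it.multi_index
        old = x[i]
        x[i] = old + h
        pos = np.sum(f(x) * dout)
        x[i] = old - h
        neg = np.sum(f(x) * dout)
        x[i] = old
        grad[i] = (pos - neg) / (2 * h)
        it.iternext()
    return grad

np.random.seed(0)
N, D = 4, 5
x = np.random.randn(N, D)
gamma, beta = np.random.randn(D), np.random.randn(D)
dout = np.random.randn(N, D)

_, cache = batchnorm_forward(x, gamma, beta)
dx, _, _ = batchnorm_backward(dout, cache)
dx_num = num_grad(lambda v: batchnorm_forward(v, gamma, beta)[0], x, dout)
print(np.max(np.abs(dx - dx_num)))  # expect roughly 1e-8 or smaller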

Original question ("machine-learning - Wrong Backprop updates in Batchnorm") on Stack Overflow: https://stackoverflow.com/questions/50136187/
