I am watching some videos for Stanford CS231n: Convolutional Neural Networks for Visual Recognition, but do not quite understand how to calculate the analytical gradient of the softmax loss function using numpy.

From this stackexchange answer, the softmax gradient is calculated as:

$$\nabla_{w_j} L = \frac{1}{N} \sum_i x_i \left[\, p(y_i = j) - \mathbb{1}\{y_i = j\} \,\right]$$

The Python implementation for the above is:

```python
num_classes = W.shape[0]
num_train = X.shape[1]
for i in range(num_train):
  for j in range(num_classes):
    p = np.exp(f_i[j]) / sum_i
    dW[j, :] += (p - (j == y[i])) * X[:, i]
```

Could anyone explain how the above snippet works? The detailed implementation of softmax is also included below.

```python
def softmax_loss_naive(W, X, y, reg):
  """
  Softmax loss function, naive implementation (with loops)
  Inputs:
  - W: C x D array of weights
  - X: D x N array of data. Data are D-dimensional columns
  - y: 1-dimensional array of length N with labels 0...K-1, for K classes
  - reg: (float) regularization strength
  Returns a tuple of:
  - loss as single float
  - gradient with respect to weights W, an array of same size as W
  """
  # Initialize the loss and gradient to zero.
  loss = 0.0
  dW = np.zeros_like(W)

  #############################################################################
  # Compute the softmax loss and its gradient using explicit loops.           #
  # Store the loss in loss and the gradient in dW. If you are not careful     #
  # here, it is easy to run into numeric instability. Don't forget the        #
  # regularization!                                                           #
  #############################################################################
  # Get shapes
  num_classes = W.shape[0]
  num_train = X.shape[1]

  for i in range(num_train):
    # Compute vector of scores
    f_i = W.dot(X[:, i])  # in R^{num_classes}

    # Normalization trick to avoid numerical instability,
    # per http://cs231n.github.io/linear-classify/#softmax
    log_c = np.max(f_i)
    f_i -= log_c

    # Compute loss (and add to it, divided later)
    # L_i = -f(x_i)_{y_i} + log \sum_j e^{f(x_i)_j}
    sum_i = 0.0
    for f_i_j in f_i:
      sum_i += np.exp(f_i_j)
    loss += -f_i[y[i]] + np.log(sum_i)

    # Compute gradient
    # dw_j = 1/num_train * \sum_i[x_i * (p(y_i = j) - Ind{y_i = j})]
    # Here we are computing the contribution to the inner sum for a given i.
    for j in range(num_classes):
      p = np.exp(f_i[j]) / sum_i
      dW[j, :] += (p - (j == y[i])) * X[:, i]

  # Compute average
  loss /= num_train
  dW /= num_train

  # Regularization
  loss += 0.5 * reg * np.sum(W * W)
  dW += reg * W

  return loss, dW
```

Solution

Not sure if this helps, but $\mathbb{1}\{y_i = j\}$ is really the indicator function, as described here. This forms the expression `(j == y[i])` in the code.

Also, the gradient of the loss with respect to the weights is:

$$\nabla_{w_j} L = \frac{1}{N} \sum_i x_i \left[\, p(y_i = j) - \mathbb{1}\{y_i = j\} \,\right]$$

where $x_i = X[:, i]$ (the i-th data column), which is the origin of the `X[:, i]` in the code.
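To make the role of `p - (j == y[i])` concrete, below is a minimal vectorized sketch of the same computation. It assumes the same layout as above (W is C x D, X is D x N); the function name `softmax_loss_vectorized` is my own and not part of the assignment code.

```python
import numpy as np

def softmax_loss_vectorized(W, X, y, reg):
    """Hypothetical vectorized counterpart of softmax_loss_naive (same shapes and conventions)."""
    num_train = X.shape[1]

    # Scores: C x N, one column of class scores per example.
    f = W.dot(X)
    # Same numerical-stability shift as in the looped version.
    f -= np.max(f, axis=0, keepdims=True)

    # Softmax probabilities: p[j, i] = p(y_i = j).
    exp_f = np.exp(f)
    p = exp_f / np.sum(exp_f, axis=0, keepdims=True)

    # Average cross-entropy loss plus L2 regularization.
    loss = -np.mean(np.log(p[y, np.arange(num_train)]))
    loss += 0.5 * reg * np.sum(W * W)

    # Subtracting 1 at the correct class implements the indicator term, so
    # dW = (p - 1{y_i = j}) X^T / N, matching dw_j = 1/N * sum_i x_i (p_ij - 1{y_i = j}).
    dscores = p.copy()
    dscores[y, np.arange(num_train)] -= 1.0
    dW = dscores.dot(X.T) / num_train + reg * W

    return loss, dW
```

On random inputs of matching shapes, this should agree with `softmax_loss_naive` above up to floating-point error.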
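If the derivation still feels opaque, a finite-difference check is a quick way to verify that the analytical gradient returned by `softmax_loss_naive` is correct. The shapes, seed, and number of sampled entries below are arbitrary choices for illustration, not from the original post.

```python
import numpy as np

# Small hypothetical problem: C classes, D features, N examples.
C, D, N = 3, 5, 10
rng = np.random.RandomState(0)
W = 0.01 * rng.randn(C, D)
X = rng.randn(D, N)
y = rng.randint(C, size=N)
reg = 0.1

loss, dW = softmax_loss_naive(W, X, y, reg)

# Centered finite-difference estimate of dL/dW[j, k] for a few random entries.
h = 1e-5
for _ in range(5):
    j, k = rng.randint(C), rng.randint(D)
    W_plus, W_minus = W.copy(), W.copy()
    W_plus[j, k] += h
    W_minus[j, k] -= h
    num_grad = (softmax_loss_naive(W_plus, X, y, reg)[0]
                - softmax_loss_naive(W_minus, X, y, reg)[0]) / (2 * h)
    # Analytical and numerical values should match to several decimal places.
    print(j, k, dW[j, k], num_grad)
```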