import numpy as np

def svm_loss_vectorized(W, X, y, reg):
    """
    Structured SVM loss function, vectorized implementation.
    Inputs and outputs are the same as svm_loss_naive.
    """
    loss = 0.0
    dW = np.zeros(W.shape, dtype='float64')  # initialize the gradient as zero

    #############################################################################
    # TODO:                                                                     #
    # Implement a vectorized version of the structured SVM loss, storing the    #
    # result in loss.                                                           #
    #############################################################################
    num_train = X.shape[0]
    score = np.dot(X, W)  # (N, C) matrix of class scores
    # Extract each example's correct-class score, then broadcast the (N, 1)
    # column against the (N, C) score matrix to form the hinge margins.
    loss_matrix = np.maximum(0, score - score[np.arange(num_train), y].reshape(-1, 1) + 1)
    loss_matrix[np.arange(num_train), y] = 0  # the correct class contributes no loss
    loss = np.sum(loss_matrix)
    loss /= num_train
    loss += 0.5 * reg * np.sum(W * W)
    #############################################################################
    #                             END OF YOUR CODE                              #
    #############################################################################

    #############################################################################
    # TODO:                                                                     #
    # Implement a vectorized version of the gradient for the structured SVM     #
    # loss, storing the result in dW.                                           #
    #                                                                           #
    # Hint: Instead of computing the gradient from scratch, it may be easier    #
    # to reuse some of the intermediate values that you used to compute the     #
    # loss.                                                                     #
    #############################################################################
    num_classes = W.shape[1]
    coeff_mat = np.zeros((num_train, num_classes))
    coeff_mat[loss_matrix > 0] = 1  # each violated margin adds +x_i to that class column
    # The correct class is pushed down once for every violated margin of example i.
    coeff_mat[np.arange(num_train), y] = -np.sum(coeff_mat, axis=1)
    dW = X.T.dot(coeff_mat)
    dW /= num_train
    dW += reg * W
    #############################################################################
    #                             END OF YOUR CODE                              #
    #############################################################################
    return loss, dW
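Before trusting the analytic gradient, it is worth checking it against a numeric one on small random data. The shapes and seed below are made up for illustration; the check itself is a plain central-difference sketch:

import numpy as np

np.random.seed(0)
N, D, C = 5, 4, 3
X = np.random.randn(N, D)
y = np.random.randint(C, size=N)
W = np.random.randn(D, C) * 0.01

loss, dW = svm_loss_vectorized(W, X, y, reg=0.1)

h = 1e-5
dW_num = np.zeros_like(W)
it = np.nditer(W, flags=['multi_index'])
while not it.finished:
    ix = it.multi_index
    old = W[ix]
    W[ix] = old + h
    lp, _ = svm_loss_vectorized(W, X, y, 0.1)
    W[ix] = old - h
    lm, _ = svm_loss_vectorized(W, X, y, 0.1)
    W[ix] = old
    dW_num[ix] = (lp - lm) / (2 * h)  # central difference
    it.iternext()

print('max |analytic - numeric|:', np.max(np.abs(dW - dW_num)))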
One line in here took me a long time to understand:

loss_matrix = np.maximum(0, score - score[np.arange(num_train), y].reshape(-1, 1) + 1)

I stared at it for quite a while before it clicked; broken into pieces, it is not hard at all.

score[np.arange(num_train), y] uses integer (fancy) indexing to pull the correct-class score out of the score matrix: for each row i it takes the entry in column y[i], i.e. the score of that example's correct class. After the reshape, the result is an N×1 column vector.

score - score[np.arange(num_train), y].reshape(-1, 1) is a subtraction with mismatched dimensions, but thanks to broadcasting the N×1 column is automatically replicated across columns to N×C, so every class score in a row has that row's correct-class score subtracted from it.
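A tiny concrete example makes each step visible; the numbers and shapes here are made up purely for illustration:

import numpy as np

score = np.array([[3.2, 5.1, -1.7],
                  [1.3, 4.9,  2.0]])   # N=2 examples, C=3 classes
y = np.array([0, 2])                   # correct class of each example

correct = score[np.arange(2), y]       # fancy indexing -> array([3.2, 2.0])
column = correct.reshape(-1, 1)        # shape (2, 1)

# Broadcasting stretches the (2, 1) column to (2, 3) before subtracting,
# so each row's correct-class score is subtracted from every class score.
margins = np.maximum(0, score - column + 1)
# margins == [[1.0, 2.9, 0.0],
#             [0.3, 3.9, 1.0]]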
The gradient computation is the other piece worth unpacking:

num_classes = W.shape[1]
coeff_mat = np.zeros((num_train, num_classes))
coeff_mat[loss_matrix > 0] = 1
coeff_mat[np.arange(num_train), y] = -np.sum(coeff_mat, axis=1)
dW = X.T.dot(coeff_mat)
dW /= num_train
dW += reg * W
dW = X.T.dot(coeff_mat): here dW is computed as a single matrix product. coeff_mat is a coefficient matrix that decides which x_i get added into each column of dW, and with what sign. Once you understand how the naive loop accumulates rows of X, the trick becomes clear: the matrix product on X.T performs the same accumulation all at once (see the sketch below), and in my runs it was about 16x faster.
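To see why X.T.dot(coeff_mat) is exactly that loop, here is a sketch of the accumulation it replaces, reusing the X, W, coeff_mat, num_train, and num_classes from above:

# Loop version: for every example i and class j, column j of dW
# accumulates coeff_mat[i, j] copies of x_i.
dW_loop = np.zeros_like(W)
for i in range(num_train):
    for j in range(num_classes):
        dW_loop[:, j] += coeff_mat[i, j] * X[i]

# X.T has shape (D, N); the product sums over all examples in one shot.
assert np.allclose(dW_loop, X.T.dot(coeff_mat))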
Several times along the way, the loss kept overflowing and throwing errors. It turned out the learning rate was too large; changing the exponent from -5 to -6 (i.e. 1e-5 to 1e-6) fixed it. The root cause is that this code uses a fixed learning rate with no decay schedule, so a step size that is fine early in training can eventually blow up.
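A minimal sketch of what such a decay schedule could look like; the decay factor, epoch count, and data here are assumptions for illustration, not part of the assignment code:

import numpy as np

np.random.seed(0)
X = np.random.randn(500, 4)
y = np.random.randint(3, size=500)
W = np.random.randn(4, 3) * 0.01

learning_rate = 1e-5   # a larger initial step is now tolerable
decay = 0.95           # shrink the step each epoch (assumed value)

for epoch in range(10):
    loss, dW = svm_loss_vectorized(W, X, y, reg=0.1)
    W -= learning_rate * dW       # plain gradient descent step
    learning_rate *= decay        # exponential learning-rate decay
    print(epoch, loss)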