python - Understanding `tf.nn.nce_loss()` in TensorFlow

I am trying to understand the NCE loss function in TensorFlow. NCE loss is used for the word2vec task, for example:

# Look up embeddings for inputs.
embeddings = tf.Variable(
    tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
embed = tf.nn.embedding_lookup(embeddings, train_inputs)

# Construct the variables for the NCE loss
nce_weights = tf.Variable(
    tf.truncated_normal([vocabulary_size, embedding_size],
                        stddev=1.0 / math.sqrt(embedding_size)))
nce_biases = tf.Variable(tf.zeros([vocabulary_size]))

# Compute the average NCE loss for the batch.
# tf.nn.nce_loss automatically draws a new sample of the negative labels each
# time we evaluate the loss.
loss = tf.reduce_mean(
    tf.nn.nce_loss(weights=nce_weights,
                   biases=nce_biases,
                   labels=train_labels,
                   inputs=embed,
                   num_sampled=num_sampled,
                   num_classes=vocabulary_size))

For more details, please refer to TensorFlow's word2vec_basic.py.

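What are the input and output matrices in the NCE function?

In a word2vec model, we are interested in building representations for words. During training, given a sliding window, every word has two embeddings: 1) one used when the word is the center word; 2) one used when the word is a context word. These two embeddings are called the input vector and the output vector, respectively. (more explanations of input and output matrices)

I think the input matrix is embeddings and the output matrix is nce_weights. Is that right?

What is the final embedding?

According to a post by s0urcer, which also relates to NCE, the final embedding matrix is just the input matrix. Some others say that final_embedding = input_matrix + output_matrix. Which is correct, or more common?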

Best answer

Let's look at the relevant code in the word2vec example (examples/tutorials/word2vec).

embeddings = tf.Variable(
    tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
embed = tf.nn.embedding_lookup(embeddings, train_inputs)

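These two lines create the embedding representations. embeddings is a matrix in which each row is one word vector. embedding_lookup is a quick way to fetch the vectors that correspond to train_inputs. In the word2vec example, train_inputs consists of int32 numbers, each the id of a target word. More generally, embed can be replaced by the features of any hidden layer.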
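As a rough illustration of what the lookup does, the sketch below gathers rows from a small table by integer id. It is a pure-Python stand-in, not TensorFlow's implementation; the helper name embedding_lookup and the toy values are assumptions made for this example.

```python
# Pure-Python stand-in for tf.nn.embedding_lookup: pick rows by integer id.
# The helper and the toy numbers are illustrative assumptions.
def embedding_lookup(embeddings, ids):
    """Return the rows of `embeddings` selected by `ids`, in order."""
    return [embeddings[i] for i in ids]

# Toy table with vocabulary_size = 4 and embedding_size = 3.
embeddings = [
    [0.1, 0.2, 0.3],  # word id 0
    [0.4, 0.5, 0.6],  # word id 1
    [0.7, 0.8, 0.9],  # word id 2
    [1.0, 1.1, 1.2],  # word id 3
]
train_inputs = [2, 0, 2]  # a batch of three target-word ids
embed = embedding_lookup(embeddings, train_inputs)
print(embed)  # one row per id, so the shape is [batch_size, embedding_size]
```

The result has one row per entry of train_inputs, and a repeated id simply yields the same row twice.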

# Construct the variables for the NCE loss
nce_weights = tf.Variable(
    tf.truncated_normal([vocabulary_size, embedding_size],
                        stddev=1.0 / math.sqrt(embedding_size)))
nce_biases = tf.Variable(tf.zeros([vocabulary_size]))

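These two lines create the parameters. They are updated by the optimizer during training. We can use tf.matmul(embed, tf.transpose(nce_weights)) + nce_biases to get the final output scores. In other words, nce_weights and nce_biases can take the place of the last inner-product (fully connected) layer of a classifier.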
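To make the shapes concrete, here is a minimal pure-Python sketch of the score computation embed · nce_weights^T + nce_biases. The helper output_scores and the toy numbers are assumptions for illustration only.

```python
# Sketch of tf.matmul(embed, tf.transpose(nce_weights)) + nce_biases
# using plain lists; helper name and values are illustrative assumptions.
def output_scores(embed, nce_weights, nce_biases):
    """Return one score per (example, class): dot(example, class_weights) + bias."""
    return [
        [sum(e * w for e, w in zip(row, weights)) + bias
         for weights, bias in zip(nce_weights, nce_biases)]
        for row in embed
    ]

embed = [[1.0, 2.0], [0.0, -1.0]]                    # [batch_size=2, embedding_size=2]
nce_weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # [num_classes=3, embedding_size=2]
nce_biases = [0.0, 0.5, -1.0]                        # [num_classes=3]

scores = output_scores(embed, nce_weights, nce_biases)  # [batch_size, num_classes]
# The predicted class is the index of the largest score in each row.
predicted = [max(range(len(row)), key=row.__getitem__) for row in scores]
print(scores, predicted)
```

Each row of nce_weights acts as the weight vector of one output class, which is why the matrix has shape [num_classes, embedding_size] and must be transposed before the matrix product.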

loss = tf.reduce_mean(
    tf.nn.nce_loss(weights=nce_weights,       # [vocab_size, embed_size]
                   biases=nce_biases,         # [vocab_size]
                   labels=train_labels,       # [bs, 1]
                   inputs=embed,              # [bs, embed_size]
                   num_sampled=num_sampled,
                   num_classes=vocabulary_size))

These lines create the NCE loss; @garej has given a very good explanation. num_sampled is the number of negative samples drawn in the NCE algorithm.
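The idea behind the loss can be sketched in plain Python: each example is scored against its true class (target 1) and against num_sampled randomly drawn classes (target 0), and a logistic loss is summed over both. This is a simplified sketch, not TensorFlow's implementation. It assumes a uniform noise distribution, one set of negatives shared by the batch, and a log-expected-count correction on the logits. The function name nce_loss_sketch and the rng parameter are hypothetical.

```python
import math
import random

def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))

def nce_loss_sketch(weights, biases, labels, inputs, num_sampled, num_classes, rng):
    """Per-example NCE-style loss; uniform noise sampling is an assumption."""
    # Draw one set of negative classes, shared by every example in the batch.
    sampled = [rng.randrange(num_classes) for _ in range(num_sampled)]
    # Log of the expected count of any class among the noise samples.
    log_expected = math.log(num_sampled / num_classes)

    def logit(x, c):
        # Class score, corrected by the log expected count under the noise.
        return sum(a * b for a, b in zip(x, weights[c])) + biases[c] - log_expected

    losses = []
    for x, (true_class,) in zip(inputs, labels):   # labels has shape [bs, 1]
        loss = -log_sigmoid(logit(x, true_class))                # true class, target 1
        loss -= sum(log_sigmoid(-logit(x, c)) for c in sampled)  # noise, target 0
        losses.append(loss)
    return losses

weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]   # [num_classes, embed_size]
biases = [0.0, 0.0, 0.0]
inputs = [[2.0, 0.0], [0.0, 2.0]]                  # [bs, embed_size]
labels = [[0], [1]]                                # [bs, 1] integer class ids
losses = nce_loss_sketch(weights, biases, labels, inputs,
                         num_sampled=2, num_classes=3, rng=random.Random(0))
mean_loss = sum(losses) / len(losses)              # mirrors tf.reduce_mean
```

Only num_sampled + 1 classes are scored per example instead of all num_classes, which is where the savings over a full softmax come from when the vocabulary is large.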

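To illustrate how NCE is used, we can apply it to the MNIST example (examples/tutorials/mnist/mnist_deep.py) in the following two steps:
1. Replace embed with the hidden layer output. The hidden layer has 1024 dimensions and num_output is 10. The minimum value of num_sampled is 1. Remember to remove the last inner-product layer in deepnn().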

y_conv, keep_prob = deepnn(x)

num_sampled = 1
vocabulary_size = 10
embedding_size = 1024
with tf.device('/cpu:0'):
  embed = y_conv
  # Construct the variables for the NCE loss
  nce_weights = tf.Variable(
      tf.truncated_normal([vocabulary_size, embedding_size],
                          stddev=1.0 / math.sqrt(embedding_size)))
  nce_biases = tf.Variable(tf.zeros([vocabulary_size]))

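2. Create the loss and compute the output. Once the output is computed, we can use it to calculate accuracy. Note that the labels here are not the one-hot vectors used with softmax. They are the original integer labels of the training samples.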

loss = tf.reduce_mean(
    tf.nn.nce_loss(weights=nce_weights,
                   biases=nce_biases,
                   labels=y_idx,
                   inputs=embed,
                   num_sampled=num_sampled,
                   num_classes=vocabulary_size))

output = tf.matmul(y_conv, tf.transpose(nce_weights)) + nce_biases
correct_prediction = tf.equal(tf.argmax(output, 1), tf.argmax(y_, 1))
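The snippet above uses y_idx without defining it. A minimal sketch of how such integer labels could be derived from one-hot rows follows; the helper one_hot_to_index_labels is hypothetical. In the TensorFlow graph, something like tf.reshape(tf.argmax(y_, 1), [-1, 1]) would presumably play the same role, but that is an assumption, since the source does not show how y_idx is built.

```python
# Hypothetical helper: convert one-hot label rows to integer class ids with
# shape [batch_size, 1], the label layout that nce_loss expects.
def one_hot_to_index_labels(one_hot_rows):
    """Return [[class_id], ...] where class_id is the position of the 1."""
    return [[max(range(len(row)), key=row.__getitem__)] for row in one_hot_rows]

y_ = [
    [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],  # digit 2
    [0, 0, 0, 0, 0, 0, 0, 1, 0, 0],  # digit 7
]
y_idx = one_hot_to_index_labels(y_)
print(y_idx)  # [[2], [7]]
```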

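When we set num_sampled=1, the validation accuracy ends up at around 98.8%. If we set num_sampled=9, we get almost the same validation accuracy as with softmax training. Note, however, that NCE is not the same as softmax.
The full code for training MNIST with NCE can be found here. Hope it helps.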

For python - Understanding `tf.nn.nce_loss()` in TensorFlow, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/41475180/