This article covers the question "word2vec: negative sampling (in layman terms)?" and an answer to it, which should be a useful reference for anyone facing the same problem.

Problem Description

I'm reading the paper below and I have some trouble understanding the concept of negative sampling.

http://arxiv.org/pdf/1402.3722v1.pdf

Can anyone help, please?

Solution

The idea of word2vec is to maximise the similarity (dot product) between the vectors for words which appear close together (in the context of each other) in text, and minimise the similarity of words that do not. In equation (3) of the paper you link to, ignore the exponentiation for a moment. You have

      v_c * v_w
 -------------------
   sum(v_c1 * v_w)
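
With the exponentiation put back in, the quantity being described is the usual softmax probability of a context word given a target word. The LaTeX form below is my own restatement of that ratio (reusing the v_c, v_w and c1 symbols from above), not a quote from the paper:

    p(c \mid w) \;=\; \frac{e^{v_c \cdot v_w}}{\sum_{c_1} e^{v_{c_1} \cdot v_w}}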

The numerator is basically the similarity between the word c (the context) and the word w (the target). The denominator computes the similarity of all other contexts c1 and the target word w. Maximising this ratio ensures that words which appear closer together in text end up with more similar vectors than words that do not. However, computing this can be very slow, because there are many contexts c1.

Negative sampling is one of the ways of addressing this problem: just select a couple of contexts c1 at random. The end result is that if cat appears in the context of food, then the vector of food is more similar to the vector of cat (as measured by their dot product) than the vectors of several other randomly chosen words (e.g. democracy, greed, Freddy), rather than all the other words in the language. This makes word2vec much, much faster to train.
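
To make the contrast concrete, here is a minimal numpy sketch (my own illustration, not the answer's or the paper's code) of the two quantities involved: the full softmax probability, whose denominator sums over every word in the vocabulary, and the sigmoid-based negative-sampling objective, which only looks at the true context plus a handful of randomly drawn words. The toy vocabulary, the vector dimension and all variable names are assumptions made for the example:

    import numpy as np

    rng = np.random.default_rng(0)

    # toy setup: every word gets a random "target" vector and a random "context" vector
    vocab = ["cat", "food", "democracy", "greed", "freddy", "dog", "fish"]
    dim = 8
    target_vecs = {w: rng.normal(size=dim) for w in vocab}
    context_vecs = {w: rng.normal(size=dim) for w in vocab}

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def full_softmax_prob(target, context):
        # p(context | target): the denominator sums over *every* context word,
        # which is what makes the plain softmax slow for a real vocabulary
        scores = {c: np.dot(context_vecs[c], target_vecs[target]) for c in vocab}
        denom = sum(np.exp(s) for s in scores.values())
        return np.exp(scores[context]) / denom

    def negative_sampling_loss(target, context, k=3):
        # negative sampling: score the true (target, context) pair against only
        # k randomly chosen words instead of the whole vocabulary
        pos_score = np.dot(context_vecs[context], target_vecs[target])
        negatives = rng.choice([w for w in vocab if w != context], size=k, replace=False)
        loss = -np.log(sigmoid(pos_score))                    # pull the true pair together
        for n in negatives:
            neg_score = np.dot(context_vecs[n], target_vecs[target])
            loss += -np.log(sigmoid(-neg_score))              # push random words apart
        return loss, list(negatives)

    print(full_softmax_prob("cat", "food"))
    print(negative_sampling_loss("cat", "food"))

Minimising the negative-sampling loss for (cat, food) raises the cat/food dot product while lowering the dot products with the k sampled words, which is exactly the "a few random words instead of the whole vocabulary" shortcut described above.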

That concludes this article on "word2vec: negative sampling (in layman terms)?". I hope the answer above is helpful.
