Problem description
I'm reading the paper below and I have some trouble understanding the concept of negative sampling.
http://arxiv.org/pdf/1402.3722v1.pdf
Can anyone help, please?
The idea of word2vec is to maximise the similarity (dot product) between the vectors of words that appear close together (in the context of each other) in text, and to minimise the similarity of words that do not. In equation (3) of the paper you link to, ignore the exponentiation for a moment. You have
v_c * v_w
-------------------
sum(v_c1 * v_w)
The numerator is basically the similarity between word c (the context) and word w (the target). The denominator computes the similarity of all other contexts c1 and the target word w. Maximising this ratio ensures that words which appear close together in text get more similar vectors than words which do not. However, computing this can be very slow, because there are many contexts c1.
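To make that cost concrete, here is a minimal numpy sketch of the full (exponentiated) ratio for a single word pair; the vocabulary size, dimension, and variable names are toy assumptions of mine, not anything taken from the paper:

import numpy as np

# Toy sizes: a 10,000-word vocabulary with 300-dimensional vectors (assumed for illustration).
vocab_size, dim = 10_000, 300
rng = np.random.default_rng(0)
context_vectors = rng.normal(size=(vocab_size, dim))  # one v_c1 per word in the vocabulary
v_w = rng.normal(size=dim)                            # vector of the target word w
v_c = context_vectors[42]                             # vector of the observed context word c

# Putting the exponentiation back, the softmax form of the ratio is:
scores = context_vectors @ v_w                        # v_c1 . v_w for every context c1
prob = np.exp(v_c @ v_w) / np.exp(scores).sum()       # the denominator touches the whole vocabulary

Every training pair forces a pass over all vocab_size context vectors, which is what makes the full softmax expensive.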
Negative sampling is one of the ways of addressing this problem: just select a couple of contexts c1 at random. The end result is that if cat appears in the context of food, then the vector of food is more similar to the vector of cat (as measured by their dot product) than to the vectors of several other randomly chosen words (e.g. democracy, greed, Freddy), rather than to all other words in the language. This makes word2vec much, much faster to train.
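For comparison, here is a rough sketch of the negative-sampling objective for that cat/food example, again with made-up vectors and a hand-picked set of negative words rather than anything from the actual word2vec code:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
dim = 300
vocab = ["cat", "food", "democracy", "greed", "Freddy"]   # tiny toy vocabulary
vectors = {word: rng.normal(size=dim) for word in vocab}  # stand-in embeddings

# Observed pair: "cat" appears in the context of "food".
v_w = vectors["cat"]                                      # target word
v_c = vectors["food"]                                     # its observed context
negatives = ["democracy", "greed", "Freddy"]              # the k randomly sampled contexts

# Negative-sampling objective (to be maximised): pull the true pair together,
# push the k sampled pairs apart.
objective = np.log(sigmoid(v_c @ v_w))
objective += sum(np.log(sigmoid(-vectors[neg] @ v_w)) for neg in negatives)
print(objective)

Each training pair now costs only k + 1 dot products instead of one per vocabulary word, which is where the speed-up comes from.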