Question
On this page, it is said:
However, looking at the training dataset it produces, the contents of the X and Y pairs seem to be interchangeable, given these two pairs of (X, Y):
So, why distinguish that much between context and targets if it is the same thing in the end?
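To make the observation concrete, here is a minimal sketch of how one sentence gets sliced into (X, Y) pairs under each architecture (my own illustration, not code from the linked page or the Udacity notebook); with a window of one, every skip-gram pair is a CBOW pair with X and Y swapped, which is exactly the symmetry the question points at:

```python
# Toy pair generation for skip-gram vs. CBOW (illustrative sketch only).

corpus = "the quick brown fox jumped over the lazy dog".split()
window = 1

def skipgram_pairs(tokens, window):
    """X = center word, Y = one context word; one pair per context word."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def cbow_pairs(tokens, window):
    """X = all context words in the window, Y = the center word."""
    pairs = []
    for i, center in enumerate(tokens):
        context = [tokens[j]
                   for j in range(max(0, i - window), min(len(tokens), i + window + 1))
                   if j != i]
        pairs.append((tuple(context), center))
    return pairs

print(skipgram_pairs(corpus, window)[:4])
# [('the', 'quick'), ('quick', 'the'), ('quick', 'brown'), ('brown', 'quick')]
print(cbow_pairs(corpus, window)[:3])
# [(('quick',), 'the'), (('the', 'brown'), 'quick'), (('quick', 'fox'), 'brown')]
```

With a larger window the difference becomes visible: CBOW keeps the whole context together as one input, while skip-gram breaks it into independent pairs.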
Also, doing Udacity's Deep Learning course exercise on word2vec, I wonder why they draw such a sharp distinction between the two approaches in this problem:
Would this not yield the same results?
Accepted answer
Here is my oversimplified and rather naive understanding of the difference:
As we know, CBOW learns to predict the word from the context, or equivalently to maximize the probability of the target word given the context. This happens to be a problem for rare words. For example, given the context "yesterday was a really [...] day", the CBOW model will tell you that the word is most probably "beautiful" or "nice". Words like "delightful" get much less attention from the model, because it is designed to predict the most probable word; a rare word is smoothed over the many examples that contain more frequent words.
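The smoothing effect is easiest to see in the forward pass. Below is a rough sketch of the CBOW prediction step (my own toy illustration with made-up embeddings, not the answer author's code): all context vectors are averaged into one hidden vector and a single softmax over the vocabulary is computed per window, so a rare word like "delightful" has to outscore frequent words like "nice" inside that one prediction.

```python
# CBOW forward pass, sketched with random embeddings for illustration.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["yesterday", "was", "a", "really", "nice", "beautiful", "delightful", "day"]
dim = 16
W_in = rng.normal(size=(len(vocab), dim))    # input (context) embeddings
W_out = rng.normal(size=(len(vocab), dim))   # output (target) embeddings

def cbow_probs(context_words):
    idx = [vocab.index(w) for w in context_words]
    h = W_in[idx].mean(axis=0)                # average the context vectors
    scores = W_out @ h                        # one score per vocabulary word
    e = np.exp(scores - scores.max())
    return e / e.sum()                        # softmax over the whole vocabulary

p = cbow_probs(["yesterday", "was", "a", "really", "day"])
print(dict(zip(vocab, p.round(3))))           # P(target word | averaged context)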
On the other hand, the skip-gram model is designed to predict the context. Given the word "delightful", it must understand it and tell us that there is a high probability that the context is "yesterday was really [...] day", or some other relevant context. With skip-gram, the word "delightful" does not try to compete with the word "beautiful"; instead, each "delightful" + context pair is treated as a new observation.
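For contrast, here is the same toy setup written as skip-gram (again my own illustration, not the answer author's code): the center word's vector is used directly, with no averaging, and every (center, context) pair is scored as its own training example, so "delightful" collects its own updates from every window it appears in.

```python
# Skip-gram forward pass, sketched with random embeddings for illustration.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["yesterday", "was", "really", "nice", "beautiful", "delightful", "day"]
dim = 16
W_in = rng.normal(size=(len(vocab), dim))     # input (center-word) embeddings
W_out = rng.normal(size=(len(vocab), dim))    # output (context-word) embeddings

def skipgram_probs(center_word):
    h = W_in[vocab.index(center_word)]        # no averaging: just the one word
    scores = W_out @ h
    e = np.exp(scores - scores.max())
    return e / e.sum()                        # P(context word | center word)

# One training example per (center, context) pair:
pairs = [("delightful", "yesterday"), ("delightful", "was"),
         ("delightful", "really"), ("delightful", "day")]
for center, context in pairs:
    p = skipgram_probs(center)
    print(f"P({context} | {center}) = {p[vocab.index(context)]:.3f}")
```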
Update
Thanks to @0xF for sharing this article:
Skip-gram: works well with a small amount of training data, and represents even rare words or phrases well.
CBOW: several times faster to train than skip-gram, with slightly better accuracy for frequent words.
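If you want to try the trade-off yourself, the two architectures are one flag apart in practice. A minimal sketch assuming the gensim library (4.x parameter names; not part of the quoted article):

```python
# Training both architectures on a tiny toy corpus with gensim (illustrative only).
from gensim.models import Word2Vec

sentences = [["yesterday", "was", "a", "really", "delightful", "day"],
             ["yesterday", "was", "a", "really", "nice", "day"]]

cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)  # sg=0: CBOW
skip = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)  # sg=1: skip-gram

print(skip.wv.most_similar("delightful", topn=3))
```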
One more addition to the subject can be found here: