CBOW vs. skip-gram: why invert context and target words?

Problem Description

On this page, it is said:

However, looking at the training dataset it produces, the contents of the X and Y pairs seem to be interchangeable, as in these two pairs of (X, Y):
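
As a toy illustration (my own made-up sentence and a window of 1, not the original post's pairs), here is how the two schemes slice the same window into mirrored pairs:

```python
# Toy sketch: generate CBOW and skip-gram training pairs from one sentence.
sentence = "yesterday was a really delightful day".split()
window = 1

cbow_pairs = []       # (context words, target word)
skipgram_pairs = []   # (target word, one context word)

for i, target in enumerate(sentence):
    context = sentence[max(0, i - window):i] + sentence[i + 1:i + 1 + window]
    cbow_pairs.append((context, target))
    for c in context:
        skipgram_pairs.append((target, c))

print(cbow_pairs[4])
# (['really', 'day'], 'delightful')
print([p for p in skipgram_pairs if p[0] == "delightful"])
# [('delightful', 'really'), ('delightful', 'day')]
```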

So, why distinguish so much between context and targets if it is the same thing in the end?

Also, while doing Udacity's Deep Learning course exercise on word2vec, I wonder why they seem to draw such a strong distinction between those two approaches in this problem:

Would this not yield the same results?

Recommended Answer

Here is my oversimplified and rather naive understanding of the difference:

As we know, CBOW learns to predict the word from the context, or equivalently, to maximize the probability of the target word given the context. And this happens to be a problem for rare words. For example, given the context yesterday was a really [...] day, the CBOW model will tell you that the word is most probably beautiful or nice. A word like delightful will get much less attention from the model, because the model is designed to predict the most probable word; rare words are smoothed over many examples containing more frequent words.
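
As a minimal sketch of that idea (my own toy code: a made-up vocabulary, untrained random vectors, and a plain softmax instead of word2vec's actual negative sampling), CBOW averages the context vectors and scores every vocabulary word as the candidate target through one shared softmax, which is exactly where frequent words crowd out rare ones:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["yesterday", "was", "really", "beautiful", "nice", "delightful", "day"]
V, D = len(vocab), 8             # vocabulary size, embedding dimension

W_in = rng.normal(size=(V, D))   # input (context) vectors
W_out = rng.normal(size=(V, D))  # output (target) vectors

def cbow_predict(context_ids):
    """Average the context vectors, then softmax over all candidate targets.
    Every target competes in this single distribution, so frequent words
    that fit many contexts come to dominate it during training."""
    h = W_in[context_ids].mean(axis=0)   # averaged context vector
    logits = W_out @ h                   # one score per vocabulary word
    p = np.exp(logits - logits.max())
    return p / p.sum()

# "yesterday was a really [...] day", reduced to a two-word toy context:
probs = cbow_predict([vocab.index("really"), vocab.index("day")])
for word, prob in zip(vocab, probs):
    print(f"{word:10s} {prob:.3f}")
```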

On the other hand, the skip-gram model is designed to predict the context. Given the word delightful, it must understand it and tell us that there is a huge probability that the context is yesterday was really [...] day, or some other relevant context. With skip-gram, the word delightful does not have to compete with the word beautiful; instead, delightful+context pairs are treated as new observations.
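
Again as a rough sketch under the same toy assumptions (a full softmax here, whereas real word2vec trains with negative sampling or a hierarchical softmax): in skip-gram every (target, context) pair is its own training example, so a rare word like delightful receives its own dedicated gradient updates instead of being averaged into a context:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["yesterday", "was", "really", "delightful", "day"]
V, D = len(vocab), 8
W_in = rng.normal(size=(V, D))    # target-word vectors
W_out = rng.normal(size=(V, D))   # context-word vectors

def skipgram_step(target_id, context_id, lr=0.1):
    """One cross-entropy update pushing up p(context | target).
    On the input side only the target word's own row changes,
    so a rare word never shares its updates with anything else."""
    h = W_in[target_id].copy()           # this word's current vector
    logits = W_out @ h
    p = np.exp(logits - logits.max())
    p /= p.sum()
    p[context_id] -= 1.0                 # gradient of the cross-entropy loss
    grad_h = W_out.T @ p                 # gradient w.r.t. the target vector
    W_out[:] -= lr * np.outer(p, h)      # update all context vectors
    W_in[target_id] -= lr * grad_h       # update only this word's vector

# each delightful+context pair is treated as its own observation:
skipgram_step(vocab.index("delightful"), vocab.index("really"))
skipgram_step(vocab.index("delightful"), vocab.index("day"))
```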

Update

Thanks to @0xF for sharing this article:

Skip-gram: works well with small amount of the training data, represents well even rare words or phrases.

CBOW: several times faster to train than the skip-gram, slightly better accuracy for the frequent words

One more addition to the subject is found here:
