
Problem description

I have learned from some papers (Tomas Mikolov...) that a better way to form a sentence vector is to concatenate the word vectors.

But because I am weak in mathematics, I am still not sure about the details.

For example:

Suppose the word vectors have dimension m and a sentence has n words.

What is the correct result of the concatenation?

Is it a 1 x m*n row vector, or an m x n matrix?

Please advise.

Thanks.
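To make the two candidate shapes concrete, here is a minimal sketch in plain Python (no libraries assumed), using made-up toy vectors with m = 4 and n = 3. Concatenation produces one flat vector of length m*n; placing the same word vectors side by side as columns gives an m x n matrix, which holds the same numbers but is stacking rather than concatenation. The helper names concatenate and stack_as_columns are hypothetical, chosen only for illustration.

```python
# Hypothetical toy data: n = 3 words, each an m = 4 dimensional vector.
word_vectors = [
    [0.1, 0.2, 0.3, 0.4],  # word 1
    [0.5, 0.6, 0.7, 0.8],  # word 2
    [0.9, 1.0, 1.1, 1.2],  # word 3
]


def concatenate(vectors):
    """Join the word vectors end to end into one flat vector of length m * n."""
    result = []
    for v in vectors:
        result.extend(v)
    return result


def stack_as_columns(vectors):
    """Place each word vector in its own column of an m x n matrix (list of m rows)."""
    m = len(vectors[0])
    return [[v[i] for v in vectors] for i in range(m)]


sentence_vector = concatenate(word_vectors)
matrix = stack_as_columns(word_vectors)

print(len(sentence_vector))         # 12, i.e. a 1 x (m * n) vector
print(len(matrix), len(matrix[0]))  # 4 3, i.e. an m x n matrix

# Reading the m x n matrix column by column recovers the concatenated vector.
column_major = [matrix[i][j] for j in range(len(matrix[0])) for i in range(len(matrix))]
print(column_major == sentence_vector)  # True
```

In other words, "concatenation" in the usual sense means the flat 1 x m*n vector; the m x n matrix is an equivalent arrangement of the same values, but it is not itself a single vector.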

Recommended answer

There are at least three common ways to combine embedding vectors: (a) summing, (b) averaging (summing, then dividing by the number of vectors), or (c) concatenating. In your case, concatenating n word vectors of dimension m gives a 1 x m*n vector. With summing or averaging, the result keeps the original length m. See the dm_concat and dm_mean parameters of gensim.models.doc2vec.Doc2Vec, which let you choose among these three options when the model (in its PV-DM mode) combines context vectors [1,2].
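As a rough illustration of the three options and the resulting vector lengths, here is a minimal plain-Python sketch. It is not gensim's implementation; the function names combine_sum, combine_mean and combine_concat are hypothetical, and the toy sentence (n = 3 words, m = 2 dimensions) is made up.

```python
# Hypothetical toy sentence: n = 3 word vectors, each of dimension m = 2.
word_vectors = [
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 6.0],
]


def combine_sum(vectors):
    """(a) Element-wise sum: the result keeps dimension m."""
    return [sum(column) for column in zip(*vectors)]


def combine_mean(vectors):
    """(b) Element-wise sum divided by the number of vectors: dimension m."""
    n = len(vectors)
    return [total / n for total in combine_sum(vectors)]


def combine_concat(vectors):
    """(c) Concatenation: the vectors are joined end to end, dimension m * n."""
    return [x for v in vectors for x in v]


print(combine_sum(word_vectors))     # [9.0, 12.0]
print(combine_mean(word_vectors))    # [3.0, 4.0]
print(combine_concat(word_vectors))  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```

One practical consequence: with concatenation the output length grows with n, so sentences of different lengths yield vectors of different sizes, whereas summing and averaging always yield a vector of length m.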

[1] http://radimrehurek.com/gensim/models/doc2vec.html#gensim.models.doc2vec.LabeledLineSentence

[2] https://github.com/piskvorky/gensim/blob/develop/gensim/models/doc2vec.py


07-24 16:19