Question
I am unsure how I should use the most_similar method of gensim's Word2Vec. Let's say you want to test the tried-and-true example of: man stands to king as woman stands to X; find X. I thought that is what you could do with this method, but from the results I am getting I don't think that is true.
The documentation reads:
This method computes cosine similarity between a simple mean of the projection weight vectors of the given words and the vectors for each word in the model. The method corresponds to the word-analogy and distance scripts in the original word2vec implementation.
I assume, then, that most_similar takes the positive examples and negative examples, and tries to find points in the vector space that are as close as possible to the positive vectors and as far away as possible from the negative ones. Is that correct?
Additionally, is there a method that allows us to map the relation between two points to another point and get the result (cf. the man-king woman-X example)?
Recommended answer
You can view exactly what most_similar() does in its source code:
https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/models/keyedvectors.py#L485
It's not quite "find points in the vector space that are as close as possible to the positive vectors and as far away as possible from the negative ones". Rather, as described in the original word2vec papers, it performs vector arithmetic: adding the positive vectors, subtracting the negative, then from that resulting position, listing the known-vectors closest to that angle.
That is sufficient to solve man : king :: woman : ?-style analogies, via a call like:
sims = wordvecs.most_similar(positive=['king', 'woman'],
                             negative=['man'])
(You can think of this as, "start at 'king'-vector, add 'woman'-vector, subtract 'man'-vector, from where you wind up, report ranked word-vectors closest to that point (while leaving out any of the 3 query vectors).")
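To make that arithmetic concrete, here is a minimal sketch using only NumPy. It is not gensim's actual code: the function name most_similar_sketch, the helper unit, and the tiny 3-dimensional toy_vectors are all made up for illustration, and a real model has hundreds of dimensions learned from text. The sketch assumes each query vector is scaled to unit length before being added or subtracted, and that candidates are ranked by cosine similarity to the resulting direction.

```python
import numpy as np


def unit(v):
    """Scale a vector to length 1 so that dot products equal cosines."""
    return v / np.linalg.norm(v)


def most_similar_sketch(vectors, positive, negative, topn=3):
    """Simplified analogy lookup (hypothetical helper, not gensim's code)."""
    # Add the unit-length positive vectors and subtract the negative ones.
    target = sum(unit(vectors[w]) for w in positive)
    target = target - sum(unit(vectors[w]) for w in negative)
    target = unit(target)

    # Rank every other word by cosine similarity to the target direction,
    # leaving out the query words themselves.
    query_words = set(positive) | set(negative)
    scores = [
        (word, float(np.dot(unit(vec), target)))
        for word, vec in vectors.items()
        if word not in query_words
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]


# Made-up toy vectors; the axes loosely mean (royalty, male, female).
toy_vectors = {
    "man": np.array([0.0, 1.0, 0.0]),
    "woman": np.array([0.0, 0.0, 1.0]),
    "king": np.array([1.0, 1.0, 0.0]),
    "queen": np.array([1.0, 0.0, 1.0]),
    "apple": np.array([0.2, 0.2, 0.2]),
}

result = most_similar_sketch(toy_vectors,
                             positive=["king", "woman"],
                             negative=["man"])
print(result)
```

With these toy vectors, 'queen' ranks first and the three query words never appear in the output. Skipping the query words matters in practice, because the nearest neighbour of king + woman - man is often one of the inputs themselves.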