Using Gensim Doc2Vec to find the distance between a 'Doctag' and 'infer_vector'?

Problem description

Using Gensim's Doc2Vec, how would I find the distance between a Doctag and the result of infer_vector()?

Many thanks.

Recommended answer

Doctag is the internal name for the keys to doc-vectors. The result of an infer_vector() operation is a vector. So, taken literally, the two aren't comparable: one is a key and the other is a vector.

You could ask a model for a known doc-vector, by the doc-tag key that was supplied during training, via model.docvecs[doctag]. That vector is comparable to the result of an infer_vector() call.

With two vectors in hand, you can use scipy routines to calculate various kinds of distance. For example:

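from scipy.spatial.distance import cosine as cosine_distance

# vector learned during training, looked up by its doc-tag key
vec_by_doctag = model.docvecs["doc0007"]
# vector inferred for a new, already tokenized text
vec_by_inference = model.infer_vector(['a', 'cat', 'was', 'in', 'a', 'hat'])
dist = cosine_distance(vec_by_doctag, vec_by_inference)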
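
For reference, the cosine distance that scipy computes here is 1 minus the cosine similarity of the two vectors. The following is a minimal pure-Python sketch of that formula, not scipy's implementation; the two short lists are made-up stand-ins for a trained doc-vector and an inferred vector.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cos(angle between u and v) for two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical values standing in for model.docvecs["doc0007"]
# and for the result of model.infer_vector(...).
vec_by_doctag = [0.2, -0.1, 0.7]
vec_by_inference = [0.1, -0.3, 0.6]

dist = cosine_distance(vec_by_doctag, vec_by_inference)
print(dist)
```

The result lies between 0 (same direction) and 2 (opposite direction), and vector length has no effect on it, only direction.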

You can also look at how gensim's Doc2VecKeyedVectors computes similarity/distance between vectors that are known (by their doctag key names) inside a model, in its similarity() and distance() functions, at:

https://github.com/RaRe-Technologies/gensim/blob/ca0dcaa1eca8b1764f6456adac5719309e0d8e6d/gensim/models/keyedvectors.py#L1701

https://github.com/RaRe-Technologies/gensim/blob/ca0dcaa1eca8b1764f6456adac5719309e0d8e6d/gensim/models/keyedvectors.py#L1743
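
As a rough illustration of key-based lookups like these, here is a small sketch. It assumes that similarity() is the dot product of the two unit-normalized vectors and that distance() is one minus that value. TinyDocVectors, its keys, and its vectors are hypothetical and are not gensim code.

```python
import math


def unit(v):
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


class TinyDocVectors:
    """Hypothetical stand-in for a store of doc-vectors keyed by doc-tag."""

    def __init__(self, vectors):
        self.vectors = dict(vectors)

    def __getitem__(self, key):
        return self.vectors[key]

    def similarity(self, d1, d2):
        # Cosine similarity: dot product of the unit-length vectors.
        return sum(a * b for a, b in zip(unit(self[d1]), unit(self[d2])))

    def distance(self, d1, d2):
        # Cosine distance, defined here as 1 - similarity.
        return 1.0 - self.similarity(d1, d2)


docs = TinyDocVectors({
    "doc0001": [1.0, 0.0],
    "doc0002": [0.0, 2.0],
    "doc0003": [3.0, 0.0],
})
print(docs.similarity("doc0001", "doc0003"))
print(docs.distance("doc0001", "doc0002"))
```

Functions of this shape take two keys, so an inferred vector, which has no key, cannot be passed to them. For that case, compare the vectors directly, as in the scipy example.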
