nlp - Doc2Vec.infer_vector keeps giving different results every time on a particular trained model

I'm trying to follow the official Doc2Vec Gensim tutorial mentioned here: https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/doc2vec-lee.ipynb

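I modified the code in line 10 to find the best-matching documents for a given query, and every time I run it I get a completely different result set. My new code in line 10 of the notebook is: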
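
    # Infer a vector for the query (model is the Doc2Vec model trained earlier in the notebook)
    inferred_vector = model.infer_vector(['only', 'you', 'can', 'prevent', 'forest', 'fires'])
    # Rank every trained document by similarity to the inferred vector, best first
    sims = model.docvecs.most_similar([inferred_vector], topn=len(model.docvecs))
    rank = [docid for docid, sim in sims]
    print(rank)
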
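Every time I run this code, I get a different set of documents matching the query "only you can prevent forest fires". The differences are stark, and the results don't even seem to match the query.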
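
In the code above, most_similar(..., topn=len(model.docvecs)) returns every trained document paired with its cosine similarity to the inferred vector, best first. Below is a minimal NumPy sketch of that ranking, plus a hypothetical top_k_overlap helper for measuring how much the rankings from two runs agree. The toy doc_vecs array stands in for the trained document vectors; this is an illustration, not Gensim's implementation.

```python
import numpy as np

def rank_by_cosine(query_vec, doc_vecs):
    """Return (doc indices, similarities) sorted from most to least cosine-similar."""
    q = np.asarray(query_vec, dtype=float)
    d = np.asarray(doc_vecs, dtype=float)
    sims = (d @ q) / (np.linalg.norm(d, axis=1) * np.linalg.norm(q))
    order = np.argsort(-sims, kind="stable")  # descending similarity
    return [int(i) for i in order], sims[order]

def top_k_overlap(rank_a, rank_b, k):
    """Hypothetical helper: fraction of documents shared by the top-k of two rankings."""
    return len(set(rank_a[:k]) & set(rank_b[:k])) / k

# Toy stand-in for trained document vectors: 3 documents, 2 dimensions.
doc_vecs = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
rank, sims = rank_by_cosine([1.0, 0.1], doc_vecs)
print(rank)  # [0, 2, 1]
```

Running the query twice and comparing the two rankings with top_k_overlap gives a single number for "how different" the result sets are, which is easier to reason about than eyeballing two printed lists.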

Is Doc2Vec not suitable for querying and information extraction? Or is there a bug?

Best answer

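Looking at the code, infer_vector uses parts of the algorithm that are non-deterministic. The initialization of the vector being inferred is deterministic (see the code of seeded_vector below), but the later steps, namely the random downsampling of frequent words and negative sampling (each training step updates only a randomly drawn sample of "negative" words), can make the output non-deterministic (thanks @gojomo).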

    from numpy import random  # in this excerpt, `random` refers to numpy.random

    def seeded_vector(self, seed_string):
        """Create one 'random' vector (but deterministic by seed_string)"""
        # Note: built-in hash() may vary by Python version or even (in Py3.x) per launch
        once = random.RandomState(self.hashfxn(seed_string) & 0xffffffff)
        return (once.rand(self.vector_size) - 0.5) / self.vector_size
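
The comment in seeded_vector points at a second source of variation: in Python 3, hash() of a str is randomized per process unless PYTHONHASHSEED is set, so even the "deterministic" starting vector can change between separate runs of the notebook. The toy sketch below puts both effects together. It is not Gensim's code: it trains only a new document vector against frozen, made-up output weights in a PV-DBOW-like way, uses a constant learning rate and uniform (not frequency-weighted) negative sampling, and assumes the starting vector is seeded from the space-joined document words. stable_hash and toy_infer are hypothetical names; zlib.crc32 is just one launch-independent hash choice. Gensim's model constructors take a hashfxn argument (default: the built-in hash), though you should check your version's documentation.

```python
import zlib
import numpy as np

def stable_hash(text):
    """Hypothetical hashfxn: CRC-32 of the UTF-8 bytes, identical in every Python
    process (unlike hash(str), which is randomized per launch)."""
    return zlib.crc32(text.encode("utf-8"))

def seeded_vector(seed_string, vector_size, hashfxn=stable_hash):
    """Standalone version of the method quoted above, with a pluggable hash."""
    once = np.random.RandomState(hashfxn(seed_string) & 0xffffffff)
    return (once.rand(vector_size) - 0.5) / vector_size

def toy_infer(doc_words, vocab, output_weights, rng, epochs=20, alpha=0.025, negative=5):
    """Toy inference (NOT Gensim's): only doc_vec is updated; output_weights stay frozen.
    Simplifications: constant alpha, uniform negative sampling."""
    dim = output_weights.shape[1]
    doc_vec = seeded_vector(" ".join(doc_words), dim)  # deterministic start
    for _ in range(epochs):
        for word in doc_words:
            # One positive example plus `negative` noise words drawn from rng:
            # this sampling is where run-to-run differences come from.
            noise = rng.randint(0, len(vocab), size=negative)
            pairs = [(vocab[word], 1.0)] + [(int(i), 0.0) for i in noise]
            grad = np.zeros(dim)
            for row, label in pairs:
                score = 1.0 / (1.0 + np.exp(-(doc_vec @ output_weights[row])))
                grad += (label - score) * output_weights[row]
            doc_vec = doc_vec + alpha * grad
    return doc_vec

query = ["only", "you", "can", "prevent", "forest", "fires"]
vocab = {w: i for i, w in enumerate(query + ["smokey", "bear", "camp", "rain"])}
weights = np.random.RandomState(42).normal(scale=0.1, size=(len(vocab), 8))

# Unseeded samplers: two inferences of the same words will almost certainly differ.
a = toy_infer(query, vocab, weights, np.random.RandomState())
b = toy_infer(query, vocab, weights, np.random.RandomState())
# Identically seeded samplers: results are bit-for-bit identical.
c = toy_infer(query, vocab, weights, np.random.RandomState(0))
d = toy_infer(query, vocab, weights, np.random.RandomState(0))
print(np.abs(a - b).max(), np.array_equal(c, d))
```

Seeding the sampler only makes the toy repeatable; it does not make any single inferred vector more accurate. If repeated inferences land far apart, the spread itself is the thing to reduce, for example by averaging several inferred vectors or running more inference epochs.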

A similar question about Doc2Vec.infer_vector giving different results every time on a particular trained model can be found on Stack Overflow: https://stackoverflow.com/questions/48362530/
