Gensim Word2vec: freezing some word vectors while updating others

Question

Regarding word2vec with gensim: suppose you have already trained a model on a large corpus and want to update it with new words from new sentences, without updating the words that already have a vector. Is it possible to freeze the vectors of some words and update only some chosen words (such as the new ones) when calling model.train? Or is there a trick to do it? Thanks.

Recommended answer

There is! But it's an experimental feature with little documentation: you'd need to read the source to fully understand it, and directly mutate your model to make use of it.

Look through the word2vec.py source for properties ending in _lockf; in the latest code, specifically one named vectors_lockf. It's a sort of mask that allows, weakens, or stops training of certain words. For each word, if its value is 1.0, normal full backpropagated updates are applied. Any lower value weakens the update, so 0.0 freezes a word against updates. (The potential update is still calculated, so there's no net speedup; it's just multiplied by 0.0 before final application to the frozen words.)
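The masking arithmetic described above can be sketched in plain NumPy. This is only an illustration of the idea, not Gensim's actual training code: the helper name apply_masked_update, the toy vectors, and the toy update values are all made up for the example.

```python
import numpy as np


def apply_masked_update(vectors, updates, lockf):
    """Return the vectors after adding each row's update scaled by its lock factor.

    Hypothetical helper for illustration; not part of Gensim.
    """
    # lockf holds one factor per word; broadcasting scales each row of `updates`.
    return vectors + lockf[:, np.newaxis] * updates


# Toy vocabulary of three words with 2-dimensional vectors.
vectors = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]], dtype=np.float32)

# Pretend these are the updates that training computed for each word.
# They are computed for every word, frozen or not, so there is no speedup.
updates = np.array([[0.5, -0.5], [0.5, -0.5], [0.5, -0.5]], dtype=np.float32)

# 1.0 = normal update, 0.5 = weakened update, 0.0 = frozen.
lockf = np.array([1.0, 0.5, 0.0], dtype=np.float32)

updated = apply_masked_update(vectors, updates, lockf)
print(updated)
```

The first word receives its full update, the second receives half of it, and the third is left unchanged. In Gensim the corresponding array is the vectors_lockf property mentioned above; where it lives on the model and how it is sized has differed between releases, so check the source of the version you have installed before mutating it.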

07-24 16:19