This post looks at a case where Gensim's train() does not update the word-vector weights.

Problem description

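I have a domain-specific corpus that I am trying to train embeddings for. Because I want broad vocabulary coverage, I add the word vectors from glove.6B.50d.txt. After adding those vectors, I train the model on my own corpus.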

I am trying the solution from here, but the word embeddings do not seem to update.

This is the solution I have so far.

import numpy as np
from gensim.models import KeyedVectors, Word2Vec

# read GloVe embeddings
glove_wv = KeyedVectors.load_word2vec_format(GLOVE_PATH, binary=False)

# initialize w2v model
model = Word2Vec(vector_size=50, min_count=0, window=20, epochs=10, sg=1, workers=10,
                 hs=1, ns_exponent=0.5, seed=42, sample=10**-2, shrink_windows=True)
model.build_vocab(sentences_tokenized)
training_examples_count = model.corpus_count

# add vocab from GloVe
model.build_vocab([list(glove_wv.key_to_index.keys())], update=True)
# per-word lock factors: meant to allow weight updates from backprop; 0 will suppress them
model.wv.vectors_lockf = np.zeros(len(model.wv))

# add GloVe embeddings
model.wv.intersect_word2vec_format(GLOVE_PATH, binary=False, lockf=1.0)

Below I train the model and check the embedding of a specific word that explicitly appears in the training data.

# train model
model.train(sentences_tokenized, total_examples=training_examples_count, epochs=model.epochs)

# check whether the embedding changes for 'oyo'
print(model.wv.get_vector('oyo'))
print(glove_wv.get_vector('oyo'))

The embedding for the word oyo is identical before and after training. Where am I going wrong?

The input corpus sentences_tokenized contains several sentences that include the word oyo. One of them:

'oyo global platform empowers entrepreneur small business hotel home providing full stack technology increase earnings eas operation bringing affordable trusted accommodation guest book instantly india largest budget hotel chain oyo room one preferred hotel booking destination vast majority student country hotel chain offer many benefit include early check in couple room id card flexibility oyo basically network budget hotel completely different famous hotel aggregator like goibibo yatra makemytrip partner zero two star hotel give makeover room bring customer hotel website mobile app'

Recommended answer

You're improvising a lot here, with many potential errors or suboptimal choices. Note especially:
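One candidate that can be checked in isolation is the dtype of vectors_lockf. This is an assumption about a likely cause, not something the question's output confirms: np.zeros(...) creates a float64 array by default, while Gensim 4.x's compiled training routines are assumed here to read the lock factors as 32-bit floats. The sketch below uses only the standard library to show what a reader expecting float32 would see in memory that actually holds float64 values. The helper name reinterpret_float64_as_float32 is hypothetical and not part of Gensim or NumPy.

```python
import struct


def reinterpret_float64_as_float32(values):
    """Pack values as little-endian float64, then read the same bytes as float32.

    Hypothetical helper: it mimics code that expects 32-bit floats being handed
    a buffer that really contains 64-bit floats.
    """
    raw = struct.pack(f"<{len(values)}d", *values)
    return struct.unpack(f"<{len(raw) // 4}f", raw)


# Two lock factors of 1.0 stored as float64 (NumPy's default dtype).
as_float32 = reinterpret_float64_as_float32([1.0, 1.0])

# Each 8-byte 1.0 is seen as two 4-byte floats: 0.0 followed by 1.875.
print(as_float32)  # (0.0, 1.875, 0.0, 1.875)
```

Under that assumption, the lock factor read for a given word index would no longer be the value stored for that word, so a word that intersect_word2vec_format(..., lockf=1.0) was supposed to unlock could still see 0.0 and stay frozen. If that is what is happening, creating the array with an explicit 32-bit dtype, for example np.zeros(len(model.wv), dtype=np.float32), would keep each lock factor aligned with its word.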


11-02 20:16