Problem description
I have a gensim Word2Vec model that was computed in Python 2 like this:
from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence
model = Word2Vec(LineSentence('enwiki.txt'), size=100,
                 window=5, min_count=5, workers=15)
model.save('w2v.model')
However, I need to use it in Python 3. If I try to load it,
import gensim
from gensim.models import Word2Vec
model = Word2Vec.load('w2v.model')
it results in an error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf9 in position 0: ordinal not in range(128)
I suppose the problem lies in the encoding differences between Python 2 and Python 3. It also seems that gensim uses pickle to save and load models.
Is there a way to set encoding or pickle options so that the model loads properly? Or is there an external tool that can convert the model file?
Recomputing it in Python 3 is not an option: it takes far too much time.
Recommended answer
This does look like a bug somewhere, as noted by memoselyk, and it can be fixed in the way described in a comment to this answer.
So you have to add encoding='latin1' to the call to _pickle.loads in gensim.utils.unpickle, load the model in Python 3, and then save it again. After that you can revert this fix and load the newly saved model in unmodified gensim with Python 3.
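To see why encoding='latin1' helps, here is a minimal sketch that uses only the standard pickle module, not gensim. The byte string PY2_PICKLE is hand-written to mimic a protocol 2 pickle of a Python 2 str containing the byte 0xf9, and the helper name load_py2_pickle is made up for illustration. By default, Python 3 decodes Python 2 byte strings as ASCII, which produces the error shown in the question. Latin-1 maps every byte to exactly one code point, so the load succeeds; the Python documentation notes that this encoding is required for unpickling NumPy arrays pickled by Python 2.

```python
import pickle

# Hand-written bytes mimicking a Python 2, protocol 2 pickle of the
# byte string '\xf9abc' (an assumption made so the example runs
# without a Python 2 interpreter).
PY2_PICKLE = b'\x80\x02U\x04\xf9abcq\x00.'


def load_py2_pickle(data):
    """Unpickle Python 2 data in Python 3, decoding old str objects as Latin-1."""
    return pickle.loads(data, encoding='latin1')


# The default encoding is ASCII, so the non-ASCII byte 0xf9 cannot be decoded.
try:
    pickle.loads(PY2_PICKLE)
    default_load_failed = False
except UnicodeDecodeError:
    default_load_failed = True

# With Latin-1 every byte maps to one code point, so loading succeeds.
value = load_py2_pickle(PY2_PICKLE)
print(default_load_failed, repr(value))
```

Once a model has been loaded this way and saved again from Python 3, the new file is written by Python 3's pickle and no longer needs the encoding override.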