How to load sentences into Python gensim?
Question
I am trying to use the word2vec module from the gensim natural language processing library in Python.
The docs say to initialize the model like this:
from gensim.models import word2vec
model = word2vec.Word2Vec(sentences, size=100, window=5, min_count=5, workers=4)
What format does gensim expect for the input sentences? I have raw text:
"the quick brown fox jumps over the lazy dogs"
"Then a cop quizzed Mick Jagger's ex-wives briefly."
etc.
What additional processing do I need to do before passing it to word2vec?
UPDATE: Here is what I have tried. After it loads the sentences, the vocabulary is empty.
>>> sentences = ['the quick brown fox jumps over the lazy dogs',
"Then a cop quizzed Mick Jagger's ex-wives briefly."]
>>> x = word2vec.Word2Vec()
>>> x.build_vocab([s.encode('utf-8').split( ) for s in sentences])
>>> x.vocab
{}
Recommended answer
gensim expects a list of UTF-8 sentences. You can also stream the data from disk.
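Streaming from disk can be sketched as a small iterable that reads one sentence per line and yields it as a list of tokens. The class name `LineSentences`, the one-sentence-per-line file layout, and the temporary demo file are assumptions made for illustration, not part of the original answer. A class with `__iter__` is used instead of a plain generator because gensim iterates over the corpus more than once (once to build the vocabulary, then again for each training epoch), and a generator would be exhausted after the first pass.

```python
import os
import tempfile


class LineSentences:
    """Hypothetical helper: yields one token list per non-empty line of a UTF-8 text file."""

    def __init__(self, path):
        self.path = path

    def __iter__(self):
        # Re-open the file on every iteration so the corpus can be traversed repeatedly.
        with open(self.path, encoding="utf-8") as handle:
            for line in handle:
                tokens = line.split()  # plain whitespace tokenization
                if tokens:  # skip blank lines
                    yield tokens


# Demo corpus written to a temporary file (illustrative data only).
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("the quick brown fox jumps over the lazy dogs\n")
    tmp.write("\n")
    tmp.write("Then a cop quizzed Mick Jagger's ex-wives briefly.\n")
    corpus_path = tmp.name

corpus = LineSentences(corpus_path)
first_pass = list(corpus)
second_pass = list(corpus)  # a second traversal yields the same sentences
os.remove(corpus_path)

print(first_pass[0])
```

An object like `corpus` can be passed wherever gensim accepts `sentences`. gensim also ships its own `gensim.models.word2vec.LineSentence` class for files with one whitespace-separated sentence per line.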
Make sure it's UTF-8, and split each sentence into tokens:
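from gensim.models import word2vec

sentences = ["the quick brown fox jumps over the lazy dogs",
             "Then a cop quizzed Mick Jagger's ex-wives briefly."]
# min_count=1 keeps every word of this tiny corpus; the default of 5 would discard them all
model = word2vec.Word2Vec([s.encode('utf-8').split() for s in sentences], size=100, window=5, min_count=1, workers=4)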
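The empty vocabulary in the question is most likely a side effect of `min_count`: with the default of 5, any word that appears fewer than five times is dropped, and in a two-sentence corpus no word comes close. The following is a minimal sketch of that pruning in plain Python. The helper names `tokenize` and `surviving_vocab` are hypothetical and only mimic the frequency filter; they are not gensim's implementation. The final function assumes gensim 4.x, where the `size` parameter was renamed `vector_size`, and is defined but not called so the rest of the sketch runs without gensim installed.

```python
from collections import Counter

raw_sentences = [
    "the quick brown fox jumps over the lazy dogs",
    "Then a cop quizzed Mick Jagger's ex-wives briefly.",
]


def tokenize(sentences):
    """Hypothetical helper: split each raw sentence into a list of str tokens."""
    return [sentence.split() for sentence in sentences]


def surviving_vocab(tokenized, min_count):
    """Hypothetical helper: words whose corpus frequency is at least min_count."""
    counts = Counter(token for sentence in tokenized for token in sentence)
    return {word: count for word, count in counts.items() if count >= min_count}


tokenized = tokenize(raw_sentences)

print(surviving_vocab(tokenized, min_count=5))       # nothing occurs five times
print(surviving_vocab(tokenized, min_count=2))       # only "the" occurs twice
print(len(surviving_vocab(tokenized, min_count=1)))  # every distinct word survives


def train_toy_model(tokenized_sentences):
    """Assumes gensim 4.x is installed; not called in this sketch."""
    from gensim.models import Word2Vec

    return Word2Vec(
        sentences=tokenized_sentences,
        vector_size=100,  # called `size` before gensim 4.0
        window=5,
        min_count=1,  # keep rare words in a toy corpus
        workers=4,
    )
```

On a real corpus a higher `min_count` is usually reasonable, since rare words tend to get poor vectors. For a quick experiment on a handful of sentences, `min_count=1` is the practical choice.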