How to run t-SNE on a word2vec model created with gensim?

Question

I want to visualize a word2vec model created with the gensim library. I tried sklearn, but it seems I need to install a developer version to get t-SNE. I tried installing the developer version, but that does not work on my machine. Is it possible to modify this code to visualize a word2vec model?

tsne_python

Recommended answer

You don't need a developer version of scikit-learn - just install scikit-learn the usual way via pip or conda.

To access the word vectors created by word2vec, simply use the vocabulary dictionary as an index into the model:

X = model.wv[model.wv.vocab]

The following is a simple but complete code example that loads some newsgroup data, applies very basic data preparation (cleaning and splitting into sentences), trains a word2vec model, reduces the vectors to two dimensions with t-SNE, and plots the output.

from gensim.models.word2vec import Word2Vec
from sklearn.manifold import TSNE
from sklearn.datasets import fetch_20newsgroups
import re
import matplotlib.pyplot as plt

# Download the example data (may take a while)
train = fetch_20newsgroups()

def clean(text):
    """Remove posting header, split by sentences and words, keep only letters"""
    # Raw strings avoid invalid-escape warnings for \s and \d
    lines = re.split(r'[?!.:]\s', re.sub(r'^.*Lines: \d+', '', re.sub(r'\n', ' ', text)))
    return [re.sub(r'[^a-zA-Z]', ' ', line).lower().split() for line in lines]

sentences = [line for text in train.data for line in clean(text)]

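# Train word2vec (gensim 3.x API; in gensim 4.0+ the `size` argument is named `vector_size`)
model = Word2Vec(sentences, workers=4, size=100, min_count=50, window=10, sample=1e-3)

# Sanity check: words closest to 'memory'
print(model.wv.most_similar('memory'))

# Matrix with one row per vocabulary word (gensim 3.x; gensim 4.0+ removed
# `wv.vocab`, use `model.wv.vectors` or `model.wv.key_to_index` there)
X = model.wv[model.wv.vocab]

# Reduce the 100-dimensional vectors to 2 dimensions
tsne = TSNE(n_components=2)
X_tsne = tsne.fit_transform(X)

# Scatter plot of the 2-D embedding
plt.scatter(X_tsne[:, 0], X_tsne[:, 1])
plt.show()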
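
The scatter plot above shows only unlabeled points. A minimal sketch of one way to attach words to the points follows. The helper name `plot_labeled`, its `max_labels` parameter, and the random demo coordinates are assumptions made for illustration and are not part of the original answer. With the full example, `coords` would be `X_tsne`, and `words` would presumably be `list(model.wv.vocab)`, which under gensim 3.x should be in the same order as the rows of `X`.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np


def plot_labeled(coords, words, max_labels=50):
    """Scatter 2-D points and annotate the first `max_labels` with their words.

    Hypothetical helper: coords is an (n, 2) array, words a list of n strings.
    """
    fig, ax = plt.subplots(figsize=(8, 8))
    ax.scatter(coords[:, 0], coords[:, 1], s=5)
    # Label only a subset so a large vocabulary stays readable
    for (x, y), word in list(zip(coords, words))[:max_labels]:
        ax.annotate(word, (x, y), fontsize=8)
    return fig, ax


# Stand-in data so the sketch runs without training a model
rng = np.random.default_rng(0)
demo_words = ["memory", "disk", "cpu", "ram", "drive"]
demo_coords = rng.normal(size=(len(demo_words), 2))

fig, ax = plot_labeled(demo_coords, demo_words)
```

Labeling every point of a large vocabulary tends to produce an unreadable plot, which is why the sketch caps the number of annotations.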

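To see what the `clean` helper from the example produces, it can be run on a small made-up posting. This is only a sketch: the sample text is invented for illustration and does not come from the newsgroup data, and the nested calls are unrolled into named steps for readability.

```python
import re


def clean(text):
    """Remove posting header, split by sentences and words, keep only letters."""
    flat = re.sub(r'\n', ' ', text)             # join all lines into one string
    body = re.sub(r'^.*Lines: \d+', '', flat)   # drop everything up to the "Lines: N" header
    lines = re.split(r'[?!.:]\s', body)         # split on sentence-ending punctuation
    return [re.sub(r'[^a-zA-Z]', ' ', line).lower().split() for line in lines]


# Invented sample posting (assumption, not real data)
sample = (
    "From: someone@example.com\n"
    "Subject: RAM\n"
    "Lines: 12\n"
    "\n"
    "My PC has 8MB of memory. Is that enough? I hope so!"
)

result = clean(sample)
print(result)
# [['my', 'pc', 'has', 'mb', 'of', 'memory'], ['is', 'that', 'enough'], ['i', 'hope', 'so']]
```

Note that digits are discarded along with punctuation, so "8MB" becomes "mb".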