Question
How do I load pre-trained word embeddings into a Keras Embedding layer?
I downloaded glove.6B.50d.txt (from the glove.6B.zip file at https://nlp.stanford.edu/projects/glove/), but I'm not sure how to add it to a Keras Embedding layer. See: https://keras.io/layers/embeddings/
Recommended answer
You will need to pass an embeddingMatrix to the Embedding layer as follows:
Embedding(vocabLen, embDim, weights=[embeddingMatrix], trainable=isTrainable)
- vocabLen: number of tokens in your vocabulary
- embDim: embedding vector dimension (50 in your example)
- embeddingMatrix: embedding matrix built from glove.6B.50d.txt
- isTrainable: whether you want the embeddings to be trainable or the layer to be frozen
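To make the roles of vocabLen, embDim and embeddingMatrix concrete, here is a small NumPy-only sketch of what an embedding lookup does conceptually: each integer index selects one row of the matrix. The toy values are made up for illustration and the snippet does not call Keras.

```python
import numpy as np

# Toy values, purely illustrative: 3 real tokens plus row 0 reserved for padding.
vocabLen = 4   # number of rows in the matrix
embDim = 2     # length of each embedding vector

embeddingMatrix = np.array([
    [0.0, 0.0],   # index 0: padding row
    [0.1, 0.2],   # index 1
    [0.3, 0.4],   # index 2
    [0.5, 0.6],   # index 3
])
assert embeddingMatrix.shape == (vocabLen, embDim)

# A batch holding one sequence of token indices, padded with 0 at the end.
tokenIndices = np.array([[2, 3, 0]])

# Conceptually, an embedding layer maps each index to its matrix row.
embedded = embeddingMatrix[tokenIndices]
print(embedded.shape)  # (batch, sequence length, embDim)
```

With trainable set to False the rows would stay fixed during training; with True they would be updated like any other weights.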
glove.6B.50d.txt is a list of whitespace-separated values: a word token followed by 50 embedding values, e.g. the 0.418 0.24968 -0.41242 ...
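As a quick illustration of that format, the sketch below parses a single line into a token and its values. The helper name parseGloveLine is hypothetical, the sample line is shortened to three values, and plain Python floats are used here only to keep the snippet dependency-free.

```python
def parseGloveLine(line):
    """Split one GloVe text line into (token, list of floats)."""
    record = line.strip().split()
    token = record[0]                                 # first field is the word
    vector = [float(value) for value in record[1:]]   # remaining fields are the embedding
    return token, vector

# Shortened sample line; a real glove.6B.50d.txt line carries 50 values.
sampleLine = "the 0.418 0.24968 -0.41242\n"
token, vector = parseGloveLine(sampleLine)
print(token, vector)
```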
To create a pretrainedEmbeddingLayer from a GloVe file:
import numpy as np
from keras.layers import Embedding
from keras.models import Sequential

# Prepare GloVe file
def readGloveFile(gloveFile):
    with open(gloveFile, 'r', encoding='utf-8') as f:
        wordToGlove = {}  # map from a token (word) to a GloVe embedding vector
        wordToIndex = {}  # map from a token to an index
        indexToWord = {}  # map from an index to a token

        for line in f:
            record = line.strip().split()
            token = record[0]  # take the token (word) from the text line
            wordToGlove[token] = np.array(record[1:], dtype=np.float64)  # associate the GloVe embedding vector with that token

        tokens = sorted(wordToGlove.keys())
        for idx, tok in enumerate(tokens):
            kerasIdx = idx + 1  # index 0 is left free for padding/masking in Keras
            wordToIndex[tok] = kerasIdx  # associate an index with a token (word)
            indexToWord[kerasIdx] = tok  # associate a token with an index (inverse of the dictionary above)

    return wordToIndex, indexToWord, wordToGlove

# Create pretrained Keras Embedding layer
def createPretrainedEmbeddingLayer(wordToGlove, wordToIndex, isTrainable):
    vocabLen = len(wordToIndex) + 1  # add 1 to account for the reserved index 0
    embDim = next(iter(wordToGlove.values())).shape[0]  # works with any GloVe dimension (e.g. 50)

    embeddingMatrix = np.zeros((vocabLen, embDim))  # initialize with zeros; row 0 stays all zeros
    for word, index in wordToIndex.items():
        embeddingMatrix[index, :] = wordToGlove[word]  # row `index` holds the GloVe vector of `word`

    embeddingLayer = Embedding(vocabLen, embDim, weights=[embeddingMatrix], trainable=isTrainable)
    return embeddingLayer

# Usage
wordToIndex, indexToWord, wordToGlove = readGloveFile("/path/to/glove.6B.50d.txt")
pretrainedEmbeddingLayer = createPretrainedEmbeddingLayer(wordToGlove, wordToIndex, False)
model = Sequential()
model.add(pretrainedEmbeddingLayer)
...
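The layer expects integer indices as input, so text has to be mapped through wordToIndex first. A minimal sketch of that step follows. The helper tokensToIndices, the toy dictionary and the choice to send unknown words to index 0 are assumptions for illustration, not part of the answer above.

```python
def tokensToIndices(tokens, wordToIndex, maxLen):
    """Map tokens to integer indices, then pad or truncate to maxLen."""
    # Assumption: unknown tokens share index 0 with padding.
    indices = [wordToIndex.get(tok, 0) for tok in tokens]
    indices = indices[:maxLen]                        # truncate long sequences
    return indices + [0] * (maxLen - len(indices))    # right-pad short ones with 0

# Toy stand-in for the wordToIndex dictionary returned by readGloveFile.
toyWordToIndex = {"cat": 1, "sat": 2, "the": 3}
padded = tokensToIndices(["the", "cat", "sat", "down"], toyWordToIndex, 6)
print(padded)  # [3, 1, 2, 0, 0, 0]
```

Mapping unknown words to 0 is only one option; reserving a separate index for them would need an extra row in the embedding matrix.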