Question
I have a multi-task encoder/decoder model in PyTorch with a (trainable) torch.nn.Embedding layer at the input.
In one particular task, I'd like to pre-train the model self-supervised (to reconstruct masked input data) and use it for inference (to fill in gaps in data).
I guess that at training time I can just measure the loss as the distance between the input embedding and the output embedding... But for inference, how do I invert the Embedding to reconstruct the proper category/token the output corresponds to? I can't see e.g. a "nearest" function on the Embedding class...
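For reference, a minimal sketch of the training-time idea described above, assuming the decoder emits vectors in the same embedding space and using MSE as the distance; the shapes and the random stand-in for the decoder output are hypothetical, not from the original question:

import torch
import torch.nn.functional as F

emb = torch.nn.Embedding(1000, 100)          # hypothetical vocabulary of 1000 tokens
token_ids = torch.randint(0, 1000, (8, 20))  # hypothetical batch of 8 sequences of length 20
decoder_out = torch.randn(8, 20, 100, requires_grad=True)  # stand-in for the model's output
# Loss = distance between the input embeddings and the decoder's output embeddings
loss = F.mse_loss(decoder_out, emb(token_ids))
loss.backward()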
Answer
You can do it quite easily:
import torch

embeddings = torch.nn.Embedding(1000, 100)  # 1000 tokens, 100 dimensions
my_sample = torch.randn(1, 100)             # a vector in the same embedding space
# Euclidean distance from the sample to every row of the embedding matrix
distance = torch.norm(embeddings.weight.data - my_sample, dim=1)
nearest = torch.argmin(distance)            # index (token id) of the closest row
Assuming you have 1000 tokens of dimensionality 100, this returns the nearest embedding based on Euclidean distance. You could also use other metrics in a similar manner.
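As an illustration of "other metrics" (a sketch under the same shapes as above, not from the original answer), cosine similarity works the same way with argmin flipped to argmax, and torch.cdist handles a whole batch of output vectors in one call:

import torch
import torch.nn.functional as F

embeddings = torch.nn.Embedding(1000, 100)
my_sample = torch.randn(1, 100)

# Cosine similarity: normalize the rows and the sample, take dot products,
# then pick the most similar row (argmax, since larger means closer)
w = F.normalize(embeddings.weight.data, dim=1)
s = F.normalize(my_sample, dim=1)
nearest_cos = torch.argmax((w * s).sum(dim=1))

# Batched Euclidean lookup: all pairwise distances at once
batch = torch.randn(32, 100)                        # e.g. 32 decoder outputs
dists = torch.cdist(batch, embeddings.weight.data)  # shape (32, 1000)
nearest_ids = dists.argmin(dim=1)                   # one token id per output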