BertForTokenClassification

This article walks through a point of confusion about the output of the BertForTokenClassification class in the Transformers library and how to resolve it; hopefully it serves as a useful reference for anyone facing the same question.

Problem description

This is the example given in the documentation of the Transformers PyTorch library:

from transformers import BertTokenizer, BertForTokenClassification
import torch

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForTokenClassification.from_pretrained('bert-base-uncased',
                                                   output_hidden_states=True,
                                                   output_attentions=True)

input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute",
                                          add_special_tokens=True)).unsqueeze(0)  # Batch size 1
labels = torch.tensor([1] * input_ids.size(1)).unsqueeze(0)  # Batch size 1
outputs = model(input_ids, labels=labels)

# Note: tuple unpacking like this assumes an older transformers release; on
# recent versions, pass return_dict=False in the model call, or read
# outputs.loss, outputs.logits, outputs.hidden_states and outputs.attentions.
loss, scores, hidden_states, attentions = outputs

Here hidden_states is a tuple of length 13, containing the hidden states of the model at the output of each layer plus the initial embedding outputs. I would like to know whether hidden_states[0] or hidden_states[12] represents the final hidden-state vectors.
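For reference, this structure is easy to confirm directly (a small sketch, assuming the example above has already run; the shapes shown are for bert-base-uncased on this sentence):

# Each entry has shape (batch_size, sequence_length, hidden_size).
print(len(hidden_states))        # 13: embedding output plus 12 encoder layers
print(hidden_states[0].shape)    # torch.Size([1, 8, 768])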

Recommended answer

If you check the source code, specifically BertEncoder, you can see that the returned states are initialized as an empty tuple and then appended to on each iteration over the layers.
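The relevant part of BertEncoder looks roughly like this (a paraphrased sketch of the source at the time of the question, not runnable in isolation; variable names and control flow differ slightly across transformers versions):

all_hidden_states = ()
for layer_module in self.layer:
    # The state appended here is the *input* to the layer, so on the first
    # iteration this records the embedding output.
    all_hidden_states = all_hidden_states + (hidden_states,)
    hidden_states = layer_module(hidden_states, attention_mask)[0]

# After the loop, the output of the final layer is appended as the last element.
all_hidden_states = all_hidden_states + (hidden_states,)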

The output of the final layer is appended as the last element after this loop (see here), so we can safely assume that hidden_states[12] is the final set of vectors.
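As a quick end-to-end check, the base BertModel exposes the final layer directly as last_hidden_state, and it is identical to the last element of the hidden-states tuple (a minimal runnable sketch; bert-base-uncased is used purely for illustration):

import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased', output_hidden_states=True)
model.eval()

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# hidden_states[-1] (hidden_states[12] for a 12-layer model) is the output of
# the final encoder layer, which is exactly what last_hidden_state holds.
print(torch.equal(outputs.last_hidden_state, outputs.hidden_states[-1]))  # True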

That concludes this look at the confusion around the output of the BertForTokenClassification class in the Transformers library; hopefully the recommended answer above helps, and thanks for reading!
