Problem description
This is the example given in the documentation of the transformers PyTorch library:
from transformers import BertTokenizer, BertForTokenClassification
import torch

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForTokenClassification.from_pretrained('bert-base-uncased',
                                                   output_hidden_states=True,
                                                   output_attentions=True)
input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute",
                                          add_special_tokens=True)).unsqueeze(0)  # Batch size 1
labels = torch.tensor([1] * input_ids.size(1)).unsqueeze(0)  # Batch size 1
outputs = model(input_ids, labels=labels)
# With the older tuple-return API, the outputs unpack in this order:
loss, scores, hidden_states, attentions = outputs
Here hidden_states is a tuple of length 13, containing the hidden states of the model at the output of each layer plus the initial embedding output. I would like to know whether hidden_states[0] or hidden_states[12] represents the final hidden-state vectors.
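For reference, a quick way to inspect the structure just described is to print the number of entries and their shapes. This is a minimal sketch continuing the example above and assuming the older tuple-return API:

print(len(hidden_states))    # 13 = initial embedding output + 12 encoder layers
for i, h in enumerate(hidden_states):
    print(i, h.shape)        # each entry is (batch_size, sequence_length, 768) for bert-base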
Recommended answer
If you check the source code, specifically BertEncoder, you can see that the returned states are initialized as an empty tuple and then appended to on each layer iteration.
The output of the final layer is appended as the last element after this loop (see here), so we can safely assume that hidden_states[12] contains the final hidden-state vectors.
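If you want to confirm this empirically rather than by reading the source, one option is to compare hidden_states[12] against the sequence output of the underlying BertModel. This is a minimal sketch, assuming the older tuple-return API in which model.bert(input_ids)[0] is the last encoder layer's output; model.eval() is needed so dropout does not perturb the comparison:

model.eval()  # disable dropout so repeated forward passes give identical activations
with torch.no_grad():
    loss, scores, hidden_states, attentions = model(input_ids, labels=labels)
    sequence_output = model.bert(input_ids)[0]  # output of the last encoder layer
print(torch.allclose(hidden_states[12], sequence_output))  # True: the last element is the final layer
print(torch.allclose(hidden_states[0], sequence_output))   # False: index 0 is the embedding output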