I have imported nltk in Python to compute BLEU scores on Ubuntu. I understand how sentence-level BLEU scores work, but I don't understand how corpus-level BLEU scores work.
Below is my code for the corpus-level BLEU score:
import nltk
hypothesis = ['This', 'is', 'cat']
reference = ['This', 'is', 'a', 'cat']
BLEUscore = nltk.translate.bleu_score.corpus_bleu([reference], [hypothesis], weights = [1])
print(BLEUscore)
For some reason, the code above gives a BLEU score of 0. I expected the corpus-level BLEU score to be at least 0.5.
Here is my code for the sentence-level BLEU score:
import nltk
hypothesis = ['This', 'is', 'cat']
reference = ['This', 'is', 'a', 'cat']
BLEUscore = nltk.translate.bleu_score.sentence_bleu([reference], hypothesis, weights = [1])
print(BLEUscore)
Here, the sentence-level BLEU score is 0.71, which is what I expect given the brevity penalty and the missing word "a". However, I don't understand how the corpus-level BLEU score works.
Any help would be appreciated.
Best Answer
TL;DR:
>>> import nltk
>>> hypothesis = ['This', 'is', 'cat']
>>> reference = ['This', 'is', 'a', 'cat']
>>> references = [reference] # list of references for 1 sentence.
>>> list_of_references = [references] # list of references for all sentences in corpus.
>>> list_of_hypotheses = [hypothesis] # list of hypotheses that corresponds to list of references.
>>> nltk.translate.bleu_score.corpus_bleu(list_of_references, list_of_hypotheses)
0.6025286104785453
>>> nltk.translate.bleu_score.sentence_bleu(references, hypothesis)
0.6025286104785453
(Note: You have to pull the latest version of NLTK on the develop branch in order to get a stable version of the BLEU score implementation.)
In long:
Actually, if there is only one reference and one hypothesis in your whole corpus, both corpus_bleu() and sentence_bleu() should return the same value, as shown in the example above. (Incidentally, that is also why the corpus_bleu() call in the question returns 0: it is missing one level of list nesting on the references, so each word of the reference gets read as a sequence of characters and no n-grams match.) In the code, we see that sentence_bleu is actually a duck-type of corpus_bleu:
def sentence_bleu(references, hypothesis, weights=(0.25, 0.25, 0.25, 0.25),
                  smoothing_function=None):
    return corpus_bleu([references], [hypothesis], weights, smoothing_function)
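As a quick sanity check of that delegation (a minimal sketch using the tokens from the question; unigram weights are used only to keep both calls warning-free on recent NLTK versions):
>>> from nltk.translate.bleu_score import sentence_bleu, corpus_bleu
>>> references = [['This', 'is', 'a', 'cat']]  # one sentence, one reference
>>> hypothesis = ['This', 'is', 'cat']
>>> sentence_bleu(references, hypothesis, weights=(1,)) == corpus_bleu(
...     [references], [hypothesis], weights=(1,))
True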
And if we look at the parameters of sentence_bleu:
def sentence_bleu(references, hypothesis, weights=(0.25, 0.25, 0.25, 0.25),
                  smoothing_function=None):
    """
    :param references: reference sentences
    :type references: list(list(str))
    :param hypothesis: a hypothesis sentence
    :type hypothesis: list(str)
    :param weights: weights for unigrams, bigrams, trigrams and so on
    :type weights: list(float)
    :return: The sentence-level BLEU score.
    :rtype: float
    """
The input for sentence_bleu's references is a list(list(str)). So if you have a sentence string, e.g. "This is a cat", you have to tokenize it to get a list of strings, ["This", "is", "a", "cat"], and since it allows for multiple references, it has to be a list of lists of strings; e.g. if you have a second reference, "This is a feline", your input to sentence_bleu() would be:
的内容将是:references = [ ["This", "is", "a", "cat"], ["This", "is", "a", "feline"] ]
hypothesis = ["This", "is", "cat"]
sentence_bleu(references, hypothesis)
When it comes to corpus_bleu()'s list_of_references parameter, it is basically a list of whatever sentence_bleu() takes as references:
def corpus_bleu(list_of_references, hypotheses, weights=(0.25, 0.25, 0.25, 0.25),
                smoothing_function=None):
    """
    :param references: a corpus of lists of reference sentences, w.r.t. hypotheses
    :type references: list(list(list(str)))
    :param hypotheses: a list of hypothesis sentences
    :type hypotheses: list(list(str))
    :param weights: weights for unigrams, bigrams, trigrams and so on
    :type weights: list(float)
    :return: The corpus-level BLEU score.
    :rtype: float
    """
Other than looking at the doctest within nltk/translate/bleu_score.py, you can also take a look at the unittest at nltk/test/unit/translate/test_bleu_score.py to see how to use each of the components within bleu_score.py.
By the way, since sentence_bleu is imported as bleu in nltk/translate/__init__.py, using
from nltk.translate import bleu
would be the same as:
from nltk.translate.bleu_score import sentence_bleu
In code:
>>> from nltk.translate import bleu
>>> from nltk.translate.bleu_score import sentence_bleu
>>> from nltk.translate.bleu_score import corpus_bleu
>>> bleu == sentence_bleu
True
>>> bleu == corpus_bleu
False
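Because bleu is literally the same function object as sentence_bleu, the two names are interchangeable at the call site, e.g.:
>>> references = [['This', 'is', 'a', 'cat']]
>>> hypothesis = ['This', 'is', 'cat']
>>> bleu(references, hypothesis, weights=(1,)) == sentence_bleu(references, hypothesis, weights=(1,))
True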