Finding the conditional probability of trigrams in Python NLTK

Problem description

I have started learning NLTK and I am following a tutorial that finds conditional probabilities using bigrams, like this:

import nltk
from nltk.corpus import brown
cfreq_brown_2gram = nltk.ConditionalFreqDist(nltk.bigrams(brown.words()))

However, I want to find conditional probabilities using trigrams. When I change nltk.bigrams to nltk.trigrams, I get the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "home/env/local/lib/python2.7/site-packages/nltk/probability.py", line 1705, in __init__
    for (cond, sample) in cond_samples:
ValueError: too many values to unpack (expected 2)

How can I calculate the conditional probability using trigrams?

Recommended answer

nltk.ConditionalFreqDist expects its data as a sequence of (condition, item) tuples. nltk.trigrams returns tuples of length 3, which causes the exact error you posted.
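To see why the unpacking fails, here is a minimal sketch in plain Python that mimics the loop shown in the traceback (`for (cond, sample) in cond_samples`). The sample tuples and the helper name `unpack_pairs` are made up for illustration and are not part of NLTK.

```python
# Hypothetical sample data: one bigram-shaped and one trigram-shaped tuple.
bigram_like = [("the", "cat")]
trigram_like = [("the", "cat", "sat")]


def unpack_pairs(cond_samples):
    """Mimic the two-name unpacking shown in the traceback."""
    return [(cond, sample) for (cond, sample) in cond_samples]


pairs = unpack_pairs(bigram_like)  # works: each tuple has exactly 2 items

try:
    unpack_pairs(trigram_like)  # fails: each tuple has 3 items
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Regrouping each trigram as ((w0, w1), w2) restores the 2-item shape.
regrouped = unpack_pairs(((w0, w1), w2) for w0, w1, w2 in trigram_like)
```

The regrouped form treats the first two words as a single condition, so each element again unpacks into exactly two names.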

From your post it's not exactly clear what you want to use as conditions, but the convention in language modeling is to condition the last word on its predecessors. The following code demonstrates how you'd implement that.

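import nltk
from nltk.corpus import brown
brown_trigrams = nltk.trigrams(brown.words())
# Regroup each trigram as (condition, item): the first two words condition the third.
condition_pairs = (((w0, w1), w2) for w0, w1, w2 in brown_trigrams)
cfd_brown = nltk.ConditionalFreqDist(condition_pairs)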
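The code above only builds the counts. As a rough sketch of how a conditional probability follows from them, the plain-Python version below computes the maximum-likelihood estimate P(w2 | w0, w1) = count(w0, w1, w2) / count(w0, w1, anything). The toy token list and the function name `trigram_prob` are assumptions for illustration. It uses no smoothing, so unseen trigrams get probability 0.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus; in practice this would be brown.words().
tokens = "the cat sat on the mat and the cat sat down".split()

# Count each next word w2 under the condition (w0, w1).
counts = defaultdict(Counter)
for w0, w1, w2 in zip(tokens, tokens[1:], tokens[2:]):
    counts[(w0, w1)][w2] += 1


def trigram_prob(w0, w1, w2):
    """Maximum-likelihood estimate of P(w2 | w0, w1); 0.0 for unseen contexts."""
    context = counts.get((w0, w1))
    if not context:
        return 0.0
    return context[w2] / sum(context.values())
```

With the NLTK object built above, the same estimate should be available as `cfd_brown[(w0, w1)].freq(w2)`, since each condition in a `ConditionalFreqDist` maps to a `FreqDist`.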
