Question
I have started learning NLTK and I am following a tutorial from here, where they find conditional probability using bigrams like this:
import nltk
from nltk.corpus import brown
cfreq_brown_2gram = nltk.ConditionalFreqDist(nltk.bigrams(brown.words()))
However, I want to find conditional probability using trigrams. When I try to change nltk.bigrams to nltk.trigrams, I get the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "home/env/local/lib/python2.7/site-packages/nltk/probability.py", line 1705, in __init__
for (cond, sample) in cond_samples:
ValueError: too many values to unpack (expected 2)
How can I calculate the conditional probability using trigrams?
Recommended answer
nltk.ConditionalFreqDist expects its data as a sequence of (condition, item) tuples. nltk.trigrams returns tuples of length 3, which causes the exact error you posted.
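The failure can be reproduced without NLTK at all. The following is a minimal sketch in plain Python; the sample tuple and variable names are made up for illustration, and the loop only imitates the two-name unpacking shown in the traceback.

```python
# A trigram is a 3-tuple, but the loop below unpacks each element into
# only two names, which is what the traceback's `for (cond, sample)` does.
trigram_like = [("the", "quick", "brown")]  # hypothetical sample data

try:
    for (cond, sample) in trigram_like:
        pass
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# Regrouping each 3-tuple as ((w0, w1), w2) yields 2-tuples again,
# so the same two-name unpacking succeeds.
pairs = [((w0, w1), w2) for w0, w1, w2 in trigram_like]
for (cond, sample) in pairs:
    print(cond, "->", sample)
```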
From your post it's not exactly clear what you want to use as conditions, but the convention in language modeling is to condition the last word on its predecessors. The following code demonstrates how you'd implement that.
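# Turn each trigram (w0, w1, w2) into a (condition, sample) pair:
# the first two words form the condition, the third word is the sample.
brown_trigrams = nltk.trigrams(brown.words())
condition_pairs = (((w0, w1), w2) for w0, w1, w2 in brown_trigrams)
cfd_brown = nltk.ConditionalFreqDist(condition_pairs)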
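Once the counts are grouped by the two-word context, the conditional probability of a third word is its count divided by the total count for that context. With the NLTK object above, that relative frequency should be available as cfd_brown[(w0, w1)].freq(w2). The sketch below shows the same arithmetic with only the standard library, so it runs without downloading the Brown corpus. The helper names (build_cfd, conditional_prob) and the toy sentence are assumptions made for illustration; they are not part of NLTK.

```python
from collections import Counter, defaultdict


def trigrams(tokens):
    """Yield consecutive (w0, w1, w2) triples from a list of tokens."""
    return zip(tokens, tokens[1:], tokens[2:])


def build_cfd(tokens):
    """Count each third word, grouped by the two words that precede it."""
    cfd = defaultdict(Counter)
    for w0, w1, w2 in trigrams(tokens):
        cfd[(w0, w1)][w2] += 1
    return cfd


def conditional_prob(cfd, context, word):
    """Estimate P(word | context) as count(context, word) / count(context)."""
    counts = cfd.get(context)
    if not counts:
        return 0.0  # unseen context: no estimate available from the counts
    return counts[word] / sum(counts.values())


# Toy corpus (hypothetical), standing in for brown.words().
tokens = "the cat sat on the mat and the cat ran".split()
cfd = build_cfd(tokens)

p_sat = conditional_prob(cfd, ("the", "cat"), "sat")
p_unseen = conditional_prob(cfd, ("a", "dog"), "sat")
print(p_sat, p_unseen)
```

This is an unsmoothed maximum-likelihood estimate, so any trigram that never occurs in the data gets probability zero.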