I want to compute pointwise mutual information (PMI) scores between the elements of two lists.
Suppose we have
ListA = "Hi there, This is only a test message. Please enjoy the weather in the park."
ListB = "work, bank, tree, weather, sun"
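How can I then compute the PMI score for every pair (work, Hi), (work, there), (work, This) ... (sun, park)?
The following computes the PMI for the bigrams of a single list: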
import math
import nltk
from nltk.util import ngrams

def pmi(word1, word2, unigram_freq, bigram_freq, unigram_freq_values, bigram_freq_values, output_name):
    # Relative frequency of each word and of the bigram
    prob_word1 = unigram_freq[word1] / float(sum(unigram_freq_values))
    prob_word2 = unigram_freq[word2] / float(sum(unigram_freq_values))
    prob_word1_word2 = bigram_freq / float(sum(bigram_freq_values))
    # PMI in bits (log base 2)
    return math.log(prob_word1_word2 / float(prob_word1 * prob_word2), 2)

# Split the string into word tokens; FreqDist on the raw string would count characters
tokens = ListA.split()
unigrams = nltk.FreqDist(tokens)
bigrams = ngrams(tokens, 2)
n1_freq = nltk.FreqDist(unigrams)
n2_freq = nltk.FreqDist(bigrams)
output_pmi = "test.txt"
for bigram, freq in n2_freq.most_common(1000):
    w1 = bigram[0]
    w2 = bigram[1]
    unigram_freq_val = n1_freq.values()
    bigram_freq_val = n2_freq.values()
    print(w1, w2, pmi(w1, w2, unigrams, freq, unigram_freq_val, bigram_freq_val, output_pmi))
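For reference, the quantity the pmi function above computes can be written as follows. The symbols c (a raw count), N_1 (total unigram count) and N_2 (total bigram count) are notation introduced here for illustration; they correspond to the frequency values and their sums in the code.

```latex
\mathrm{PMI}(w_1, w_2) = \log_2 \frac{P(w_1, w_2)}{P(w_1)\,P(w_2)},
\qquad
P(w_i) = \frac{c(w_i)}{N_1},
\qquad
P(w_1, w_2) = \frac{c(w_1, w_2)}{N_2}
```

A positive score means the two words occur together more often than they would if they were independent; a score of zero corresponds to independence.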
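I am stuck on computing the PMI for pairs drawn from ListA and ListB. If someone could help me, I would be very grateful. Thank you very much!
(Of course, these two lists are only a minimal example of what my task looks like.)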
Best answer
If you want to find all combinations of the two lists:
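import itertools
ListA = "Hi there, This is only a test message. Please enjoy the weather in the park."
ListB = "work, bank, tree, weather, sun"
# Split on whitespace and strip the surrounding punctuation from each token
WordsA = [w.strip(",.") for w in ListA.split()]
WordsB = [w.strip(",.") for w in ListB.split()]
#print(WordsA, "\n\n", WordsB) # This shows what the new lists are
# Cartesian product: every (word from ListA, word from ListB) pair
c = list(itertools.product(WordsA, WordsB))
print(c)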
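Enumerating the pairs is only the first step: PMI also needs a joint probability for each pair, and the question does not say how co-occurrence across the two lists should be counted. The following is a minimal sketch under one possible assumption, namely that two words co-occur when they appear in the same sentence of some corpus. The helper names tokenize and sentence_pmi and the toy corpus are hypothetical and not part of the original question or answer.

```python
import itertools
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def sentence_pmi(sentences, words_a, words_b):
    """Return {(a, b): PMI in bits} for pairs that share at least one sentence."""
    n = len(sentences)
    token_sets = [set(tokenize(s)) for s in sentences]
    # Number of sentences in which each word appears
    word_count = Counter(w for tokens in token_sets for w in tokens)
    scores = {}
    for a, b in itertools.product(words_a, words_b):
        # Number of sentences containing both words
        joint = sum(1 for tokens in token_sets if a in tokens and b in tokens)
        if joint == 0:
            continue  # PMI is undefined (log of zero) if the pair never co-occurs
        p_a = word_count[a] / n
        p_b = word_count[b] / n
        scores[(a, b)] = math.log2((joint / n) / (p_a * p_b))
    return scores

# Hypothetical toy corpus, assumed only for illustration
corpus = [
    "Please enjoy the weather in the park.",
    "The sun is out and the weather is fine.",
    "I walked to the bank near the park.",
    "There is a tree in the park.",
]
words_a = tokenize("Hi there, This is only a test message. Please enjoy the weather in the park.")
words_b = tokenize("work, bank, tree, weather, sun")

scores = sentence_pmi(corpus, words_a, words_b)
print(scores[("weather", "sun")])  # 1.0
```

Pairs that never share a sentence, such as ("hi", "work") here, are simply left out of the result. A different co-occurrence definition (for example a fixed-size word window) would change only how joint is counted.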
Regarding "python - PMI of the elements of two lists", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/40451282/