Calculating the correlation coefficient between words?

Problem description

For a text analysis program, I would like to analyze the co-occurrence of certain words in a text. For example, I would like to see that the words "Barack" and "Obama" appear together more often than other word pairs do (i.e. that they have a positive correlation).

This does not seem to be that difficult. However, to be honest, I only know how to calculate the correlation between two numbers, not between two words in a text.

  1. How can I best approach this problem?
  2. How do I calculate the correlation between words?

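I considered using conditional probabilities, since, for example, "Barack Obama" is much more likely than "Obama Barack". However, the problem I am trying to solve is more fundamental and does not depend on the order of the words.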

Recommended answer

The Ngram Statistics Package (NSP) is devoted to precisely this task. Its authors have a paper online that describes the association measures it uses. I haven't used the package myself, so I cannot comment on its reliability or requirements.
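To give a rough idea of what an order-independent association measure can look like, here is a minimal Python sketch. It is not NSP's code and does not use NSP. It assumes that "co-occurrence" means "both words appear in the same sentence", and that a naive regex tokenizer is good enough. The function names (`cooccurrence_counts`, `phi_coefficient`, `pmi`) are made up for this illustration. The phi coefficient is simply the Pearson correlation of the two presence/absence indicators, so it connects directly to the "correlation between two numbers" case; pointwise mutual information (PMI) is another commonly used measure of the same kind.

```python
import math
import re


def cooccurrence_counts(sentences, word_a, word_b):
    """Build a 2x2 contingency table over sentences.

    Returns (n11, n10, n01, n00): sentences containing both words,
    only word_a, only word_b, and neither. Word order is ignored.
    """
    a, b = word_a.lower(), word_b.lower()
    n11 = n10 = n01 = n00 = 0
    for sentence in sentences:
        # Assumption: naive tokenization, lowercase alphanumeric runs.
        tokens = set(re.findall(r"\w+", sentence.lower()))
        has_a, has_b = a in tokens, b in tokens
        if has_a and has_b:
            n11 += 1
        elif has_a:
            n10 += 1
        elif has_b:
            n01 += 1
        else:
            n00 += 1
    return n11, n10, n01, n00


def phi_coefficient(n11, n10, n01, n00):
    """Pearson correlation of the two binary presence indicators."""
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return 0.0  # a word occurs in every sentence or in none
    return (n11 * n00 - n10 * n01) / denom


def pmi(n11, n10, n01, n00):
    """Pointwise mutual information (base 2) of the two words."""
    total = n11 + n10 + n01 + n00
    if n11 == 0:
        return float("-inf")  # the words never co-occur
    p_ab = n11 / total
    p_a = (n11 + n10) / total
    p_b = (n11 + n01) / total
    return math.log2(p_ab / (p_a * p_b))


# Toy corpus, invented for this example.
sentences = [
    "Barack Obama gave a speech.",
    "In 2009 Obama, first name Barack, took office.",
    "The weather was fine.",
    "A speech was given yesterday.",
]
counts = cooccurrence_counts(sentences, "Barack", "Obama")
print(counts, phi_coefficient(*counts), pmi(*counts))
```

A phi value near +1 means the two words tend to appear in the same sentences, near -1 means they tend to avoid each other, and near 0 means no linear association. On real text the choice of window (sentence, paragraph, fixed number of tokens) and of tokenizer will affect the numbers, and PMI in particular is known to be unstable for rare words.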

