Question
I have a collection of sentences, and I need to analyse them to see how similar they are.
Are there any established algorithms to do this?
What I care about:
- Containing the same words (ignoring inflections for now)
- Containing the same words in a similar order
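
The two criteria above can be sketched separately. This is only an illustration, not an established method from the question: it assumes whitespace tokenisation and lowercasing, uses Jaccard overlap of word sets for "same words", and uses the standard library's `difflib.SequenceMatcher` over word lists for "similar order". The function names are made up for this example.

```python
from difflib import SequenceMatcher


def tokens(sentence):
    # Assumption: lowercase and split on whitespace; no stemming or punctuation handling.
    return sentence.lower().split()


def word_overlap(a, b):
    # Jaccard similarity of the two word sets: shared words / all distinct words.
    set_a, set_b = set(tokens(a)), set(tokens(b))
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def order_similarity(a, b):
    # Ratio of matching word runs, so it drops when the same words are reordered.
    return SequenceMatcher(None, tokens(a), tokens(b)).ratio()
```

With this sketch, "the cat sat" and "sat the cat" have full word overlap but a lower order similarity, which separates the two criteria.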
I've used Levenshtein distance and n-grams for spelling before, although I'm not entirely sure whether they translate to my purposes.
Naively, I don't care about spelling differences, and typos can be treated as different words, although perhaps it would be nice to account for them.
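
If typos should count as the same word, one possible approach is to match words fuzzily instead of exactly. The sketch below is an assumption, not something the question specifies: it treats two words as equal when their character-level `difflib.SequenceMatcher` ratio reaches a hypothetical threshold of 0.8, and then measures what fraction of the first sentence's words have a match in the second.

```python
from difflib import SequenceMatcher


def words_match(w1, w2, threshold=0.8):
    # Assumption: a character-level similarity ratio above the threshold means "same word".
    return SequenceMatcher(None, w1, w2).ratio() >= threshold


def fuzzy_overlap(a, b, threshold=0.8):
    # Fraction of words in `a` that have at least one fuzzy match in `b`.
    words_a, words_b = a.lower().split(), b.lower().split()
    if not words_a or not words_b:
        return 0.0
    matched = sum(
        1 for w in words_a if any(words_match(w, v, threshold) for v in words_b)
    )
    return matched / len(words_a)
```

Note that this measure is not symmetric: it is computed relative to the first sentence, so swapping the arguments can change the result. The threshold would need tuning on real data.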
Perhaps some hybrid of splitting the sentence at spaces and applying one of the above (or other) algorithms would be a starting point.
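
One way to read that hybrid idea is to run Levenshtein distance over words instead of characters. The following is a minimal sketch under that assumption: sentences are split at whitespace, and each inserted, deleted, or substituted word costs 1. The function name is hypothetical.

```python
def word_edit_distance(a, b):
    # Levenshtein distance where the units are whole words, not characters.
    x, y = a.split(), b.split()
    # prev[j] holds the distance between the words of x seen so far and y[:j].
    prev = list(range(len(y) + 1))
    for i, word_x in enumerate(x, start=1):
        curr = [i]
        for j, word_y in enumerate(y, start=1):
            cost = 0 if word_x == word_y else 1
            curr.append(min(
                prev[j] + 1,         # delete word_x
                curr[j - 1] + 1,     # insert word_y
                prev[j - 1] + cost,  # substitute, or keep if equal
            ))
        prev = curr
    return prev[-1]
```

Dividing the result by the length of the longer sentence (in words) would give a normalised score, if sentences of different lengths need to be compared.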
What options are available? Any advice?
Thanks!
Recommended answer
This paper compares several sentence similarity measures. Perhaps you can use one of them as is, or modify it for your needs.
Otherwise, "sentence similarity measure" is a good key term to google for.