Question
I'm doing research with Elasticsearch. I was planning to use cosine similarity, but I noticed that it is unavailable; instead, BM25 is the default scoring function.
Is there a reason for that? Is cosine similarity unsuitable for querying documents? Why was BM25 chosen as the default? Thanks.
Recommended answer
For a long time Elasticsearch used the TF/IDF algorithm to score how well documents match a query. Several versions ago (as of Elasticsearch 5.0) the default was changed to BM25, which is considered more effective. You can read about this in the documentation. There is also a good article that explains what Elasticsearch is and how similarity is implemented in ES.
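To make the difference concrete, here is a minimal pure-Python sketch that scores the same toy documents with Okapi BM25 and with cosine similarity over TF-IDF vectors. It is an illustration only, not how Elasticsearch or Lucene implement scoring internally: the function names, the whitespace tokenizer, the IDF variants, and the defaults `k1=1.2` and `b=0.75` are assumptions chosen for the example.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (an assumption; real analyzers do much more).
    return text.lower().split()


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with an Okapi BM25 formula."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for d in tokenized for term in set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1; b controls length normalization.
            denom = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


def tfidf_cosine_scores(query, docs):
    """Score each document by the cosine of the angle between TF-IDF vectors."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    df = Counter(term for d in tokenized for term in set(d))

    def vector(tokens):
        # Terms unseen in the collection are dropped from the vector.
        tf = Counter(tokens)
        return {t: c * math.log(1 + n_docs / df[t]) for t, c in tf.items() if t in df}

    def cosine(u, v):
        dot = sum(w * v.get(t, 0.0) for t, w in u.items())
        norm_u = math.sqrt(sum(w * w for w in u.values()))
        norm_v = math.sqrt(sum(w * w for w in v.values()))
        return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

    q = vector(tokenize(query))
    return [cosine(q, vector(d)) for d in tokenized]


if __name__ == "__main__":
    docs = ["fox", "fox fox fox fox", "quick brown dog"]
    print("BM25:  ", bm25_scores("fox", docs))
    print("cosine:", tfidf_cosine_scores("fox", docs))
```

In this toy collection the first two documents point in exactly the same direction as the query, so cosine similarity cannot tell them apart, while BM25 gives the document with more occurrences a higher score that grows much more slowly than the raw term count.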
You can also write a custom similarity algorithm for Elasticsearch. There is a good article on how to do that.
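As a rough sketch of what a custom similarity can look like without writing a plugin, the snippet below builds an index-creation body that declares a tuned BM25 similarity and a scripted TF-IDF style similarity, then assigns them to fields. The index layout, the similarity names (`tuned_bm25`, `scripted_tfidf`) and the field names (`title`, `body`) are hypothetical, and the exact settings supported depend on your Elasticsearch version, so check the similarity module documentation before relying on it. Note that the scripted formula is a classic TF-IDF score, not a true cosine similarity.

```python
import json


def build_index_body():
    """Return a request body for index creation with custom similarities.

    All names below are illustrative assumptions, not required values.
    """
    return {
        "settings": {
            "index": {
                "similarity": {
                    # BM25 with non-default parameters.
                    "tuned_bm25": {"type": "BM25", "k1": 1.5, "b": 0.6},
                    # Scripted similarity approximating classic TF-IDF scoring.
                    "scripted_tfidf": {
                        "type": "scripted",
                        "script": {
                            "source": (
                                "double tf = Math.sqrt(doc.freq); "
                                "double idf = Math.log((field.docCount+1.0)"
                                "/(term.docFreq+1.0)) + 1.0; "
                                "double norm = 1/Math.sqrt(doc.length); "
                                "return query.boost * tf * idf * norm;"
                            )
                        },
                    },
                }
            }
        },
        "mappings": {
            "properties": {
                # Each text field can name the similarity it should use.
                "title": {"type": "text", "similarity": "tuned_bm25"},
                "body": {"type": "text", "similarity": "scripted_tfidf"},
            }
        },
    }


if __name__ == "__main__":
    # Print the JSON you would send when creating the index.
    print(json.dumps(build_index_body(), indent=2))
```

The printed JSON would be sent as the body of the index-creation request; how you send it (client library, curl, Kibana console) is up to you.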