Problem description
In my particular use case, the IDF factor that is calculated as part of the TF-IDF algorithm distorts the scoring of my queries. Basically, I want the queries to take only the term frequency into account. Is it possible to disable the IDF factor, i.e., set it to 1, for a particular index? I have looked into the similarity module (in version 0.90.X) but haven't found anything that helps; the same goes for the function_score query. Do I need to write a custom Similarity class in Java? Or is there a plugin for what I'm trying to achieve?
What about the constant_score query?
See http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/ignoring-tfidf.html
Don't hesitate to use ?explain=true to see how the score is computed.
As you can see here, without constant_score:
And with a constant_score query (which wraps your real query):
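Since the screenshots are not reproduced here, the following is a minimal sketch of what such a wrapped request body could look like, built in Python so it can be printed as JSON. The field name `title`, the term `elasticsearch`, and the helper `wrap_in_constant_score` are hypothetical and chosen only for illustration; the `constant_score` query with a `filter` clause and the `explain` flag are standard Elasticsearch request options. Note that constant_score gives every matching document the same score (the boost), so it drops term frequency as well as IDF.

```python
import json


def wrap_in_constant_score(inner_query, boost=1.0):
    """Wrap a query or filter clause so every match gets the same score (boost).

    Hypothetical helper for illustration; it only builds a dict.
    """
    return {"constant_score": {"filter": inner_query, "boost": boost}}


# Hypothetical field and term, used only as an example.
inner = {"term": {"title": "elasticsearch"}}

# "explain": True in the body asks for a per-hit score breakdown,
# the same information that ?explain=true returns.
body = {"query": wrap_in_constant_score(inner), "explain": True}

print(json.dumps(body, indent=2))
```

The printed JSON can be sent to the index's _search endpoint; comparing the explain output with and without the wrapper shows whether the IDF term still contributes to the score.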