Question
I have an index containing thousands of documents, each of which has a full-text field.
I want to search across all those fields and fetch the 10 words that occur most often.
I would also like a way to visualize the result in Kibana, if that's possible.
Answer
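The most common way to achieve that is to duplicate your full-text field as a keyword field, which lets you run a terms aggregation on it. Be aware, though, that a keyword field is not analyzed: it stores the whole field value as a single term, so a terms aggregation on it counts identical field values, not individual words. To count words, the terms aggregation has to run on the analyzed text field itself, which requires enabling fielddata on that field (at a significant heap-memory cost). You could also consider a significant terms aggregation, which ranks terms by how unusually frequent they are in the matching documents compared with the whole index, and so tends to leave out stopwords and other common words. In ES 6.x you can also use the significant text aggregation, which works directly on the text field without creating the keyword field, but I have never tried it, so I don't know how well it works. If instead you need the frequency of each word within each document, you should use the term vectors API.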
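A minimal sketch of the terms-aggregation route, written as Python dicts that could be sent as request bodies with any Elasticsearch client. The field name `content`, its `keyword` sub-field and the aggregation name `top_words` are hypothetical. The mapping is typeless (7.x style); a 6.x index nests the same `properties` under a mapping type. Setting `fielddata` on the text field is what lets the aggregation bucket individual analyzed words, whereas the `keyword` sub-field would bucket whole field values. Note that each bucket's `doc_count` is the number of documents containing the word, not its total number of occurrences.

```python
import json

# Hypothetical mapping: "content" is analyzed full text. fielddata=True lets
# aggregations run on its individual tokens (this can use a lot of heap).
# "content.keyword" is the un-analyzed copy: the whole value is one term.
mapping = {
    "properties": {
        "content": {
            "type": "text",
            "fielddata": True,
            "fields": {"keyword": {"type": "keyword"}},
        }
    }
}

# Search body: return no hits, only the 10 most frequent words in "content".
top_words_query = {
    "size": 0,
    "aggs": {
        "top_words": {
            "terms": {"field": "content", "size": 10}
        }
    },
}


def top_words(response, agg_name="top_words"):
    """Return (word, doc_count) pairs from a terms aggregation response."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [(b["key"], b["doc_count"]) for b in buckets]


if __name__ == "__main__":
    print(json.dumps(mapping, indent=2))
    print(json.dumps(top_words_query, indent=2))
    # Illustrative, hand-written response fragment (not real output).
    sample = {
        "aggregations": {
            "top_words": {
                "buckets": [
                    {"key": "the", "doc_count": 950},
                    {"key": "elasticsearch", "doc_count": 120},
                ]
            }
        }
    }
    print(top_words(sample))
```

With the default standard analyzer, stopwords such as "the" are not removed, so they are likely to dominate this list; a custom analyzer with a stop filter, or a significant terms/text aggregation, is the usual way around that.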
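A sketch of a significant text request against a hypothetical `content` field. Significant terms and significant text compare the documents matched by the query (the foreground set) with the whole index (the background set), so the request needs a query that selects a subset; if the query matches every document, the two sets are identical and no word stands out. The match text `"kibana"` and the sampler's `shard_size` of 100 are arbitrary placeholders. Wrapping the aggregation in a `sampler` keeps the cost down, because significant text re-analyzes the stored `_source` at query time.

```python
import json

# Hypothetical search body: words that are unusually frequent in documents
# matching "kibana" compared with the rest of the index.
significant_words_query = {
    "size": 0,
    "query": {"match": {"content": "kibana"}},  # foreground set (placeholder)
    "aggs": {
        # Only analyze the best-matching documents on each shard.
        "sample": {
            "sampler": {"shard_size": 100},
            "aggs": {
                "keywords": {
                    "significant_text": {
                        "field": "content",
                        "size": 10,
                        # Ignore repeated boilerplate text such as copied footers.
                        "filter_duplicate_text": True,
                    }
                }
            },
        }
    },
}


def significant_words(response):
    """Return the significant words, in ranking order, from a response."""
    buckets = response["aggregations"]["sample"]["keywords"]["buckets"]
    return [b["key"] for b in buckets]


if __name__ == "__main__":
    print(json.dumps(significant_words_query, indent=2))
```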
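For the per-document case, the term vectors API returns every term of a field together with its `term_freq` in that one document (the exact request path differs between versions; in 7.x and later it is `GET /<index>/_termvectors/<id>`). The helper below is a sketch that ranks those terms by frequency. The field name `content`, the query parameters and the sample response are hypothetical and hand-written, not real output.

```python
# Query parameters for the term vectors API; "content" is a hypothetical field.
termvectors_params = {"fields": "content", "positions": "false", "offsets": "false"}

# Illustrative, hand-written term vectors response (not real output).
SAMPLE_RESPONSE = {
    "_id": "1",
    "found": True,
    "term_vectors": {
        "content": {
            "terms": {
                "search": {"term_freq": 3},
                "index": {"term_freq": 1},
                "elasticsearch": {"term_freq": 3},
            }
        }
    },
}


def top_terms_in_doc(response, field="content", n=10):
    """Return the n most frequent (term, term_freq) pairs for one document.

    Ties are broken alphabetically so the result is deterministic.
    """
    terms = response["term_vectors"][field]["terms"]
    ranked = sorted(terms.items(), key=lambda kv: (-kv[1]["term_freq"], kv[0]))
    return [(term, stats["term_freq"]) for term, stats in ranked[:n]]


if __name__ == "__main__":
    print(top_terms_in_doc(SAMPLE_RESPONSE, n=2))
    # prints [('elasticsearch', 3), ('search', 3)]
```

This gives counts for one document at a time; summing them across thousands of documents would mean one request per document, which is why the aggregation routes are usually preferred for index-wide counts.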