Inverted index

For how tokenization works, see https://www.cnblogs.com/LQBlog/articles/5743991.html

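Analyzers

The three steps of the analysis process

1. Character filters: clean up the raw text, for example by stripping special characters.

2. Tokenizer: split the text into individual tokens.

3. Token filters: post-process the tokens, for example by changing their case or replacing tokens with other terms.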
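
The three steps can be sketched in plain Python. This is only an illustration of the pipeline shape, not how ES implements it: the function names `char_filter`, `tokenize`, `token_filter`, and `analyze` are hypothetical, stripping HTML-like tags is an assumed stand-in for the character filter step, and lowercasing stands in for the token filter step.

```python
import re

def char_filter(text):
    # Step 1: clean the raw string, here by dropping HTML-like tags (assumed example).
    return re.sub(r"<[^>]+>", " ", text)

def tokenize(text):
    # Step 2: split the cleaned string into tokens on whitespace.
    return text.split()

def token_filter(tokens):
    # Step 3: transform each token, here by lowercasing it.
    return [t.lower() for t in tokens]

def analyze(text):
    # Chain the three steps in the same order an analyzer applies them.
    return token_filter(tokenize(char_filter(text)))

print(analyze("<b>Set</b> the Shape"))  # ['set', 'the', 'shape']
```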

Results from several of the analyzers built into ES

Example sentence: Set the shape to semi-transparent by calling set_trans(5)

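Standard analyzer

Works well for English and is the default analyzer in ES.

It splits the text on word boundaries, drops most punctuation, and then lowercases the tokens.

Resulting tokens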

set, the, shape, to, semi, transparent, by, calling, set_trans, 5
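
For this particular sentence, the output above can be reproduced with a rough regex approximation. The function name `standard_like` is hypothetical, and the real standard analyzer follows Unicode word-boundary rules, so this sketch will differ from it on other inputs.

```python
import re

def standard_like(text):
    # \w+ keeps letters, digits, and underscores together, so "set_trans"
    # stays whole while "-" and "(" act as separators. Then lowercase.
    return [t.lower() for t in re.findall(r"\w+", text)]

text = "Set the shape to semi-transparent by calling set_trans(5)"
print(standard_like(text))
# ['set', 'the', 'shape', 'to', 'semi', 'transparent', 'by', 'calling', 'set_trans', '5']
```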

Simple analyzer

It splits the text at every character that is not a letter, then lowercases the tokens.

Resulting tokens

set, the, shape, to, semi, transparent, by, calling, set, trans
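
The same behaviour can be approximated in a few lines of Python. This is a sketch with a hypothetical function name, `simple_like`, not the ES implementation. Because digits and underscores are not letters, "set_trans" is split in two and "5" disappears.

```python
import re

def simple_like(text):
    # [^\W\d_]+ matches runs of letters only: word characters minus
    # digits and the underscore. Everything else acts as a separator.
    return [t.lower() for t in re.findall(r"[^\W\d_]+", text)]

text = "Set the shape to semi-transparent by calling set_trans(5)"
print(simple_like(text))
# ['set', 'the', 'shape', 'to', 'semi', 'transparent', 'by', 'calling', 'set', 'trans']
```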

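Language analyzers

Analyzers tailored to a specific language. Each one comes with its own language-specific resources, such as a stop word list.

Testing an analyzer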

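GET request: http://127.0.0.1:9200/_analyze

Body (the // comments are annotations only; remove them before sending):

{
"analyzer":"standard",//the analyzer to test
"text":"Set the shape to semi-transparent by calling set_trans(5)"//the full text to analyze
}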
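
The request can also be sent from a script. The sketch below uses only the Python standard library; the helper names `build_analyze_request` and `analyze` are hypothetical, and it assumes an ES node is listening on 127.0.0.1:9200. It uses POST, which `_analyze` also accepts, because some HTTP clients do not send a body with GET.

```python
import json
import urllib.request

def build_analyze_request(analyzer, text, base_url="http://127.0.0.1:9200"):
    # Serialize the body as strict JSON (no // comments).
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def analyze(analyzer, text):
    # Sends the request; this needs a running ES node at the base URL.
    with urllib.request.urlopen(build_analyze_request(analyzer, text)) as resp:
        return json.load(resp)
```

Calling `analyze("standard", "Set the shape to semi-transparent by calling set_trans(5)")` against a running node should return the parsed JSON response as a Python dict.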

Result:

{
"tokens": [
{
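"token": "set",//the term that gets indexed
"start_offset": 0,//start position in the original text
"end_offset": 3,//end position in the original text
"type": "<ALPHANUM>",
"position": 0//ordinal position of the token in the text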
},
{
"token": "the",
"start_offset": 4,
"end_offset": 7,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "shape",
"start_offset": 8,
"end_offset": 13,
"type": "<ALPHANUM>",
"position": 2
},
{
"token": "to",
"start_offset": 14,
"end_offset": 16,
"type": "<ALPHANUM>",
"position": 3
},
{
"token": "semi",
"start_offset": 17,
"end_offset": 21,
"type": "<ALPHANUM>",
"position": 4
},
{
"token": "transparent",
"start_offset": 22,
"end_offset": 33,
"type": "<ALPHANUM>",
"position": 5
},
{
"token": "by",
"start_offset": 34,
"end_offset": 36,
"type": "<ALPHANUM>",
"position": 6
},
{
"token": "calling",
"start_offset": 37,
"end_offset": 44,
"type": "<ALPHANUM>",
"position": 7
},
{
"token": "set_trans",
"start_offset": 45,
"end_offset": 54,
"type": "<ALPHANUM>",
"position": 8
},
{
"token": "5",
"start_offset": 55,
"end_offset": 56,
"type": "<NUM>",
"position": 9
}
]
}
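
To see how the offsets relate to the input, each token can be mapped back to the slice of the original text it came from. This is a small sketch; the function name `check_offsets` is hypothetical, and `sample` is a three-token subset copied from the response above.

```python
def check_offsets(text, response):
    # Pair every token with the slice text[start_offset:end_offset].
    pairs = []
    for tok in response["tokens"]:
        piece = text[tok["start_offset"]:tok["end_offset"]]
        pairs.append((tok["token"], piece))
    return pairs

text = "Set the shape to semi-transparent by calling set_trans(5)"
sample = {"tokens": [
    {"token": "set", "start_offset": 0, "end_offset": 3, "type": "<ALPHANUM>", "position": 0},
    {"token": "set_trans", "start_offset": 45, "end_offset": 54, "type": "<ALPHANUM>", "position": 8},
    {"token": "5", "start_offset": 55, "end_offset": 56, "type": "<NUM>", "position": 9},
]}
for token, piece in check_offsets(text, sample):
    print(token, "<-", piece)
```

The first pair is `set` and `Set`: the offsets point at the untouched source text, while the token itself has already been lowercased.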

Viewing the analyzed terms of a stored document

GET /${index}/${type}/${id}/_termvectors?fields=${fields_name}

05-21 22:33