36. Pagination and deep paging

Key points

1. Pagination in ES

2. Deep paging

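I. ES pagination syntax

Pagination is controlled by two parameters, size and from.

GET /_search?size=10 returns 10 hits per page.

GET /_search?size=10&from=0 returns 10 hits per page, starting at offset 0 (the first hit).

GET /_search?size=10&from=20 returns 10 hits per page, starting at offset 20 (the first 20 hits are skipped).

Note that the order of size and from in the query string does not matter.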
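Applications usually think in page numbers rather than offsets. The following is a minimal Python sketch of that conversion, assuming 1-based page numbers; the helper names page_params and search_path are hypothetical and are not part of any Elasticsearch client.

```python
from urllib.parse import urlencode


def page_params(page, size):
    """Translate a 1-based page number into a from/size pair."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be >= 1")
    return {"from": (page - 1) * size, "size": size}


def search_path(target, page, size):
    """Build the request path for a paged search (hypothetical helper)."""
    return f"/{target}/_search?{urlencode(page_params(page, size))}"


print(search_path("test_index/test_type", 1, 3))  # /test_index/test_type/_search?from=0&size=3
print(search_path("test_index/test_type", 3, 3))  # /test_index/test_type/_search?from=6&size=3
```

The offset is always (page - 1) * size, so page 1 starts at from=0.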

II. ES pagination example

1. Check the data first

GET /test_index/test_type/_search

The result is:

"hits": {

"total": 9,

"max_score": 1,

Suppose we split these 9 documents into 3 pages of 3 documents each:

GET /test_index/test_type/_search?from=0&size=3

The result is:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 9,
    "max_score": 1,
    "hits": [
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "8",
        "_score": 1,
        "_source": {
          "test_field": "test client 2"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "6",
        "_score": 1,
        "_source": {
          "test_field": "tes test"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "4",
        "_score": 1,
        "_source": {
          "test_field": "test4"
        }
      }
    ]
  }
}

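Page 1: id = 8, 6, 4

GET /test_index/test_type/_search?from=3&size=3

Page 2: id = 2, an auto-generated id, 7

GET /test_index/test_type/_search?from=6&size=3

Page 3: id = 1, 11, 3

As the results show, ES paginates over hits ordered by _score (the default sort), not by id. Every hit here has the same _score of 1, and the order in which they come back still does not follow the ids.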

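III. Deep paging

1. What is deep paging

Put simply, deep paging means paging very far into the result set. Suppose ES holds 6,000 documents spread across 3 primary shards, and we request page 100 at 10 hits per page, that is, hits 991 to 1000 (from=990, size=10). That is deep paging.

2. How deep paging is handled

The approach that first comes to mind is to fetch hits 991 to 1000 from each shard, sort the resulting 30 hits, and take the top 10. This is wrong, because _score values are distributed differently on each shard: the hit ranked 1000th on one shard may score higher than the hit ranked 1st on another, so the hits ranked 991 to 1000 globally can sit at any local rank within a shard. The correct approach is for each of the 3 shards to return its top 1,000 hits (from + size), in sorted order, to the coordinating node. The coordinating node merges the 3,000 candidates, sorts them by _score, keeps hits 991 to 1000, and returns those to the client.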
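The difference between the two approaches can be simulated in a few lines of Python. This is only a sketch under stated assumptions: scores are plain integers 1 to 6000, documents are spread over 3 shards by score modulo 3, page 100 at 10 hits per page is taken to mean from=990 and size=10, and the function names are made up for illustration. It models the merge logic only, not how Elasticsearch implements it internally.

```python
import heapq

# Assumed toy data: 6000 scores spread evenly over 3 shards.
shards = [[s for s in range(1, 6001) if s % 3 == r] for r in range(3)]


def naive_page(shards, from_, size):
    """Incorrect: every shard contributes only its own local slice."""
    candidates = []
    for scores in shards:
        local = sorted(scores, reverse=True)
        candidates.extend(local[from_:from_ + size])
    return sorted(candidates, reverse=True)[:size]


def global_page(shards, from_, size):
    """Correct: every shard contributes its top (from_ + size) scores,
    then the coordinating step merges them and slices the page."""
    k = from_ + size
    candidates = []
    for scores in shards:
        candidates.extend(heapq.nlargest(k, scores))
    merged = sorted(candidates, reverse=True)
    return merged[from_:from_ + size]


# Ground truth: sort all 6000 scores in one place and slice.
truth = sorted((s for shard in shards for s in shard), reverse=True)[990:1000]

print(global_page(shards, 990, 10) == truth)  # True
print(naive_page(shards, 990, 10) == truth)   # False
```

With this data the naive version returns scores around 3030, while the true page 100 holds scores 5010 down to 5001. The correct version gets there only by collecting 3 × 1000 candidates first.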

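3. Performance problems of deep paging

(1) Network bandwidth: the deeper the page, the more entries every shard must send to the coordinating node, and this transfer consumes a lot of network bandwidth.

(2) Memory: the coordinating node holds everything the shards send back in memory, so deep pages consume a lot of memory.

(3) CPU: the coordinating node must sort all the returned entries, and this sort is CPU-intensive.

Because of these costs, deep paging should be avoided as far as possible in practice.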
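All three costs grow with the number of entries the coordinating node has to collect. As a rough back-of-the-envelope sketch (the function name is hypothetical, and the figure is an upper bound that assumes every shard holds at least from + size matching documents):

```python
def coordinator_entries(num_shards, from_, size):
    """Upper bound on entries the coordinating node must receive, hold, and sort."""
    return num_shards * (from_ + size)


# 3 shards, 10 hits per page, at increasing page depth.
for page in (1, 100, 1000):
    from_ = (page - 1) * 10
    print(page, coordinator_entries(3, from_, 10))
# 1 30
# 100 3000
# 1000 30000
```

The work grows linearly with page depth even though the client only ever receives 10 hits.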