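I've recently started working with Elasticsearch and I have a question. My index contains 1 million documents, and I want to retrieve more than 10,000 of them. For that I can use either the Scroll API or the Search After API. I understand how the Scroll API works, but I'm having trouble with Search After.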

Here is my method that builds the SearchSourceBuilder:

public SearchRequest buildRequest(SearchDistanceParameters args) {
    final SearchSourceBuilder searchSourceBuilder = prepareSearchSourceBuilder(args);
    final SearchRequest searchRequest = new SearchRequest();
    return searchRequest.source(searchSourceBuilder);
}

private SearchSourceBuilder prepareSearchSourceBuilder(SearchDistanceParameters searchDistanceParameters) {
    final FieldSortBuilder fieldSortBuilder = new FieldSortBuilder("_id").order(SortOrder.ASC);
    final SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    final GeoDistanceQueryBuilder geoDistanceQueryBuilder = geoDistanceQuery(GeoLocationModelFieldName.LOCATION.name().toLowerCase());
    geoDistanceQueryBuilder.point(searchDistanceParameters.getLatitude(), searchDistanceParameters.getLongitude());
    geoDistanceQueryBuilder.distance(searchDistanceParameters.getDistance(), DistanceUnit.KILOMETERS);
    searchSourceBuilder.query(geoDistanceQueryBuilder);
    searchSourceBuilder.sort(fieldSortBuilder);
    searchSourceBuilder.searchAfter();
    return searchSourceBuilder;
}

Here I add the sort before calling searchAfter(), as the Search After API documentation says to.

Here I send the request to Elasticsearch:
public SearchResponse sendRequestToElastic(SearchDistanceParameters args) throws IOException {
    SearchRequest searchRequest = searchByDistanceRequestBuilder.buildRequest(args);
    return elasticDao.search(searchRequest, RequestOptions.DEFAULT); // standard RestHighLevelClient.search method inside elasticDao.
}

Finally, I try to extract the objects from the SearchResponse:
public List<GeoPointsFromElasticSearchResponse> searchByDistance(SearchDistanceParameters searchDistanceParameters) {
        final SearchResponse searchResponse = searchRepository.searchByDistance(searchDistanceParameters);
        return getGeoPointsFromElasticSearchResponses(searchResponse);
    }

private List<GeoPointsFromElasticSearchResponse> getGeoPointsFromElasticSearchResponses(SearchResponse searchResponse) {
        SearchHit[] hits = searchResponse.getHits().getHits();
        return Arrays.stream(hits)
                .map(hit -> {
                    final GeoPointsFromElasticSearchResponse geoPointsFromElasticSearchResponse = new GeoPointsFromElasticSearchResponse();
                    final Map<String, Object> sourceMap = hit.getSourceAsMap();
                    final Map map = (Map) sourceMap.get(GeoLocationModelFieldName.LOCATION.name().toLowerCase());
                    geoPointsFromElasticSearchResponse.setLatitude((Double) map.get("lat"));
                    geoPointsFromElasticSearchResponse.setLongitude((Double) map.get("lon"));
                    log.info("Sorted hits: {}", hit.getSortValues());
                    return geoPointsFromElasticSearchResponse;
                }).collect(Collectors.toList());
    }

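But I only get 10,000 objects. It seems I'm doing something wrong in the last part. What exactly am I doing wrong? How do I use the Search After API correctly in Java?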

Best answer

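The search API does not return all documents in a single request; it behaves like pagination, returning one page of hits per call.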

To fetch each following page, you have to pass the search_after parameter with the next request:
https://www.elastic.co/guide/en/elasticsearch/reference/6.7/search-request-search-after.html

According to the method signature, that is: searchSourceBuilder.searchAfter(new Object[] {sortAfterValue});

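The value to set is the sort values of the last hit returned by the previous search request (hits => getAt(lastIndex) => getSortValues()). Calling searchAfter() with no arguments, as in the question, only reads the current value and sets nothing.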
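
To make the loop concrete, here is a minimal sketch of the paging pattern in plain Java, with no Elasticsearch dependency so it can be run as is. Hit, fetchPage and fetchAll are hypothetical stand-ins, not client classes: fetchPage plays the role of one search request against an index sorted by _id, and Hit.sortValues plays the role of hit.getSortValues(). In real code, the place where the cursor is handed to fetchPage is where you would call searchSourceBuilder.searchAfter(cursor) before sending the next request.

```java
import java.util.ArrayList;
import java.util.List;

class SearchAfterSketch {

    /** Hypothetical stand-in for a SearchHit: a document id plus its sort values. */
    static final class Hit {
        final String id;
        final Object[] sortValues;

        Hit(String id) {
            this.id = id;
            this.sortValues = new Object[] {id}; // sorted by id only, like the "_id" sort in the question
        }
    }

    /**
     * Hypothetical stand-in for one search request: returns at most `size` hits
     * that sort strictly after the given cursor (null means "first page").
     */
    static List<Hit> fetchPage(List<String> sortedIds, Object[] searchAfter, int size) {
        List<Hit> page = new ArrayList<>();
        for (String id : sortedIds) {
            if (searchAfter != null && id.compareTo((String) searchAfter[0]) <= 0) {
                continue; // at or before the cursor: already returned on an earlier page
            }
            page.add(new Hit(id));
            if (page.size() == size) {
                break;
            }
        }
        return page;
    }

    /** Keeps requesting pages, feeding the last hit's sort values into the next request. */
    static List<String> fetchAll(List<String> sortedIds, int pageSize) {
        List<String> all = new ArrayList<>();
        Object[] cursor = null; // the first request carries no search_after value
        while (true) {
            List<Hit> page = fetchPage(sortedIds, cursor, pageSize);
            if (page.isEmpty()) {
                break; // an empty page means everything has been read
            }
            for (Hit hit : page) {
                all.add(hit.id);
            }
            cursor = page.get(page.size() - 1).sortValues; // last hit => sort values
        }
        return all;
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            ids.add(String.format("doc-%03d", i));
        }
        System.out.println(fetchAll(ids, 10).size()); // prints 25 (pages of 10, 10 and 5)
    }
}
```

The point of the sketch is that one request is never enough: the caller has to loop, and the only state carried between requests is the sort values of the last hit. This only works if the sort order is unique per document, which is why sorting on a unique field is needed.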
