How do you wait for ElasticSearch to finish updating its index?

Question

I'm attempting to improve the performance of a suite that tests against ElasticSearch.

The tests take a long time because ElasticSearch does not update its indexes immediately after an update. For instance, the following code runs without raising an assertion error.

from elasticsearch import Elasticsearch

elasticsearch = Elasticsearch('es.test')

# Assuming that this is a clean and empty elasticsearch instance
elasticsearch.update(
    index='blog',
    doc_type='blog',
    id=1,
    body={
        # .... (fields elided in the original)
    }
)

# search() returns a response dict; the matching documents are under hits.hits
results = elasticsearch.search()['hits']['hits']
assert not results
# results are not populated

Currently our hacked-together solution to this issue is dropping a time.sleep call into the code, to give ElasticSearch some time to update its indexes.

from time import sleep

from elasticsearch import Elasticsearch

elasticsearch = Elasticsearch('es.test')

# Assuming that this is a clean and empty elasticsearch instance
elasticsearch.update(
    index='blog',
    doc_type='blog',
    id=1,
    body={
        # .... (fields elided in the original)
    }
)

# Don't want to use sleep functions
sleep(1)

# search() returns a response dict; the matching documents are under hits.hits
results = elasticsearch.search()['hits']['hits']
assert len(results) == 1
# results are now populated

Obviously this isn't great, as it's rather failure-prone: if ElasticSearch ever takes longer than a second to update its indexes, however unlikely that is, the test will fail. It's also extremely slow when you're running hundreds of tests like this.

My attempt to solve the issue has been to query the pending cluster tasks to see if there is any work left to be done. However, this doesn't work, and this code will run without an assertion error.

from elasticsearch import Elasticsearch

elasticsearch = Elasticsearch('es.test')

# Assuming that this is a clean and empty elasticsearch instance
elasticsearch.update(
    index='blog',
    doc_type='blog',
    id=1,
    body={
        # .... (fields elided in the original)
    }
)

# Query if there are any pending tasks
while elasticsearch.cluster.pending_tasks()['tasks']:
    pass

# search() returns a response dict; the matching documents are under hits.hits
results = elasticsearch.search()['hits']['hits']
assert not results
# results are not populated
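
A note on the polling attempt above: as far as I understand it, the pending cluster tasks list covers cluster-level changes that have not yet been executed, which is not the same thing as a document becoming searchable, so an empty list says little about search visibility. If polling is used at all, it seems safer to poll the actual condition with an upper time bound. The helper below is a minimal sketch of that idea; wait_until and its timeout and interval parameters are hypothetical names introduced here, not part of the elasticsearch client.

```python
import time


def wait_until(predicate, timeout=5.0, interval=0.05):
    """Poll predicate() until it is truthy or `timeout` seconds have passed.

    Returns True if the predicate succeeded, False if the deadline was hit.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        # Back off briefly instead of spinning in a tight loop.
        time.sleep(interval)


# Hypothetical usage in a test, with `elasticsearch` being the client above:
# assert wait_until(lambda: elasticsearch.search()['hits']['hits'])
```

Compared with a fixed sleep(1), this returns as soon as the condition holds and fails explicitly when it never does.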

So basically, back to my original question: ElasticSearch updates are not immediate, so how do you wait for ElasticSearch to finish updating its index?

Recommended answer

As of version 5.0.0, Elasticsearch has an option:

 ?refresh=wait_for

on the Index, Update, Delete, and Bulk APIs. This way, the request won't receive a response until the result is visible in ElasticSearch. (Yay!)
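
At the REST level the option is just a query-string parameter on the request URL. The sketch below only builds such a URL to show where the parameter goes; document_url is a hypothetical helper, the host and port are placeholders, and the /index/type/id path layout is an assumption that matches the doc_type usage in the question (newer versions lay out the path differently).

```python
from urllib.parse import quote, urlencode


def document_url(base_url, index, doc_type, doc_id, refresh="wait_for"):
    """Build a document URL carrying the refresh query parameter."""
    # Percent-encode each path segment so unusual ids cannot break the path.
    path = "/".join(quote(str(part), safe="") for part in (index, doc_type, doc_id))
    return f"{base_url.rstrip('/')}/{path}?{urlencode({'refresh': refresh})}"


# Placeholder host and port, for illustration only.
print(document_url("http://es.test:9200", "blog", "blog", 1))
```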

See https://www.elastic.co/guide/en/elasticsearch/reference/master/docs-refresh.html for more information.

Edit: It seems that this functionality is already part of the latest Python elasticsearch API: https://elasticsearch-py.readthedocs.io/en/master/api.html#elasticsearch.Elasticsearch.index

Change your elasticsearch.update to:

elasticsearch.update(
    index='blog',
    doc_type='blog',
    id=1,
    refresh='wait_for',
    body={
        # .... (fields elided in the original)
    }
)

and you shouldn't need any sleep or polling.
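
For a test suite, the same idea can be wrapped in a small helper so that every write waits for visibility before the test searches. This is a minimal sketch, assuming a client version that still accepts the doc_type and body keyword arguments used in the snippets above; index_and_search is a hypothetical name, and the match_all query is only an illustration.

```python
def index_and_search(client, index, doc_type, doc_id, document):
    """Index one document, wait until it is searchable, then return the hits."""
    # refresh='wait_for' makes the call return only once the change is visible
    # to search, so no sleep or polling is needed before the query below.
    client.index(
        index=index,
        doc_type=doc_type,
        id=doc_id,
        body=document,
        refresh="wait_for",
    )
    response = client.search(index=index, body={"query": {"match_all": {}}})
    return response["hits"]["hits"]


# Hypothetical usage with the client from the question:
# hits = index_and_search(elasticsearch, 'blog', 'blog', 1, {'title': 'hello'})
# assert len(hits) == 1
```

Passing the client in as an argument keeps the helper easy to exercise with a stub in unit tests.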

08-23 07:38