Elasticsearch scan and scroll

Problem description

Elasticsearch and command line programming noobie question.

I have elasticsearch set up locally on my computer and want to pull documents from a server that uses a different version of es using the scan and scroll api and add them into my index. I am having trouble figuring out how to do this with the bulk api for es.

Right now in my testing phase I am just pulling a few documents from the server using the following code (which works):

   http 'MY-OLD-ES.com:9200/INDEX/TYPE/_search?size=1000' | jq -c '.hits.hits[]' | while read -r x; do
     # Pull the routing fields and the document body out of each hit.
     id="$(echo "$x" | jq -r ._id)"
     index="$(echo "$x" | jq -r ._index)"
     type="$(echo "$x" | jq -r ._type)"
     doc="$(echo "$x" | jq ._source)"
     # Write the document to the local node under a "junk-" prefixed index.
     http put "localhost:9200/junk-$index/$type/$id" <<<"$doc"
   done

Any tips on how scan and scroll works? (I'm a noob and a bit confused.) So far I know I can scroll and get a scroll id, but I'm unclear what to do with the scroll id. If I call

http get 'http://MY-OLD-ES.com:9200/my_index/_search?scroll=1m&search_type=scan&size=10'

I'll receive a scroll id. Can this be piped in and parsed the same way? Additionally, I believe I'll need a while loop to tell it to keep requesting. How exactly should I go about this?

Thanks!

Recommended answer

The scan and scroll documentation explains it pretty clearly. After you get the scroll_id (a long base64-encoded string), you pass it in the body of the next request. With curl the request would look something like this:

curl -XGET 'http://MY-OLD-ES.com:9200/_search/scroll?scroll=1m' -d '
c2Nhbjs1OzExODpRNV9aY1VyUVM4U0NMd2pjWlJ3YWlBOzExOTpRNV9aY1VyUVM4U0
NMd2pjWlJ3YWlBOzExNjpRNV9aY1VyUVM4U0NMd2pjWlJ3YWlBOzExNzpRNV9aY1Vy
UVM4U0NMd2pjWlJ3YWlBOzEyMDpRNV9aY1VyUVM4U0NMd2pjWlJ3YWlBOzE7dG90YW
xfaGl0czoxOw==
'

Notice that while the first request to open the scroll was to /my_index/_search, the second request to read the data was to /_search/scroll. Each time you call that, passing the ?scroll=1m querystring, it refreshes the timeout before the scroll is automatically closed.
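
To answer the "can this be piped in and parsed the same way?" part: yes, the response is ordinary JSON, so jq can pull the id out just like it pulls out the hits. Here is a minimal sketch of those two requests. The function names and the OLD_ES variable are made up for illustration, the host and index are the placeholders from the question, and it assumes the old server still accepts search_type=scan, where the opening response carries only the scroll id and no hits.

```shell
#!/bin/sh
# Sketch only: OLD_ES and the function names are illustrative placeholders.
OLD_ES="http://MY-OLD-ES.com:9200"

# Read a JSON response on stdin and print its _scroll_id field.
extract_scroll_id() {
  jq -r '._scroll_id'
}

# First request: open the scroll against the index.
open_scroll() {
  curl -s -XGET "$OLD_ES/my_index/_search?scroll=1m&search_type=scan&size=10"
}

# Second request: read a batch, sending the scroll id as the raw request body.
next_batch() {
  curl -s -XGET "$OLD_ES/_search/scroll?scroll=1m" -d "$1"
}
```

You would then chain them along the lines of scroll_id=$(open_scroll | extract_scroll_id) followed by next_batch "$scroll_id" | jq -c '.hits.hits[]', which gives one hit per line as in the original pipeline.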

There are two more things to be aware of:


  1. The size you pass when opening the scroll applies to each shard, so you will get size multiplied by the number of shards in your index on each request.
  2. Each request to /_search/scroll will return a new scroll_id which you must pass on the next call to get the next batch of results. You can't just keep calling with the same scroll_id.

The scroll is complete when a scroll request returns no hits.
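
Putting the pieces together, the while loop you asked about could look roughly like the sketch below. It also shows one way to feed each batch to the bulk API instead of issuing one PUT per document. Treat it as a starting point under assumptions, not a tested migration script: copy_index, hits_to_bulk, OLD_ES and NEW_ES are names invented here, the "junk-" prefix is carried over from your test command, and depending on the version of your local node the _bulk request may also need a Content-Type header.

```shell
#!/bin/sh
# Sketch only: host names, function names and the "junk-" prefix are placeholders.
OLD_ES="http://MY-OLD-ES.com:9200"
NEW_ES="http://localhost:9200"

# Turn one scroll response (stdin) into bulk-API lines: an action line
# followed by the document source, one JSON object per line.
hits_to_bulk() {
  jq -c '.hits.hits[]
         | {index: {_index: ("junk-" + ._index), _type: ._type, _id: ._id}},
           ._source'
}

copy_index() {
  # Open the scroll; with search_type=scan this response has only the scroll id.
  response=$(curl -s -XGET "$OLD_ES/$1/_search?scroll=1m&search_type=scan&size=100")
  scroll_id=$(printf '%s' "$response" | jq -r '._scroll_id')

  while true; do
    response=$(curl -s -XGET "$OLD_ES/_search/scroll?scroll=1m" -d "$scroll_id")
    count=$(printf '%s' "$response" | jq '.hits.hits | length')
    # An empty batch (or an unparseable response) ends the loop.
    if [ "${count:-0}" -eq 0 ]; then
      break
    fi
    # Send the whole batch to the local node in a single bulk request.
    printf '%s' "$response" | hits_to_bulk |
      curl -s -XPOST "$NEW_ES/_bulk" --data-binary @- >/dev/null
    # Every response carries a new scroll id that must be used for the next call.
    scroll_id=$(printf '%s' "$response" | jq -r '._scroll_id')
  done
}
```

Calling copy_index my_index would then walk the whole index. Because size applies per shard, each bulk request here carries up to 100 documents times the number of shards, so pick size with that in mind.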

09-06 03:14