Problem description
Elasticsearch and command-line programming newbie question.
I have Elasticsearch set up locally on my computer and want to pull documents from a server that runs a different version of ES, using the scan and scroll API, and add them to my local index. I am having trouble figuring out how to do this with the ES bulk API.
Right now, in my testing phase, I am just pulling a few documents from the server using the following code (which works):
# Pull up to 1000 hits, then re-index each one locally under "junk-<index>".
http 'MY-OLD-ES.com:9200/INDEX/TYPE/_search?size=1000' | jq -c '.hits.hits[]' | while read -r x; do
  id="$(echo "$x" | jq -r ._id)"
  index="$(echo "$x" | jq -r ._index)"
  type="$(echo "$x" | jq -r ._type)"
  doc="$(echo "$x" | jq ._source)"
  http put "localhost:9200/junk-$index/$type/$id" <<<"$doc"
done
Any tips on how scan and scroll work? (I'm a noob and a bit confused.) So far I know I can scroll and get a scroll id, but I'm unclear on what to do with it. If I call
http get 'http://MY-OLD-ES.com:9200/my_index/_search?scroll=1m&search_type=scan&size=10'
I'll receive a scroll id. Can this be piped in and parsed the same way? Additionally, I believe I'll need a while loop to tell it to keep requesting. How exactly should I go about this?
Thanks!
Recommended answer
The scan and scroll documentation explains it pretty clearly. After you get the scroll_id (a long base64-encoded string), you pass it in the body of the next request. With curl, the request would look something like this:
curl -XGET 'http://MY-OLD-ES.com:9200/_search/scroll?scroll=1m' -d '
c2Nhbjs1OzExODpRNV9aY1VyUVM4U0NMd2pjWlJ3YWlBOzExOTpRNV9aY1VyUVM4U0
NMd2pjWlJ3YWlBOzExNjpRNV9aY1VyUVM4U0NMd2pjWlJ3YWlBOzExNzpRNV9aY1Vy
UVM4U0NMd2pjWlJ3YWlBOzEyMDpRNV9aY1VyUVM4U0NMd2pjWlJ3YWlBOzE7dG90YW
xfaGl0czoxOw==
'
Notice that while the first request, which opens the scroll, goes to /my_index/_search, the second request, which reads the data, goes to /_search/scroll. Each time you call that endpoint with the ?scroll=1m query string, it refreshes the timeout after which the scroll is automatically closed.
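The two requests described above can be sketched as a pair of small shell functions. This is only an illustration under assumptions: the function names open_scroll and next_batch and the OLD_ES variable are made up for this example, the host is the placeholder from the question, and it relies on the response carrying the id in a `_scroll_id` field.

```shell
# Sketch only: OLD_ES is a placeholder, not a real host.
OLD_ES="${OLD_ES:-http://MY-OLD-ES.com:9200}"

# Open a scan-type scroll on an index and print only the scroll id.
# (hypothetical helper name)
open_scroll() {
  local index="$1" size="${2:-10}"
  curl -s -XGET "$OLD_ES/$index/_search?scroll=1m&search_type=scan&size=$size" \
    | jq -r '._scroll_id'
}

# Read one batch for a scroll id: note the different endpoint, and that
# the id travels in the request body. Prints the raw JSON response.
# (hypothetical helper name)
next_batch() {
  local scroll_id="$1"
  curl -s -XGET "$OLD_ES/_search/scroll?scroll=1m" -d "$scroll_id"
}
```

Usage would be along the lines of `sid=$(open_scroll my_index 10)` followed by `next_batch "$sid" | jq -c '.hits.hits[]'`, which prints one hit per line just like the pipeline in the question.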
There are two more things to be aware of:
- The size you pass when opening the scroll applies to each shard, so on each request you will get up to size multiplied by the number of shards in your index.
- Each request to /_search/scroll returns a new scroll_id, which you must pass on the next call to get the next batch of results. You can't just keep calling with the same scroll_id.
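As a quick worked example of the per-shard size rule, with made-up numbers (a size of 10 and an index split across 5 shards are assumptions for illustration, not values from the question's cluster):

```shell
# Hypothetical numbers: size=10 per shard, index split across 5 shards.
size=10
shards=5
batch=$((size * shards))
echo "up to $batch hits per scroll request"
```

So a modest-looking size can still produce fairly large batches on an index with many shards.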
The scroll is complete when a scroll request returns no hits.
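Putting the pieces together, the while loop the question asks about might look roughly like the sketch below. It is not a definitive implementation: fetch_batch, to_bulk, index_batch and scroll_all are hypothetical helper names, OLD_ES and NEW_ES are placeholders, and the "junk-" index prefix is simply carried over from the question's test command. Each batch is converted to bulk API lines (an action line followed by the document source) and posted to the local _bulk endpoint.

```shell
# Sketch only: hosts are placeholders, helper names are made up.
OLD_ES="${OLD_ES:-http://MY-OLD-ES.com:9200}"
NEW_ES="${NEW_ES:-http://localhost:9200}"

# Read one batch for the given scroll id (raw JSON on stdout).
fetch_batch() {
  curl -s -XGET "$OLD_ES/_search/scroll?scroll=1m" -d "$1"
}

# Turn a scroll response into bulk lines: an action line, then the source.
to_bulk() {
  jq -c '.hits.hits[]
         | ({"index": {"_index": ("junk-" + ._index), "_type": ._type, "_id": ._id}},
            ._source)'
}

# Send bulk lines from stdin to the local cluster. --data-binary keeps the
# newlines that the bulk format depends on; newer Elasticsearch versions
# also require the Content-Type header.
index_batch() {
  curl -s -XPOST "$NEW_ES/_bulk" -H 'Content-Type: application/x-ndjson' \
    --data-binary @- > /dev/null
}

# Keep requesting batches until one comes back with no hits.
scroll_all() {
  local scroll_id="$1" page count
  while :; do
    page=$(fetch_batch "$scroll_id") || return 1
    count=$(printf '%s' "$page" | jq '.hits.hits | length')
    if [ "${count:-0}" -eq 0 ]; then
      break                          # no hits: the scroll is finished
    fi
    printf '%s' "$page" | to_bulk | index_batch
    # Always continue with the id from the latest response.
    scroll_id=$(printf '%s' "$page" | jq -r '._scroll_id')
  done
}
```

It would be started with the id returned by the initial scan request, for example `scroll_all "$scroll_id"`. Sending one bulk request per batch, rather than one PUT per document, keeps the number of round trips to the local cluster small.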