I am working with AWS Elasticsearch from Python. I have a JSON file with 3 fields
("cat1", "Cat2", "cat3"); each row is separated with \n.
Example: cat1: food, cat2: wine, cat3: lunch, etc.
from requests_aws4auth import AWS4Auth
import boto3
import requests
payload = {
    "settings": {
        "number_of_shards": 10,
        "number_of_replicas": 5
    },
    "mappings": {
        "Categoryall": {
            "properties": {
                "cat1": {
                    "type": "string"
                },
                "Cat2": {
                    "type": "string"
                },
                "cat3": {
                    "type": "string"
                }
            }
        }
    }
}
r = requests.put(url, auth=awsauth, json=payload)
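I created the schema/mapping for the index as shown above, but I don't know how to populate it.
I am thinking of putting a for loop over the JSON file and calling a POST request for each record to insert it into the index, but I am not sure how to proceed. I want to create the index and bulk upload this file into it. Any suggestions would be appreciated.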
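Best answer
Take a look at the Elasticsearch Bulk API.
Basically, you need to create a bulk request body and POST it to your "https://{elastic-endpoint}/_bulk" URL.
The following example shows a bulk request that inserts 3 JSON records into an index named "my_index":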
{ "index" : { "_index" : "my_index", "_type" : "_doc", "_id" : "1" } }
{ "cat1" : "food 1", "cat2": "wine 1", "cat3": "lunch 1" }
{ "index" : { "_index" : "my_index", "_type" : "_doc", "_id" : "2" } }
{ "cat1" : "food 2", "cat2": "wine 2", "cat3": "lunch 2" }
{ "index" : { "_index" : "my_index", "_type" : "_doc", "_id" : "3" } }
{ "cat1" : "food 3", "cat2": "wine 3", "cat3": "lunch 3" }
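Each JSON record is represented by 2 JSON objects: an action line that names the target index and document ID, followed by the document itself. Every line, including the last one, must end with a newline.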
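As a rough illustration of that pairing, the body can be generated from the newline-delimited JSON file described in the question. This is only a sketch: `build_bulk_body` is a hypothetical helper, the index name and `_type` value simply mirror the example above, and sequential IDs are an assumption (omit `_id` to let Elasticsearch assign one).

```python
import json


def build_bulk_body(lines, index_name="my_index", doc_type="_doc"):
    """Turn an iterable of JSON strings (one document per line) into a bulk request body."""
    parts = []
    doc_id = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines without consuming an ID
        doc = json.loads(line)  # fails early if a row is not valid JSON
        doc_id += 1
        # Action line: says what to do and where (assumed sequential string IDs).
        action = {"index": {"_index": index_name, "_type": doc_type, "_id": str(doc_id)}}
        parts.append(json.dumps(action))
        # Source line: the document itself, serialized on a single line.
        parts.append(json.dumps(doc))
    # The bulk body must be terminated by a newline.
    return "\n".join(parts) + "\n" if parts else ""
```

Passing an open file object works, since iterating over a file yields its lines; the returned string can then be written to a file or sent directly as the request body.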
So, if you write the bulk request body to a file named post-data.txt, you can POST it with Python as follows:
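with open('post-data.txt', 'rb') as payload:
    # The Bulk API expects newline-delimited JSON; add more params as needed.
    r = requests.post('https://your-elastic-endpoint/_bulk', auth=awsauth,
                      data=payload,
                      headers={'Content-Type': 'application/x-ndjson'})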
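A bulk request can come back with HTTP 200 even when some documents were rejected, so it is worth inspecting the response body. The sketch below assumes the usual Bulk API response shape (a top-level `errors` flag and an `items` list with one single-key entry per action); `failed_items` is a hypothetical helper name.

```python
def failed_items(bulk_response):
    """Return (doc_id, status, error) for every action in a bulk response that failed."""
    failures = []
    if not bulk_response.get("errors"):
        return failures  # nothing was rejected
    for item in bulk_response.get("items", []):
        # Each item is a single-key dict such as {"index": {...}}.
        for result in item.values():
            if "error" in result:
                failures.append((result.get("_id"), result.get("status"), result["error"]))
    return failures
```

With the request above, this could be called as failed_items(r.json()) after checking r.status_code.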
Alternatively, you can try the Python elasticsearch bulk helpers.
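A minimal sketch of the helpers route, assuming the elasticsearch Python package is installed and that `client` is an already configured `Elasticsearch` instance (for an AWS domain it would need to be set up with the same `awsauth` signing). The names `generate_actions` and `bulk_upload` are hypothetical, and clusters that still require mapping types may also need a `_type` key in each action.

```python
import json


def generate_actions(lines, index_name="my_index"):
    """Yield one action dict per non-empty JSON line, in the shape the bulk helpers accept."""
    for line in lines:
        line = line.strip()
        if line:
            yield {"_index": index_name, "_source": json.loads(line)}


def bulk_upload(client, path, index_name="my_index"):
    """Stream a newline-delimited JSON file into an index via helpers.bulk."""
    # Imported here so generate_actions can be used without the package installed.
    from elasticsearch import helpers

    with open(path) as f:
        # Returns a (success_count, errors) tuple; by default it raises
        # an exception if any document fails to index.
        return helpers.bulk(client, generate_actions(f, index_name))
```

Because the actions come from a generator, the file is read lazily and the helper splits it into chunks, so the whole file never has to be held in memory as one request body.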
Regarding python - Elasticsearch and AWS with Python, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/53218174/