Problem description
I have a large number (~40,000) of nested JSON objects that I want to insert into an Elasticsearch index.
The JSON objects are structured like this:
{
    "customerid": "10932",
    "date": "16.08.2006",
    "bez": "xyz",
    "birthdate": "21.05.1990",
    "clientid": "2",
    "address": [
        {
            "addressid": "1",
            "title": "Mr",
            "street": "main str",
            "valid_to": "21.05.1990",
            "valid_from": "21.05.1990"
        },
        {
            "addressid": "2",
            "title": "Mr",
            "street": "melrose place",
            "valid_to": "21.05.1990",
            "valid_from": "21.05.1990"
        }
    ]
}
So a JSON field (address in this example) can have an array of JSON objects.
What would a Logstash config look like to import JSON files/objects like this into Elasticsearch? The Elasticsearch mapping for this index should simply mirror the structure of the JSON, and the Elasticsearch document id should be set to customerid.
input {
    stdin {
        id => "JSON_TEST"
    }
}
filter {
    json {
        source => "customerid"
        ....
        ....
    }
}
output {
    stdout {}
    elasticsearch {
        hosts => "https://localhost:9200/"
        index => "customers"
        document_id => "%{customerid}"
    }
}
If you have control over what's being generated, the easiest thing to do is to format your input as single-line JSON and then use the json_lines codec.
Just change your stdin to
stdin { codec => "json_lines" }
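Combining that stdin change with the output section from the question, the complete json_input.conf reduces to something like this (a sketch; the hosts, index, and document_id values are the ones from the question):

```
input {
    stdin { codec => "json_lines" }
}
output {
    stdout {}
    elasticsearch {
        hosts => "https://localhost:9200/"
        index => "customers"
        document_id => "%{customerid}"
    }
}
```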
and then it'll just work:
cat input_file.json | logstash -f json_input.conf
where input_file.json has lines like
{"customerid":1,"nested": {"json":"here"}}
{"customerid":2,"nested": {"json":"there"}}
And then you won't need the json filter.
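If your source files are pretty-printed like the example above, you first need to collapse each object onto a single line so the json_lines codec can read it. A minimal sketch in Python (the to_json_lines helper and the sample object are illustrative, not part of the original answer):

```python
import json

def to_json_lines(objects):
    """Serialize each object onto its own line, which is the one-object-per-line
    format the json_lines codec expects on stdin."""
    # separators=(",", ":") produces compact output with no internal newlines
    return "\n".join(json.dumps(obj, separators=(",", ":")) for obj in objects)

# Sample object shaped like the question's data (abbreviated)
customer = {
    "customerid": "10932",
    "address": [
        {"addressid": "1", "title": "Mr"},
        {"addressid": "2", "title": "Mr"},
    ],
}

print(to_json_lines([customer]))
```

Writing the result of to_json_lines to input_file.json gives you a file you can pipe into logstash as shown above.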