This article explains how to load CSV data into Elasticsearch with contexts (for the completion suggester). It should be a useful reference for anyone facing a similar problem.
Problem description
I have 3M records. Headers are value, type, other_fields
..
Here, I need to specify type as the context for that value in the record. Is there any way to do this with Logstash? Or any other options?
val,val_type,id
Sunnyvale it labs, seller, 10223667
Recommended answer
For this, I would use the new CSV ingest processor.
First, create the ingest pipeline to parse your CSV data:
PUT _ingest/pipeline/csv-parser
{
  "processors": [
    {
      "csv": {
        "field": "message",
        "target_fields": [
          "val",
          "val_type",
          "id"
        ]
      }
    },
    {
      "script": {
        "source": """
          def val = ctx.val;
          ctx.val = [
            'input': val,
            'contexts': [
              'type': [ctx.val_type]
            ]
          ];
        """
      }
    },
    {
      "remove": {
        "field": "message"
      }
    }
  ]
}
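Note that for the type context to be usable at query time, the val field must be mapped as a completion field with a category context named type in the target index. This mapping is not shown in the original answer; a minimal sketch:
PUT index
{
  "mappings": {
    "properties": {
      "val": {
        "type": "completion",
        "contexts": [
          {
            "name": "type",
            "type": "category"
          }
        ]
      }
    }
  }
}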
Then, you can index your documents as follows:
PUT index/_doc/1?pipeline=csv-parser
{
  "message": "Sunnyvale it labs,seller,10223667"
}
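For the full 3M records, you would typically send batches through the _bulk API with the same pipeline parameter instead of indexing documents one by one; a minimal sketch using the sample record:
POST index/_bulk?pipeline=csv-parser
{ "index": { "_id": "1" } }
{ "message": "Sunnyvale it labs,seller,10223667" }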
After ingestion, the document will look like this:
{
  "val_type": "seller",
  "id": "10223667",
  "val": {
    "input": "Sunnyvale it labs",
    "contexts": {
      "type": [
        "seller"
      ]
    }
  }
}
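With the mapping sketched above in place, you can then filter completion suggestions by the type context; a minimal query sketch (the suggestion name val-suggest is illustrative):
POST index/_search
{
  "suggest": {
    "val-suggest": {
      "prefix": "sunny",
      "completion": {
        "field": "val",
        "contexts": {
          "type": [ "seller" ]
        }
      }
    }
  }
}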
Update: Logstash solution
Using Logstash, it's also feasible. The configuration file would look something like this:
input {
  file {
    path => "/path/to/your/file.csv"
    sincedb_path => "/dev/null"
    start_position => "beginning"
  }
}
filter {
  csv {
    skip_header => true
    separator => ","
    columns => ["val", "val_type", "id"]
  }
  mutate {
    # Temporarily rename "val" so that name can be reused for the nested object
    rename => { "val" => "value" }
    # Build the structure the completion suggester expects:
    # { "input": ..., "contexts": { "type": [...] } }
    add_field => {
      "[val][input]" => "%{value}"
      "[val][contexts][type]" => "%{val_type}"
    }
    # Drop the temporary field
    remove_field => [ "value" ]
  }
}
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "your-index"
  }
}
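You can then run Logstash against this configuration (the file name csv-to-es.conf is an assumption):
bin/logstash -f csv-to-es.conf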
That concludes this article on loading CSV data into Elasticsearch with contexts. Hopefully the recommended answer is helpful.