Elasticsearch: loading CSV data with contexts

Problem description

I have 3M records. The headers are value, type, other_fields..

I need to specify type as the context for the value in each record. Is there any way to do this with Logstash, or any other option? Sample data:

val,val_type,id
Sunnyvale it labs, seller, 10223667

Answer

For this, I would leverage the new CSV ingest processor.

First, create an ingest pipeline to parse your CSV data:

PUT _ingest/pipeline/csv-parser
{
  "processors": [
    {
      "csv": {
        "field": "message",
        "target_fields": [
          "val",
          "val_type",
          "id"
        ]
      }
    },
    {
      "script": {
        "source": """
          def val = ctx.val;
          ctx.val = [
            'input': val,
            'contexts': [
              'type': [ctx.val_type]
            ]
          ]
          """
      }
    },
    {
      "remove": {
        "field": "message"
      }
    }
  ]
}
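
Note that for the context to be usable at query time, the target index should map val as a completion field with a category context. A minimal mapping sketch (the context name type matches the script above; the field names and types are assumptions taken from the CSV headers):

PUT index
{
  "mappings": {
    "properties": {
      "val": {
        "type": "completion",
        "contexts": [
          { "name": "type", "type": "category" }
        ]
      },
      "val_type": { "type": "keyword" },
      "id": { "type": "keyword" }
    }
  }
}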

Then, you can index your documents as follows:

PUT index/_doc/1?pipeline=csv-parser
{
  "message": "Sunnyvale it labs,seller,10223667"
}
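
Before indexing the whole file, you can sanity-check the pipeline with the simulate API, which runs a sample document through the processors without indexing anything:

POST _ingest/pipeline/csv-parser/_simulate
{
  "docs": [
    { "_source": { "message": "Sunnyvale it labs,seller,10223667" } }
  ]
}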

After ingestion, the document will look like this:

{
  "val_type": "seller",
  "id": "10223667",
  "val": {
    "input": "Sunnyvale it labs",
    "contexts": {
      "type": [
        "seller"
      ]
    }
  }
}
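
With the document in this shape, a completion suggest query can filter candidates by the type context. A sketch of such a query (this assumes the val field is mapped as a completion field with a type category context, as in the mapping sketch above):

POST index/_search
{
  "suggest": {
    "val-suggest": {
      "prefix": "sunn",
      "completion": {
        "field": "val",
        "contexts": {
          "type": [ "seller" ]
        }
      }
    }
  }
}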

UPDATE: Logstash solution

This is also feasible with Logstash. The configuration file would look something like this:

input {
    file {
        path => "/path/to/your/file.csv"
        sincedb_path => "/dev/null"
        start_position => "beginning"
    }
}
filter {
    csv {
        skip_header => true
        separator => ","
        columns => ["val", "val_type", "id"]
    }
    mutate {
        rename => { "val" => "value" }
        add_field => {
            "[val][input]" => "%{value}"
            "[val][contexts][type]" => "%{val_type}"
        }
        remove_field => [ "value" ]
    }
}
output {
    elasticsearch {
        hosts => "http://localhost:9200"
        index => "your-index"
    }
}
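
For reference, the reshaping that both approaches apply to each CSV line can be sketched in plain Python. This is only an illustration of the resulting document shape: the column names come from the sample data, and the whitespace trimming is an extra convenience not performed by the csv processor by default.

```python
import csv
import io

def to_context_doc(line, columns=("val", "val_type", "id")):
    """Parse one CSV line and restructure `val` into a
    completion-suggester-style object carrying a `type` context,
    mirroring the ingest pipeline / Logstash filter above."""
    row = next(csv.reader(io.StringIO(line)))
    doc = dict(zip(columns, (field.strip() for field in row)))
    doc["val"] = {
        "input": doc["val"],
        "contexts": {"type": [doc["val_type"]]},
    }
    return doc

doc = to_context_doc("Sunnyvale it labs, seller, 10223667")
```

The resulting dict has the same nested structure as the ingested document shown earlier.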
