I am using Logstash to ingest the CSV files from https://www.kaggle.com/wcukierski/the-simpsons-by-the-data and store them in Elasticsearch. First, I ingested simpsons_characters.csv with the following conf:
input {
  file {
    path => "/Users/xyz/Downloads/the-simpsons-by-the-data/simpsons_characters.csv"
    start_position => beginning
    sincedb_path => "/dev/null"
  }
}
filter {
  csv {
    columns => ["id", "name", "normalized_name", "gender"]
    separator => ","
  }
}
output {
  stdout {
    codec => rubydebug
  }
  elasticsearch {
    hosts => "localhost"
    action => "index"
    index => "simpsons"
  }
}
However, when I query it like this:
http://localhost:9200/simpsons/name/Lou
where simpsons = index and name = type (I think... not sure), I get the following response:
{
  "_index": "simpsons",
  "_type": "name",
  "_id": "Lou",
  "found": false
}
So the question is: why am I not getting the right response? Also, when you bulk-ingest documents from a CSV, what is the type of the documents? Thanks!
Best answer
The default type in the Logstash Elasticsearch output is logs. So however the ID is defined (taken from the CSV with document_id => "%{id}", or left for ES to generate on its own), you can fetch these documents at http://localhost:9200/simpsons/logs/THE_ID.
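To make the URL layout concrete, here is a minimal Python sketch that assembles the index/type/ID path described above. The helper name document_url and the example ID 7 are made up for illustration, and the host is assumed to be the local node from the question.

```python
from urllib.parse import quote


def document_url(index, doc_type, doc_id, host="http://localhost:9200"):
    """Build the URL for fetching a single document by its ID.

    Hypothetical helper: the path layout is /<index>/<type>/<id>.
    """
    # Percent-encode each segment so IDs containing spaces or slashes stay valid.
    parts = [quote(str(p), safe="") for p in (index, doc_type, doc_id)]
    return host + "/" + "/".join(parts)


# The URL from the question asks for a document of type "name" with ID "Lou".
# It does not search the "name" field, which is why "found" is false.
print(document_url("simpsons", "name", "Lou"))

# With the default type "logs" and an illustrative ID of 7:
print(document_url("simpsons", "logs", 7))
```

Read this way, the original request looked for a document whose ID is literally Lou under a type called name, and no such document was ever indexed.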
If you don't know the ID and just want to check that the documents exist: http://localhost:9200/simpsons/logs/_search?pretty
To see the index's mapping, for example to find out its _type: http://localhost:9200/simpsons/_mapping?pretty
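If the goal was to find the character named Lou rather than a document whose ID is Lou, a search on the name field is probably what was intended. The sketch below uses Elasticsearch's URI search (the q parameter) from Python's standard library. The helper names search_url and search are hypothetical, and it assumes the node is reachable at localhost:9200 and that the csv filter stored the value under the name field.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def search_url(index, field, value, host="http://localhost:9200"):
    """Build a URI-search URL of the form /<index>/_search?q=<field>:<value>."""
    # urlencode percent-encodes the colon and any spaces in the query string.
    query = urlencode({"q": f"{field}:{value}", "pretty": "true"})
    return f"{host}/{index}/_search?{query}"


def search(index, field, value):
    """Run the search and return the decoded JSON response (needs a running node)."""
    with urlopen(search_url(index, field, value)) as resp:
        return json.load(resp)


# Only builds and prints the URL, so no Elasticsearch node is required here.
print(search_url("simpsons", "name", "Lou"))
```

Calling search("simpsons", "name", "Lou") against a running node should return the matching hits, each of which carries its own _type and _id.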
To change the default _type:
elasticsearch {
  hosts => "localhost"
  action => "index"
  index => "simpsons"
  document_type => "characters"
  document_id => "%{id}"
}