Ignoring old files and pushing only the latest log files from S3 using Logstash

Problem description

How can I ignore old files and push only the latest log files from S3 using Logstash? We are using Logstash to push CloudTrail logs from S3 to Elasticsearch. The CloudTrail logs are stored in the following format:

/AWSLogs/CloudTrail/xxxAccount Numberxxxx/aws-region/year(YYYY)/Month(MM)/day(DD)/

I need to pull only the latest data (for example, data from the current month), because the entire bucket holds terabytes of data and Logstash cannot cope with that volume. Is there a way to do this?
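To make "data from the current month" concrete, here is a minimal sketch that builds the S3 key prefix for a given month under the layout shown above. The function name, the account ID and the region are placeholders invented for illustration, and the leading slash is dropped on the assumption that the object keys do not start with one. The Logstash s3 input does have a `prefix` option, and a value like this could be a candidate for it, but whether that is enough for a given bucket is untested here.

```python
from datetime import datetime, timezone


def cloudtrail_month_prefix(account_id, region, when=None):
    """Return the key prefix for one month, following the layout in the question.

    account_id and region are hypothetical placeholders; `when` defaults to
    the current UTC time.
    """
    when = when or datetime.now(timezone.utc)
    # Layout from the question: AWSLogs/CloudTrail/<account>/<region>/YYYY/MM/
    return f"AWSLogs/CloudTrail/{account_id}/{region}/{when:%Y}/{when:%m}/"


if __name__ == "__main__":
    # Placeholder account and region, for illustration only.
    print(cloudtrail_month_prefix("123456789012", "us-east-1"))
```

A static prefix like this would have to be regenerated when the month changes, so treat it as one possible building block, not a complete solution.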

Recommended answer

I just had the same problem and solved it (read: worked around it) like this:

Start Logstash with a normal config, which leads to the behaviour you described.

On startup it will tell you in its logs where its sincedb file is located (defaults to logstash-7.8.0/data/plugins/inputs/s3/sincedb_someid).

The file takes a while to be created. Once it exists, stop Logstash again.

Now, I guess, you could delete the data that was just imported, but I didn't care to.

Now edit the file. It contains just a UTC timestamp. Set it to a time close to now.

Start Logstash again and it will start processing files created after the timestamp you just put in.
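The manual edit in the steps above could be scripted roughly as follows. This is only a sketch under stated assumptions: the helper names are made up for illustration, and the timestamp format ("YYYY-MM-DD HH:MM:SS UTC") is an assumption about what the plugin writes, so check the contents of your own sincedb file and adjust the format string to match before overwriting anything. Run it only while Logstash is stopped.

```python
from datetime import datetime, timezone
from pathlib import Path


def format_sincedb_timestamp(when):
    """Format a timezone-aware datetime as a UTC timestamp string.

    The format is an assumption; compare it with the existing sincedb file.
    """
    return when.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")


def find_sincedb_files(logstash_home):
    """List sincedb files under the default location mentioned in the answer."""
    return sorted(Path(logstash_home).glob("data/plugins/inputs/s3/sincedb_*"))


def reset_sincedb(path, when=None):
    """Overwrite the sincedb file with `when` (default: current UTC time)."""
    stamp = format_sincedb_timestamp(when or datetime.now(timezone.utc))
    Path(path).write_text(stamp)
    return stamp


if __name__ == "__main__":
    # "logstash-7.8.0" is the install directory from the answer; adjust as needed.
    for sincedb in find_sincedb_files("logstash-7.8.0"):
        print(sincedb, "->", reset_sincedb(sincedb))
```

Keeping a copy of the original file before overwriting it makes it easy to go back if the format turns out to differ.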


08-06 16:47