Can I customize partitioning in Kinesis Firehose before delivering to S3?

Problem description

I have a Firehose stream that is intended to ingest millions of events from different sources and of different event types. The stream should deliver all data to one S3 bucket as a store of raw, unaltered data.

I was thinking of partitioning this data in S3 based on metadata embedded within the event message, such as event-source, event-type and event-date.

However, Firehose follows its default partitioning based on record arrival time. Is it possible to customize this partitioning behavior to fit my needs?
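To make the contrast concrete, here is a minimal local sketch (no AWS calls) of the two key layouts. It assumes the default prefix is the UTC arrival hour in `YYYY/MM/dd/HH/` form. The field names `event_source`, `event_type` and `event_date`, and the `source=/type=/date=` layout, are hypothetical stand-ins for whatever the real events contain.

```python
import json
from datetime import datetime, timezone


def default_firehose_prefix(arrival_time: datetime) -> str:
    """Arrival-time prefix, mimicking Firehose's default YYYY/MM/dd/HH/ (UTC) layout."""
    # Only the arrival time matters; the record's contents play no part here.
    return arrival_time.astimezone(timezone.utc).strftime("%Y/%m/%d/%H/")


def metadata_prefix(record: bytes) -> str:
    """Desired prefix derived from metadata inside the event (hypothetical field names)."""
    event = json.loads(record)
    return (
        f"source={event['event_source']}/"
        f"type={event['event_type']}/"
        f"date={event['event_date']}/"
    )


if __name__ == "__main__":
    record = json.dumps(
        {
            "event_source": "billing",
            "event_type": "invoice_created",
            "event_date": "2021-09-01",
            "payload": {"amount": 42},
        }
    ).encode("utf-8")
    arrived = datetime(2021, 9, 3, 7, 15, tzinfo=timezone.utc)

    print(default_firehose_prefix(arrived))  # 2021/09/03/07/
    print(metadata_prefix(record))  # source=billing/type=invoice_created/date=2021-09-01/
```

An event dated 2021-09-01 that arrives two days late lands under `2021/09/03/07/` with the default behavior, which is exactly the mismatch the question is about.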

Update: I have changed the accepted answer, because a newer answer shows that the feature has been available since September 2021.

Solution

Since September 1, 2021, Amazon Kinesis Data Firehose has supported this feature, called dynamic partitioning. Read the announcement blog post here.
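As an illustration, here is a minimal sketch of enabling dynamic partitioning with boto3, assuming JSON records and a Direct PUT stream. The jq query, field names, prefix layout, bucket and role ARNs are hypothetical. The 64 MiB buffer, the error prefix expression and the `JQ-1.6` engine reflect my reading of the Firehose documentation and should be checked against the API reference before use.

```python
def build_s3_destination_config(bucket_arn: str, role_arn: str) -> dict:
    """ExtendedS3DestinationConfiguration with dynamic partitioning enabled (sketch)."""
    # jq expression that pulls the partition keys out of each JSON record.
    # The field names are hypothetical; adjust them to the real event schema.
    jq_query = (
        "{event_source: .event_source, "
        "event_type: .event_type, "
        "event_date: .event_date}"
    )
    return {
        "BucketARN": bucket_arn,
        "RoleARN": role_arn,
        # Each !{partitionKeyFromQuery:<key>} is filled in from the jq result above.
        "Prefix": (
            "source=!{partitionKeyFromQuery:event_source}/"
            "type=!{partitionKeyFromQuery:event_type}/"
            "date=!{partitionKeyFromQuery:event_date}/"
        ),
        # Records that cannot be partitioned are written under this prefix instead.
        "ErrorOutputPrefix": "errors/!{firehose:error-output-type}/",
        # Dynamic partitioning expects a larger buffer (64 MiB minimum, per my reading of the docs).
        "BufferingHints": {"SizeInMBs": 64, "IntervalInSeconds": 60},
        "DynamicPartitioningConfiguration": {"Enabled": True},
        "ProcessingConfiguration": {
            "Enabled": True,
            "Processors": [
                {
                    "Type": "MetadataExtraction",
                    "Parameters": [
                        {"ParameterName": "MetadataExtractionQuery", "ParameterValue": jq_query},
                        {"ParameterName": "JsonParsingEngine", "ParameterValue": "JQ-1.6"},
                    ],
                }
            ],
        },
    }


def create_partitioned_stream(name: str, bucket_arn: str, role_arn: str) -> dict:
    """Create a Direct PUT delivery stream with the config above (needs AWS credentials)."""
    import boto3  # imported here so the builder above works without boto3 installed

    client = boto3.client("firehose")
    return client.create_delivery_stream(
        DeliveryStreamName=name,
        DeliveryStreamType="DirectPut",
        ExtendedS3DestinationConfiguration=build_s3_destination_config(bucket_arn, role_arn),
    )


if __name__ == "__main__":
    import json

    # Placeholder ARNs: replace with a real bucket and an IAM role Firehose can assume.
    config = build_s3_destination_config(
        "arn:aws:s3:::my-raw-events-bucket",
        "arn:aws:iam::123456789012:role/my-firehose-role",
    )
    print(json.dumps(config, indent=2))
```

Keeping the configuration in a pure function makes it easy to inspect or test without touching AWS. Once `create_partitioned_stream(...)` has been called with real ARNs, a record like `{"event_source": "billing", "event_type": "invoice_created", "event_date": "2021-09-01"}` should be delivered under a key starting with `source=billing/type=invoice_created/date=2021-09-01/`.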

From the documentation:

Here is what it looks like in the console UI:

09-22 12:29