Problem description
I have a Firehose stream that is intended to ingest millions of events from different sources and of different event types. The stream should deliver all data to one S3 bucket as a store of raw, unaltered data.
I was thinking of partitioning this data in S3 based on metadata embedded within the event message, like event-source, event-type, and event-date.
However, Firehose follows its default partitioning based on record arrival time. Is it possible to customize this partitioning behavior to fit my needs?
Update: The accepted answer has changed, as a newer answer points out that this feature became available in September 2021.
Since September 1st, 2021, AWS Kinesis Data Firehose supports this feature, called dynamic partitioning. Read the announcement blog post here.
From the documentation: dynamic partitioning enables you to continuously partition streaming data in Kinesis Data Firehose using keys within the data (for example, customer_id or transaction_id), and to deliver the data grouped by those keys into corresponding Amazon S3 prefixes.
Here is how it looks in the console UI: the delivery stream's S3 destination settings include a Dynamic partitioning section where you enable the feature and define the partition keys.
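For anyone setting this up programmatically rather than through the console, below is a minimal sketch using boto3 (Python). It assumes JSON event payloads carrying event_source, event_type, and event_date fields, and uses hypothetical names for the stream, bucket, and IAM role. Dynamic partitioning extracts the keys from each record with a JQ expression, and the S3 prefix then references those keys via the partitionKeyFromQuery namespace.

```python
import boto3

firehose = boto3.client("firehose")

firehose.create_delivery_stream(
    DeliveryStreamName="events-raw",  # hypothetical stream name
    DeliveryStreamType="DirectPut",
    ExtendedS3DestinationConfiguration={
        # Hypothetical role and bucket ARNs; substitute your own.
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-delivery-role",
        "BucketARN": "arn:aws:s3:::my-raw-events-bucket",
        # Dynamic partitioning can only be enabled when the stream is created,
        # not added to an existing stream.
        "DynamicPartitioningConfiguration": {"Enabled": True},
        # Extract the partition keys from each JSON record with a JQ query.
        "ProcessingConfiguration": {
            "Enabled": True,
            "Processors": [
                {
                    "Type": "MetadataExtraction",
                    "Parameters": [
                        {
                            "ParameterName": "MetadataExtractionQuery",
                            "ParameterValue": "{event_source: .event_source, "
                                              "event_type: .event_type, "
                                              "event_date: .event_date}",
                        },
                        {
                            "ParameterName": "JsonParsingEngine",
                            "ParameterValue": "JQ-1.6",
                        },
                    ],
                }
            ],
        },
        # Reference the extracted keys in the S3 prefix.
        "Prefix": "source=!{partitionKeyFromQuery:event_source}/"
                  "type=!{partitionKeyFromQuery:event_type}/"
                  "date=!{partitionKeyFromQuery:event_date}/",
        # An error output prefix is required when dynamic partitioning is on.
        "ErrorOutputPrefix": "errors/!{firehose:error-output-type}/",
        # Dynamic partitioning requires a buffer size of at least 64 MB.
        "BufferingHints": {"SizeInMBs": 64, "IntervalInSeconds": 60},
    },
)
```

With this configuration, a record such as {"event_source": "web", "event_type": "click", "event_date": "2021-09-01", ...} would be delivered under s3://my-raw-events-bucket/source=web/type=click/date=2021-09-01/.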