Question
I have worked a bit with Kafka in the past, and now part of our data pipeline needs to be ported to AWS Kinesis streams. I have read that Kinesis is effectively a fork of Kafka and shares many similarities with it.
However, I can't see how multiple consumers can read from the same stream, each with its own offset. Each data record is given a sequence number, but I couldn't find anything consumer-specific (something like a Kafka group ID?).
Is it really possible to have different consumers reading the same AWS Kinesis stream at different ingestion rates?
Answer
Yes.
You can have multiple Kinesis consumer applications. Let's say you have two:
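- The first consumer application (I think this is what Kafka calls a "consumer group"?) could be "first-app", storing its positions in the DynamoDB table "first-app-table". It can run on any number of nodes (EC2 instances).
- The second consumer application can work on the same stream and store its positions in a different DynamoDB table, e.g. "second-app-table".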
Each table holds "what is the last processed position on shard X for app Y" information. So the two applications store checkpoints for the same shards in different places, which makes them independent of each other.
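A minimal in-memory sketch of the two-application setup, assuming a toy stream held in a Python dict. `STREAM`, `ConsumerApp` and `process` are hypothetical names, not KCL or AWS APIs; each app's `checkpoint_table` dict stands in for its own DynamoDB table. The point is only that each app records its own "last processed sequence number" per shard, so the two read positions never interfere.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for a Kinesis stream: shard id -> ordered list of
# (sequence_number, data). Real sequence numbers are long numeric strings;
# short ones are used here only for readability.
STREAM = {
    "shardId-000000000000": [("101", "a"), ("105", "b"), ("112", "c")],
    "shardId-000000000001": [("203", "x"), ("207", "y")],
}


@dataclass
class ConsumerApp:
    """One consumer application (roughly a Kafka consumer group).

    checkpoint_table is a hypothetical stand-in for the app's own DynamoDB
    table: shard id -> last processed sequence number for THIS app only.
    """
    name: str
    checkpoint_table: dict = field(default_factory=dict)

    def process(self, shard_id, max_records):
        records = STREAM[shard_id]
        last = self.checkpoint_table.get(shard_id)
        # Resume right after this app's own checkpoint, or from the start.
        start = 0
        if last is not None:
            start = [seq for seq, _ in records].index(last) + 1
        batch = records[start:start + max_records]
        if batch:
            # Record the last processed sequence number for this shard.
            self.checkpoint_table[shard_id] = batch[-1][0]
        return [data for _, data in batch]
```

For example, if `first-app` processes two records of `shardId-000000000000` while `second-app` processes one, `first-app`'s table ends at `105` and `second-app`'s at `101`; each app's next call continues from its own position.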
As for ingestion rate, consumer applications built with the KCL have an "idleTimeBetweenReadsInMillis" setting: the time to wait between successive GetRecords calls to the Kinesis API, i.e. the polling interval. For example, the first application could use a poll interval of "2000", so it polls the stream's shards every 2 seconds for new records, while the second application uses a different value.
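To see how the polling interval changes each consumer's pace, here is a rough simulation in simulated milliseconds. `simulate_polling` is a hypothetical helper; `idle_ms` plays the role of `idleTimeBetweenReadsInMillis`, and the sketch assumes each simulated read returns every record that has arrived so far and ignores how long the call itself takes.

```python
def simulate_polling(arrival_ms, idle_ms, duration_ms):
    """Return (number_of_reads, worst_case_delay_ms) for one consumer.

    arrival_ms: sorted times (ms) at which records land in the shard.
    idle_ms: stand-in for idleTimeBetweenReadsInMillis, the wait between
    two reads (the duration of the read itself is ignored here).
    """
    reads = 0
    position = 0      # index of the next unread record
    worst_delay = 0
    t = 0
    while t <= duration_ms:
        reads += 1
        # A simulated read returns everything that has arrived by time t.
        while position < len(arrival_ms) and arrival_ms[position] <= t:
            worst_delay = max(worst_delay, t - arrival_ms[position])
            position += 1
        t += idle_ms
    return reads, worst_delay
```

With records arriving at 100, 900, 2500 and 4100 ms over a 6-second window, an interval of 2000 ms gives 4 reads and a worst-case delay of 1900 ms, while 500 ms gives 13 reads and 400 ms. One caveat: consumers sharing a shard also share its read limits (5 read transactions and 2 MB per second per shard), so a very short interval in one application can throttle the others.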
I don't know Kafka well, but as far as I remember, a Kafka "partition" is a "shard" in Kinesis, and likewise a Kafka "offset" is a "sequence number". The Kinesis Client Library (KCL) uses the term "checkpoint" for the stored sequence numbers. As you said, the concepts are similar.
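The correspondence above, plus one practical difference, can be sketched like this. The dict mirrors the answer's mapping (including its tentative consumer-group guess), and both functions are hypothetical helpers, not part of any Kafka or KCL API. Kafka offsets are consecutive integers, and the committed offset is by convention the next one to read. Kinesis sequence numbers are long numeric strings that are not consecutive, so a checkpoint stores the last processed one and reading resumes right after it.

```python
# Rough correspondence of terms, following the answer above.
KAFKA_TO_KINESIS = {
    "partition": "shard",
    "offset": "sequence number",
    "committed offset": "checkpoint",
    "consumer group": "consumer application",
}


def kafka_next_offset(last_processed_offset):
    # Kafka offsets are consecutive per partition, so the next one is +1.
    return last_processed_offset + 1


def kinesis_resume_index(shard_sequence_numbers, checkpoint):
    # Kinesis sequence numbers are not consecutive, so "+1" is meaningless.
    # Instead, find the checkpointed record and resume at the one after it.
    return shard_sequence_numbers.index(checkpoint) + 1
```

For a shard whose records carry sequence numbers `["101", "105", "112"]`, a checkpoint of `"105"` resumes at index 2, the record with `"112"`.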