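Problem description
How does Kafka guarantee that a consumer does not read the same message twice?
Or is that scenario possible? Can the same message be read twice, by a single consumer or by multiple consumers?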
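There are several scenarios in which a consumer can end up consuming a duplicate message:
- The producer publishes a message successfully, but the acknowledgement fails, so the producer retries and sends the same message again.
- The producer publishes a batch of messages and only part of the batch is written. The producer then retries and resends the same batch, which produces duplicates.
- A consumer receives a batch of messages from Kafka and commits its offsets manually (enable.auto.commit=false). If the consumer fails before committing to Kafka, it will consume the same records again the next time, which produces duplicates on the consumer side.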
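The third scenario can be illustrated with a small simulation. This is only a sketch in plain Python with no Kafka client involved; the `consume` function and its `crash_before_commit` flag are hypothetical names made up for the illustration.

```python
def consume(log, committed_offset, batch_size, crash_before_commit):
    """Process one batch starting at the committed offset.

    Returns (processed_records, new_committed_offset).
    """
    batch = log[committed_offset:committed_offset + batch_size]
    processed = list(batch)  # the side effects of processing happen here
    if crash_before_commit:
        # The consumer died after processing but before committing,
        # so the committed offset does not move.
        return processed, committed_offset
    return processed, committed_offset + len(batch)


log = ["m0", "m1", "m2", "m3"]
seen = []

# First attempt: the records are processed, then the consumer crashes.
processed, offset = consume(log, 0, 2, crash_before_commit=True)
seen.extend(processed)

# After a restart the consumer resumes from the last committed offset (still 0).
processed, offset = consume(log, offset, 2, crash_before_commit=False)
seen.extend(processed)

print(seen)    # ['m0', 'm1', 'm0', 'm1']
print(offset)  # 2
```

The records m0 and m1 are processed twice because processing and the offset commit are two separate steps.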
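To avoid consuming duplicate messages, the execution of the job and the committing of the offset must be atomic; that is what gives exactly-once delivery semantics on the consumer side. You can use the parameters below to achieve exactly-once semantics, but be aware that this comes at a cost in performance.
- Enable idempotence on the producer side, which guarantees that producer retries do not publish the same message twice: enable.idempotence=true
- Set the consumer's isolation level (isolation.level) to read_committed: isolation.level=read_committed
For Kafka Streams, the settings above can be achieved by enabling exactly-once processing (the processing.guarantee setting), which makes the processing a single unit transaction.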
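As a rough sketch, the settings mentioned so far can be collected into plain dictionaries. The property names are the ones quoted in the text; the helper `missing_exactly_once_settings` is a hypothetical function written for this example and is not part of any Kafka client.

```python
# Property names as quoted in the text; values are strings, as in a properties file.
producer_config = {
    "enable.idempotence": "true",
}
consumer_config = {
    "enable.auto.commit": "false",
    "isolation.level": "read_committed",
}


def missing_exactly_once_settings(producer, consumer):
    """Return the settings from the text that are not set as described."""
    expected = [
        (producer, "enable.idempotence", "true"),
        (consumer, "isolation.level", "read_committed"),
    ]
    return [key for cfg, key, value in expected if cfg.get(key) != value]


ok = missing_exactly_once_settings(producer_config, consumer_config)
incomplete = missing_exactly_once_settings({}, consumer_config)

print(ok)          # []
print(incomplete)  # ['enable.idempotence']
```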
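Idempotence
Idempotent delivery enables the producer to write each message to a particular partition of a topic exactly once during the lifetime of a single producer, without data loss and with ordering preserved per partition.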
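The general idea behind idempotent delivery can be sketched as follows, assuming each producer tags its records with an increasing sequence number and the partition remembers the last number it accepted. This is a simplified toy model, not Kafka's actual protocol; `PartitionLog` and its fields are names invented for the illustration.

```python
class PartitionLog:
    """Toy partition that drops retried writes from the same producer."""

    def __init__(self):
        self.records = []
        self.last_sequence = {}  # producer_id -> last sequence number accepted

    def append(self, producer_id, sequence, value):
        last = self.last_sequence.get(producer_id, -1)
        if sequence <= last:
            # Already written: this is a retry, so nothing is appended.
            return False
        self.records.append(value)
        self.last_sequence[producer_id] = sequence
        return True


log = PartitionLog()
first = log.append("producer-1", 0, "order-created")
# The acknowledgement is lost, so the producer retries with the same sequence number.
retry = log.append("producer-1", 0, "order-created")
second = log.append("producer-1", 1, "order-paid")

print(log.records)  # ['order-created', 'order-paid']
```

The retry is recognised by its sequence number, so the partition holds each record once and in the order it was sent.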
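Transactions (isolation.level)
Transactions give us the ability to atomically update data in multiple topic partitions. Either all the records included in a transaction are saved successfully, or none of them are. Transactions also let you commit your consumer offsets in the same transaction as the data you have processed, which allows end-to-end exactly-once semantics.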
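To make the all-or-nothing behaviour concrete, here is a minimal in-memory sketch in which output records and the consumer offset are applied together or not at all. `ToyTransaction`, `process_batch` and the `store` dictionary are hypothetical stand-ins for this example; a real application would use the transactional API of a Kafka client instead.

```python
class ToyTransaction:
    """Buffers output records and a consumer offset, then applies both or neither."""

    def __init__(self, store):
        self.store = store
        self.pending_records = []
        self.pending_offset = None

    def send(self, record):
        self.pending_records.append(record)

    def send_offset(self, offset):
        self.pending_offset = offset

    def commit(self):
        # Records and offset become visible together.
        self.store["records"].extend(self.pending_records)
        if self.pending_offset is not None:
            self.store["offset"] = self.pending_offset

    def abort(self):
        # Nothing that was buffered reaches the store.
        self.pending_records = []
        self.pending_offset = None


def process_batch(store, batch, fail=False):
    txn = ToyTransaction(store)
    try:
        for value in batch:
            txn.send(value.upper())
        if fail:
            raise RuntimeError("simulated failure mid-transaction")
        txn.send_offset(store["offset"] + len(batch))
        txn.commit()
    except RuntimeError:
        txn.abort()


store = {"records": [], "offset": 0}
source = ["a", "b", "c", "d"]

process_batch(store, source[0:2], fail=True)  # aborted: no output, offset unchanged
process_batch(store, source[store["offset"]:store["offset"] + 2])  # retried from offset 0
process_batch(store, source[store["offset"]:store["offset"] + 2])

print(store)  # {'records': ['A', 'B', 'C', 'D'], 'offset': 4}
```

Because the failed attempt leaves neither output nor an advanced offset behind, the retry starts from the same position and every input ends up in the output once.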
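The producer does not wait for each message to be written to Kafka; instead it marks the boundaries of its work with beginTransaction, commitTransaction and abortTransaction (in case of failure). The consumer sets isolation.level to either read_committed or read_uncommitted:
- read_committed: the consumer always reads committed data only.
- read_uncommitted: the consumer reads all messages in offset order, without waiting for transactions to be committed.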
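The difference between the two isolation levels can be sketched with a toy log in which each record carries the id of the transaction that wrote it. The function `visible_records` and the data layout are assumptions made for this example, and the sketch only covers committed and aborted transactions, not ones that are still open.

```python
def visible_records(log, committed_txns, isolation_level):
    """Return the values a consumer would see for a given isolation level.

    log is a list of (txn_id, value); a txn_id of None means non-transactional.
    """
    if isolation_level == "read_uncommitted":
        return [value for _, value in log]
    if isolation_level == "read_committed":
        return [value for txn_id, value in log
                if txn_id is None or txn_id in committed_txns]
    raise ValueError(f"unknown isolation level: {isolation_level}")


log = [("t1", "a"), ("t2", "b"), (None, "c"), ("t1", "d")]
committed = {"t1"}  # t2 was aborted

everything = visible_records(log, committed, "read_uncommitted")
committed_only = visible_records(log, committed, "read_committed")

print(everything)      # ['a', 'b', 'c', 'd']
print(committed_only)  # ['a', 'c', 'd']
```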
For more detail, please see the reference.