问题描述
我们使用 kafka 流 api 进行聚合,其中我们也使用 group by.我们还使用状态存储来保存输入主题数据.
我注意到的是
Kafka内部创建了3种topic
Changelog--
重新分区--
-
我无法理解的是
- 为什么当我拥有
-
中的所有数据时,它会创建变更日志主题 - 重新分区主题是否包含分组后的数据.
- 而且我看到 Changelog 和 topicname-parition 的大小大约相同.
数据有什么不同,因此必须为此保存不同的文件.
'Changelog' 和 'repartition' 内部 Kafka 主题特定于 Kafka Streams.
来自 Kafka Wiki,
Kafka Streams 允许有状态的流处理,即具有内部状态的操作符.这种内部状态在所谓的状态存储中进行管理.状态存储可以是短暂的(失败时丢失)或容错的(失败后恢复).Kafka Streams DSL 使用的默认实现是一种容错状态存储,使用 1. 内部创建和压缩的变更日志主题(用于容错)和 2. 一个(或多个)RocksDB 实例(用于缓存的键值查找).因此,在启动/停止应用程序和倒带/重新处理的情况下,需要正确管理这些内部数据.
变更日志主题是在流上有加入/聚合操作时创建的.实际上,聚合调用的结果会创建一个状态存储,并且为了容错,状态存储由 Kafka Changelog 主题备份.
聚合结果存储在这个内部主题中.当应用程序重新启动且 application-id 未更改时,将从更改日志主题中恢复状态.
重新分区主题是在流上有关键修改操作时创建的.例如 groupByKey() 操作创建重新分区主题.查看 JIRA 页面以了解有关自动创建重新分区主题的更多信息.>
这两个内部主题使 Kafka 流具有容错状态的流处理能力.
重新分区主题是否包含分组后的数据? - 是
Changelog 和 topicname-parition 的大小大致相同 - 可能所有聚合操作的结果都存储在此主题中.
更多详情,请查看Kafka Wiki页面.
We are using kafka stream api for aggregation in which we are also using group by.We are also using state store where it saves the input topics data.
What i notice is
Kafka internally creates 3 kinds of topic
Changelog-<storeid>-<partition>
Repartition-<storeid>-<partition>
<topicname>-<partition>
What I am not able to understand is
- Why it creates changelog topic when I have all the data in
<topic>-<partition>
- Does repartition topic contains data after grouping.
- and I see that the size of Changelog and topicname-parition are approx same.
What is different in the data so that it has to save a different file for that.
'Changelog' and 'repartition' internal Kafka topics are specific to Kafka Streams.
From Kafka Wiki,
Changelog topics are created when there are join/aggregation operations on the stream. Actually the result of aggregation call creates a state store and for fault-tolerance the state store is backed up by a Kafka Changelog topic.
The aggregation results are stored into this internal topic. State will be recovered from changelog topic when applications is restarted and application-id wasn't changed.
Re-partition topics are created when there are key modifying operations on the stream. For example, groupByKey() operation creates repartition topic. Check JIRA page to know more about auto creation of re-parition topic.
These two internal topics enables Kafka streams to have fault-tolerant stateful stream processing capabilities.
Does repartition topic contains data after grouping? - Yes
The size of Changelog and topicname-parition are approx same - Possibly, the result of all aggregation operations are stored in this topic.
For more details, please check Kafka Wiki page.
这篇关于Kafka 使用了哪些内部主题?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!