Problem description

I want to design a solution for sending different kinds of e-mails through several providers. Here is the general overview.

I have several upstream providers (Sendgrid, Zoho, Mailgun, etc.) that will be used to send e-mails. For example:

  • E-mail for registering a new user
  • E-mail for deleting a user
  • E-mail about space quota limits

(in general, around 6 types of e-mails)

Every type of e-mail should be generated by a producer, converted into a serialized Java object, and sent to the appropriate Kafka consumer integrated with the upstream provider.

The question is: how should I design Kafka for maximum performance and scalability?

  • The first solution I can think of so far is to have a topic for every type of e-mail message and every gateway (6x4 = 24 topics). In the future I expect to add more message types and gateways, so it may reach 600 topics. That means a lot of Java source code to maintain and a lot of topics to manage. Another downside is that the Kafka logs will be huge.

  • The second solution is to use 1 topic for each consumer (integration gateway). But in this case, how can I send a different serialized Java object for each type of message that I want to send?

Is there a better way to design this setup so that I can scale it more easily and keep it robust for future integrations?

You can see here how I send messages between consumers and producers: org.apache.kafka.common.KafkaException: class SaleRequestFactory is not an instance of org.apache.kafka.common.serialization.Serializer

  1. Order matters because the communication will be asynchronous. Producers will wait for returned messages to get the status.
  2. It's not important to keep the data of each gateway on a different topic.
  3. What kind of isolation do you want? I want to isolate the messages/topics completely from each other in order to prevent mistakes in the future when I need to add more gateways or message types.

Is it important to you to keep the data of each gateway on a different topic? - No, I just want to isolate the data.

If you go with a single topic per gateway, do you care about the overhead it will create on the client side? - reading unnecessary messages, writing more logic, a hybrid serializer, etc.

I have no idea here. My main concern is to make the system easy to extend with new features.

Recommended answer

Well, unfortunately, there is no easy answer here.
You need to ask yourself a few questions and choose among a few tradeoffs -

First, does order matter? Is it just e-mails that you want to forward from point A to point B, or do you want to keep (I guess you would) a reasonable order of events for the same entity (e.g. a mail about user creation needs to be received before a mail about the same new user changing their password)?

If order matters, it's better to use the same topic with a partitioning key, as Kafka guarantees message ordering only at the partition level.
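To picture why a partitioning key preserves per-entity order, here is a minimal sketch that needs no Kafka dependency. It assumes the user ID is used as the key; `partitionFor` is a hypothetical stand-in for a keyed partitioner (Kafka's default partitioner hashes the serialized key with murmur2, whereas this sketch uses `Arrays.hashCode`), and the event names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class KeyPartitionSketch {
    // Hypothetical stand-in for a keyed partitioner: hash the key bytes and
    // map the non-negative hash onto the number of partitions.
    static int partitionFor(String key, int numPartitions) {
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        return (hash & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int partitions = 6; // assumed partition count for the shared topic
        // "key:event" pairs; the part before ':' plays the role of the record key.
        String[] events = {
            "user-42:USER_CREATED",
            "user-7:USER_CREATED",
            "user-42:PASSWORD_CHANGED"
        };
        for (String event : events) {
            String userId = event.substring(0, event.indexOf(':'));
            // Events with the same key always map to the same partition,
            // so their relative order is kept within that partition.
            System.out.println(event + " -> partition " + partitionFor(userId, partitions));
        }
    }
}
```

Both events for `user-42` land on the same partition, so a consumer of that partition sees the creation mail before the password-change mail. Events for different users may land on different partitions and have no ordering guarantee relative to each other.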

What kind of isolation do you want? Is it important to you to keep the data of each gateway on a different topic?
If you go with a single topic per gateway, do you care about the overhead it will create on the client side? - reading unnecessary messages, writing more logic, a hybrid serializer, etc.
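One way to picture the "hybrid serializer" overhead of a single topic per gateway is a type-tagged envelope: every message carries its event type, and the consumer dispatches on it. The sketch below is only an illustration under assumptions; the `Envelope` class, the type names and the wire layout (4-byte type length, type, payload) are hypothetical and not taken from the question, and real code would implement Kafka's `Serializer`/`Deserializer` interfaces around the same idea.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class EnvelopeSketch {
    // Hypothetical envelope: an event type tag plus an opaque payload.
    static final class Envelope {
        final String type;
        final String payload;
        Envelope(String type, String payload) {
            this.type = type;
            this.payload = payload;
        }
    }

    // Assumed wire layout: [int typeLength][type bytes][payload bytes].
    static byte[] serialize(Envelope e) {
        byte[] type = e.type.getBytes(StandardCharsets.UTF_8);
        byte[] payload = e.payload.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + type.length + payload.length);
        buf.putInt(type.length).put(type).put(payload);
        return buf.array();
    }

    static Envelope deserialize(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        byte[] type = new byte[buf.getInt()];
        buf.get(type);
        byte[] payload = new byte[buf.remaining()];
        buf.get(payload);
        return new Envelope(new String(type, StandardCharsets.UTF_8),
                            new String(payload, StandardCharsets.UTF_8));
    }

    // Consumer-side dispatch: this is the extra logic a shared topic requires.
    static String handle(Envelope e) {
        switch (e.type) {
            case "USER_REGISTERED": return "send welcome mail for " + e.payload;
            case "USER_DELETED":    return "send goodbye mail for " + e.payload;
            default:                return "skip unknown type " + e.type;
        }
    }

    public static void main(String[] args) {
        byte[] wire = serialize(new Envelope("USER_DELETED", "user-42"));
        System.out.println(handle(deserialize(wire)));
    }
}
```

Adding a new e-mail type then means adding one case on the consumer side rather than a new topic, at the cost of every consumer having to read and skip types it does not handle.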

Can you estimate along which dimensions you would scale? If you go with the first solution, a topic per gateway and event type, and you suddenly need to add 100x more gateways, it won't necessarily be the right call. Moreover, what will happen if you need to process the User-Change-Emails faster? More partitions lead to higher throughput - would you be able to do so?
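The scaling concern can be made concrete with some quick arithmetic. This sketch compares the number of topics under the two layouts, using the 6 e-mail types and 4 gateways from the question and a hypothetical 100x growth in gateways; the method names are illustrative only.

```java
public class TopicCountSketch {
    // Layout 1: one topic per (e-mail type, gateway) pair.
    static int topicsPerTypeAndGateway(int types, int gateways) {
        return types * gateways;
    }

    // Layout 2: one topic per gateway, with all e-mail types sharing it.
    static int topicsPerGateway(int gateways) {
        return gateways;
    }

    public static void main(String[] args) {
        int types = 6;
        int gateways = 4;
        System.out.println(topicsPerTypeAndGateway(types, gateways)); // 24
        System.out.println(topicsPerGateway(gateways));               // 4

        int moreGateways = gateways * 100; // hypothetical 100x growth
        System.out.println(topicsPerTypeAndGateway(types, moreGateways)); // 2400
        System.out.println(topicsPerGateway(moreGateways));               // 400
    }
}
```

The first layout grows with the product of both dimensions, while the second grows only with the number of gateways; throughput for a busy event type is then tuned through partition counts instead of through additional topics.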

Confluent has a few great articles on these subjects that might help you -

Should You Put Several Event Types in the Same Kafka Topic?

How to choose the number of topics/partitions in a Kafka cluster?

07-28 03:02