本文介绍了Spark Streaming:Spark 和 Kafka 通信是如何发生的?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想了解 Kafka 和 Spark(Streaming) 节点之间的通信是如何发生的.我有以下问题.

I would like to understand how the communication between the Kafka and Spark(Streaming) nodes takes place. I have the following questions.

  1. 如果 Kafka 服务器和 Spark 节点位于两个独立的集群中,将如何进行通信.配置它们需要哪些步骤.
  2. 如果两者都在同一个集群中但在不同的节点中,那么通信将如何进行.

通信我在这里的意思是它是 RPC 还是 Socket 通信.我想了解内部解剖

communication i mean here is whether it is a RPC or Socket communication. I would like to understand the internal anatomy

感谢任何帮助.

提前致谢.

推荐答案

首先,Kafka 节点和 Spark 节点是否在同一个集群中不计入,但它们应该能够连接到每个其他(在防火墙中打开端口).

First of all, it doesn't count if the Kafka nodes and Spark nodes are in the same cluster or not, but they should be able to connect to each other (open ports in firewall).

有两种使用 Spark Streaming 从 Kafka 读取数据的方法,使用旧的 KafkaUtils.createStream() API 和更新的 KafkaUtils.createDirectStream() 方法.

There are 2 ways to read from Kafka with Spark Streaming, using the older KafkaUtils.createStream() API, and the newer, KafkaUtils.createDirectStream() method.

我不想讨论它们之间的差异,这是有据可查的 这里(简而言之,直播更好).

I don't want to get into the differences between them, that is well documented here (in short, direct stream is better).

解决您的问题,通信是如何发生的(内部解剖):找出答案的最佳方法是查看 Spark 源代码.

Addressing your question, how does the communication happen (internal anatomy): the best way to find out is looking at the Spark source code.

createStream() API 使用一组 Kafka 消费者,直接来自官方 org.apache.kafka 包.这些 Kafka 消费者有自己的客户端,称为 NetworkClient,您可以查看 此处.简而言之,NetworkClient 使用套接字进行通信.

The createStream() API uses a set of Kafka consumers, directly from the official org.apache.kafka packages. These Kafka consumers have their own client called the NetworkClient, which you can check here. In short, the NetworkClient uses sockets for communicating.

createDirectStream() API 确实使用来自同一个 org.apache.kafka 包的 Kafka SimpleConsumer.SimpleConsumer 类使用 java.nio.ReadableByteChannel 从 Kafka 读取,java.nio.ReadableByteChanneljava.nio.SocketChannel 的子类,所以最后它也用套接字完成,但更间接地使用 Java 的非阻塞 I/O 便利 API.

The createDirectStream() API does use the Kafka SimpleConsumer from the same org.apache.kafka package. The SimpleConsumer class reads from Kafka with a java.nio.ReadableByteChannel which is a subclass of java.nio.SocketChannel, so in the end it is with done with sockets as well, but a bit more indirectly using Java's Non-blocking I/O convenience APIs.

所以回答你的问题:它是用套接字完成的.

So to answer your question: it is done with sockets.

这篇关于Spark Streaming:Spark 和 Kafka 通信是如何发生的?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

07-29 13:15