Kafka Source Code Analysis and Illustrated Principles: The Producer Side

Preface

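　　Every message queue boils down to the same three parts: the message producer (Producer), the message consumer (Consumer), and the service that carries the messages (called the Broker in Kafka). This article covers the Producer side, with diagrams where they help explain the underlying mechanics.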


1. Application Development

　　First, how to build and use a KafkaProducer in an application, followed by a brief look at some important parameters.

1.1 An Example

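 import java.util.Properties;
 import java.util.concurrent.ExecutionException;

 import org.apache.kafka.clients.producer.Callback;
 import org.apache.kafka.clients.producer.KafkaProducer;
 import org.apache.kafka.clients.producer.ProducerRecord;
 import org.apache.kafka.clients.producer.RecordMetadata;
 import org.apache.kafka.common.errors.RetriableException;

 /**
  * Kafka Producer demo class.
  *
  * @author GrimMjx
  */
 public class ProducerDemo {
     public static void main(String[] args) throws ExecutionException, InterruptedException {
         Properties prop = new Properties();
         prop.put("client.id", "DemoProducer");

         // The following three parameters are required.
         // Used to connect to the Kafka brokers; for a cluster, separate the addresses with commas
         prop.put("bootstrap.servers", "localhost:9092");
         // How the message key is serialized
         prop.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
         // How the message value is serialized
         prop.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

         // The following parameters are optional
         prop.put("acks", "-1");
         prop.put("retries", "3");
         prop.put("batch.size", "323840");
         prop.put("linger.ms", "10");
         prop.put("buffer.memory", "33554432");
         prop.put("max.block.ms", "3000");

         KafkaProducer<String, String> producer = new KafkaProducer<String, String>(prop);
         try {
             // Asynchronous send: keep sending without waiting. The callback is invoked when the result comes back
             producer.send(new ProducerRecord<String, String>("test", "key1", "value1"),
                     new Callback() {
                         // RecordMetadata tells which partition and which offset the record was written to
                         public void onCompletion(RecordMetadata metadata, Exception e) {
                             if (e == null) {
                                 // The send succeeded
                                 System.out.println("The offset of the record we just sent is: " + metadata.offset());
                             } else {
                                 // The send failed
                                 if (e instanceof RetriableException) {
                                     // Handle retriable exceptions, e.g. the partition leader replica is unavailable.
                                     // Usually this is covered by the retries setting; there is little else to do here
                                     // other than resend the message
                                 } else {
                                     // Handle non-retriable exceptions
                                 }
                             }
                         }
                     }
             );

             // Synchronous send: send returns a Future, and get() blocks until the result is available
             producer.send(new ProducerRecord<String, String>("test", "key1", "value1")).get();
         } finally {
             // A running producer holds extra system resources, so always close it at the end
             producer.close();
         }
     }
 }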

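　　The comments cover most of the details and the parameters are discussed below, so here we only look at asynchronous versus synchronous sending. KafkaProducer.send returns a Future, and both the blocking and the non-blocking style are built on it:

Synchronous (blocking): call send, then call get() on the returned Future. The call blocks until the result arrives, with no timeout.
Asynchronous (non-blocking): pass a callback to send. The caller can keep sending messages without waiting, and the callback is invoked once the result comes back.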
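
To make the two styles concrete, here is a minimal sketch that does not use the Kafka client at all. SendModes, its send method, and the fake "offset" (simply the length of the value) are made up for illustration; only the Future-plus-callback shape mirrors KafkaProducer.send.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.function.BiConsumer;

// Hypothetical stand-in for a producer: only the Future/callback shape is modelled.
class SendModes {
    // Completes on another thread, then notifies the optional callback with (offset, error).
    static Future<Long> send(String value, BiConsumer<Long, Throwable> callback) {
        // The "offset" is fake: it is just the length of the value.
        CompletableFuture<Long> future = CompletableFuture.supplyAsync(() -> (long) value.length());
        if (callback != null) {
            future.whenComplete(callback);
        }
        return future;
    }

    public static void main(String[] args) throws ExecutionException, InterruptedException {
        // Asynchronous: send returns at once; the callback runs when the result arrives.
        Future<Long> pending = send("value1", (off, err) -> System.out.println("callback offset=" + off));
        // Synchronous: block on the returned Future until the result is available.
        long offset = send("value1", null).get();
        System.out.println("blocking offset=" + offset);
        pending.get(); // wait so the asynchronous send finishes before the JVM exits
    }
}
```

The same returned Future serves both callers: one ignores it and relies on the callback, the other calls get() and waits.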


1.2 Important Parameters

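　　This section goes through the important producer-side parameters; the first three are required. Kafka documents them very well: every parameter is explained in detail in the class org.apache.kafka.clients.producer.ProducerConfig.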

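bootstrap.servers (required): the list of broker addresses. In a large cluster you do not need to list every machine, because the producer discovers all brokers in the cluster from the ones given.
key.serializer/value.serializer (required): how the key and the value are serialized. Both must be fully qualified class names, and you can plug in your own implementation.
acks: one of three values: 0, 1, or all (-1)
- 0: the producer does not wait for any result from the broker; highest throughput
- 1: the producer sends the message to the leader broker, which responds once it has written the message to its local log; a middle ground
- all (-1): the leader responds only after the in-sync replicas (ISR) have the message; used together with min.insync.replicas, which sets how many ISR replicas must have the write for it to count as successful
  - Think about it: what happens if the number of ISR replicas in the cluster drops below min.insync.replicas? Can consumers still consume normally? See Stack Overflow: https://stackoverflow.com/questions/57231185/does-min-insync-replicas-property-effects-consumers-in-kafka
buffer.memory: on startup the producer creates an in-memory buffer that holds messages waiting to be sent; this parameter sets the size of that buffer
compression.type: the compression algorithm; at the time of writing the choices are GZIP, Snappy, and LZ4. In the author's experience LZ4 currently performs best.
retries: the number of retries. Before version 0.11.0.0, retries could lead to duplicate messages.
batch.size: messages bound for the same partition are grouped into a batch; once the batch is full it is sent to the broker
linger.ms: does a batch that is not full never get sent? Of course not: a batch that is not full is sent after waiting linger.ms. This is a latency versus throughput trade-off.
max.request.size: limits the size of a request sent to the broker
request.timeout.ms: if no response arrives within this time, the callback receives a TimeoutException
partitioner.class: the partitioning strategy, which can be customized. When a key is present, the default partitioner hashes the key with murmur2 and takes the result modulo the total number of partitions; when there is no key, it round-robins.
enable.idempotence: introduced in Apache Kafka 0.11.0.0; the key tool for implementing EOS (exactly-once semantics)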
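
The default partitioning rule described for partitioner.class can be sketched in a few lines. This is an illustration, not Kafka's code: SimplePartitioner is a made-up class, and Arrays.hashCode stands in for murmur2, so the partition numbers it produces will not match those of a real producer.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of "hash the key, else round-robin". Arrays.hashCode is a stand-in
// for murmur2 (an assumption made to keep the example self-contained).
class SimplePartitioner {
    private final AtomicInteger counter = new AtomicInteger(0);

    int partition(byte[] keyBytes, int numPartitions) {
        if (keyBytes == null) {
            // No key: rotate through the partitions.
            return (counter.getAndIncrement() & 0x7fffffff) % numPartitions;
        }
        // Key present: mask the sign bit so the modulo result is never negative.
        return (Arrays.hashCode(keyBytes) & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        SimplePartitioner p = new SimplePartitioner();
        byte[] key = "key1".getBytes(StandardCharsets.UTF_8);
        System.out.println("keyed:  " + p.partition(key, 3));
        System.out.println("no key: " + p.partition(null, 3) + ", " + p.partition(null, 3));
    }
}
```

The point to notice is that the same key always lands on the same partition, which is what gives per-key ordering.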

2. Source Code Analysis and Illustrated Principles

2.1 RecordAccumulator

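　　Of the parameters above, buffer.memory is the size of the buffer, and RecordAccumulator is what plays the role of that buffer. The default is 32 MB.

　　The batch.size parameter above also introduced the notion of a batch. In the Kafka producer, messages are not sent to the broker one at a time; several messages are grouped into a ProducerBatch, which the Sender then sends in one go. Note that batch.size is not a message count (send once N messages have accumulated) but a size in bytes. The default is 16 KB, and it can be tuned for the workload.

　　The core field of RecordAccumulator is: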

private final ConcurrentMap<TopicPartition, Deque<ProducerBatch>> batches;

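　　It is a ConcurrentMap. The key is a TopicPartition, which identifies one partition of one topic. The value is a double-ended queue of ProducerBatch objects waiting for the Sender thread to send them to the broker.

[Figure: layout of the batches map in RecordAccumulator]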
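
As a rough model of this layout, the sketch below keeps one queue of batches per partition and starts a new batch when the next record would push the current one past its capacity in bytes. MiniAccumulator, MiniBatch, and the String key are hypothetical stand-ins for RecordAccumulator, ProducerBatch, and TopicPartition; memory pooling and the Sender are left out.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Simplified model of the accumulator's layout (not Kafka's implementation).
class MiniAccumulator {
    static final class MiniBatch {
        int bytes;    // bytes written so far
        int records;  // number of records, kept only for inspection
    }

    private final int batchSize; // capacity in bytes, playing the role of batch.size
    private final ConcurrentMap<String, Deque<MiniBatch>> batches = new ConcurrentHashMap<>();

    MiniAccumulator(int batchSize) {
        this.batchSize = batchSize;
    }

    Deque<MiniBatch> dequeFor(String topicPartition) {
        return batches.computeIfAbsent(topicPartition, k -> new ArrayDeque<>());
    }

    void append(String topicPartition, byte[] value) {
        Deque<MiniBatch> dq = dequeFor(topicPartition);
        synchronized (dq) { // the map is thread-safe, the ArrayDeque is not
            MiniBatch last = dq.peekLast();
            // Start a new batch when there is none, or when the record does not fit in the last one.
            if (last == null || last.bytes + value.length > batchSize) {
                last = new MiniBatch();
                dq.addLast(last);
            }
            last.bytes += value.length;
            last.records++;
        }
    }

    public static void main(String[] args) {
        MiniAccumulator acc = new MiniAccumulator(16);
        for (int i = 0; i < 5; i++) {
            acc.append("test-0", new byte[6]);
        }
        System.out.println("batches queued for test-0: " + acc.dequeFor("test-0").size());
    }
}
```

Fullness here depends only on bytes, never on the number of records, which is the point made about batch.size.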

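　　Now let's look at the source to see how a record is added to the buffer queue. The method to read is org.apache.kafka.clients.producer.internals.RecordAccumulator#append, shown below.

　　The comments are detailed, but one point deserves thought: why is the memory allocation not inside the synchronized block? At first glance this looks wasteful. The second synchronized block has to call tryAppend again, because by then another thread may already have created a ProducerBatch, in which case the memory was requested for nothing. But think it through: what would go wrong if the allocation were inside the synchronized block?

　　A thread that cannot get memory waits until memory becomes available. If that wait happened inside the synchronized block, the thread would hold the lock on the Deque the whole time, and no other thread could operate on that Deque safely. That includes the Sender thread, which has to lock the same Deque to drain batches. That would defeat the purpose.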

 /**

  * Add a record to the accumulator, return the append result

  * <p>

  * The append result will contain the future metadata, and flag for whether the appended batch is full or a new batch is created

  * <p>

  *

  * @param tp The topic/partition to which this record is being sent

  * @param timestamp The timestamp of the record

  * @param key The key for the record

  * @param value The value for the record

  * @param headers the Headers for the record

  * @param callback The user-supplied callback to execute when the request is complete

  * @param maxTimeToBlock The maximum time in milliseconds to block for buffer memory to be available

  */

 public RecordAppendResult append(TopicPartition tp,

                                  long timestamp,

                                  byte[] key,

                                  byte[] value,

                                  Header[] headers,

                                  Callback callback,

                                  long maxTimeToBlock) throws InterruptedException {

     // We keep track of the number of appending thread to make sure we do not miss batches in

     // abortIncompleteBatches().

     appendsInProgress.incrementAndGet();

     ByteBuffer buffer = null;

     if (headers == null) headers = Record.EMPTY_HEADERS;

     try {

         // check if we have an in-progress batch

         // essentially a putIfAbsent on the batches map; not expanded here

         Deque<ProducerBatch> dq = getOrCreateDeque(tp);

         // the batches map is thread-safe, but each Deque is not, hence the lock

         // first try to append to a batch that is already in progress

         synchronized (dq) {

             if (closed)

                 throw new IllegalStateException("Cannot send after the producer is closed.");

             RecordAppendResult appendResult = tryAppend(timestamp, key, value, headers, callback, dq);

             if (appendResult != null)

                 return appendResult;

         }

         // we don't have an in-progress record batch try to allocate a new batch

         // prepare to create a new ProducerBatch

         byte maxUsableMagic = apiVersions.maxUsableProduceMagic();

         // size the buffer: at least batch.size, more if this record needs it

         int size = Math.max(this.batchSize, AbstractRecords.estimateSizeInBytesUpperBound(maxUsableMagic, compression, key, value, headers));

         log.trace("Allocating a new {} byte message buffer for topic {} partition {}", size, tp.topic(), tp.partition());

         // if no memory is available, this blocks for up to maxTimeToBlock

         buffer = free.allocate(size, maxTimeToBlock);

         synchronized (dq) {

             // Need to check if producer is closed again after grabbing the dequeue lock.

             if (closed)

                 throw new IllegalStateException("Cannot send after the producer is closed.");

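             // try the append again, because the allocation above ran outside the synchronized block

             // another thread may have created a ProducerBatch in the meantime; the finally block then returns the buffer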

             RecordAppendResult appendResult = tryAppend(timestamp, key, value, headers, callback, dq);

             if (appendResult != null) {

                 // as the Kafka authors say below, hopefully this is rare: several threads allocate memory, only to give it back

                 // Somebody else found us a batch, return the one we waited for! Hopefully this doesn't happen often...

                 return appendResult;

             }

             // create the ProducerBatch

             MemoryRecordsBuilder recordsBuilder = recordsBuilder(buffer, maxUsableMagic);

             ProducerBatch batch = new ProducerBatch(tp, recordsBuilder, time.milliseconds());

             FutureRecordMetadata future = Utils.notNull(batch.tryAppend(timestamp, key, value, headers, callback, time.milliseconds()));

             dq.addLast(batch);

             // incomplete is a Set that tracks batches that have not completed yet

             incomplete.add(batch);

             // Don't deallocate this buffer in the finally block as it's being used in the record batch

             buffer = null;

             // return the append result

             return new RecordAppendResult(future, dq.size() > 1 || batch.isFull(), true);

         }

     } finally {

         // give the buffer back if it was not used

         if (buffer != null)

             free.deallocate(buffer);

         appendsInProgress.decrementAndGet();

     }

 }

　　Here is the tryAppend() method as well; the code comments explain it:

 /**

  *  Try to append to a ProducerBatch.

  *

  *  If it is full, we return null and a new batch is created. We also close the batch for record appends to free up

  *  resources like compression buffers. The batch will be fully closed (ie. the record batch headers will be written

  *  and memory records built) in one of the following cases (whichever comes first): right before send,

  *  if it is expired, or when the producer is closed.

  */

 private RecordAppendResult tryAppend(long timestamp, byte[] key, byte[] value, Header[] headers, Callback callback, Deque<ProducerBatch> deque) {

     // look at the most recently added ProducerBatch

     ProducerBatch last = deque.peekLast();

     if (last != null) {

         FutureRecordMetadata future = last.tryAppend(timestamp, key, value, headers, callback, time.milliseconds());

         if (future == null)

             last.closeForRecordAppends();

         else

             // the result carries the future, a batch-is-full flag, and a new-batch-created flag

             return new RecordAppendResult(future, deque.size() > 1 || last.isFull(), false);

     }

     // return null if the Deque holds no ProducerBatch, or the last one has no room for this record

     return null;

 }

[Figure: diagram of the append flow shown in the code above]
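
The locking question raised before the append source (try under the lock, allocate outside the lock, check again under the lock) can be isolated in a small sketch. Everything here is illustrative: DoubleCheckedAppend is a made-up class, a Semaphore stands in for the buffer pool, and a StringBuilder stands in for a batch.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.Semaphore;

// Sketch of "try under lock, allocate outside lock, re-check under lock".
class DoubleCheckedAppend {
    private final Deque<StringBuilder> dq = new ArrayDeque<>();
    private final Semaphore pool;   // each permit represents one free buffer
    private final int capacity;     // maximum characters per batch (hypothetical unit)

    DoubleCheckedAppend(int buffers, int capacity) {
        this.pool = new Semaphore(buffers);
        this.capacity = capacity;
    }

    // Caller must hold the lock on dq.
    private boolean tryAppend(String record) {
        StringBuilder last = dq.peekLast();
        if (last != null && last.length() + record.length() <= capacity) {
            last.append(record);
            return true;
        }
        return false;
    }

    void append(String record) throws InterruptedException {
        synchronized (dq) {
            if (tryAppend(record)) return;
        }
        // May block until a buffer is free. The deque lock is NOT held here,
        // so other threads can still append to or drain the deque.
        pool.acquire();
        boolean used = false;
        try {
            synchronized (dq) {
                // Another thread may have created a batch while we waited.
                if (tryAppend(record)) return;
                dq.addLast(new StringBuilder(record));
                used = true;
            }
        } finally {
            if (!used) pool.release(); // hand the unused buffer back
        }
    }

    int batchCount() {
        synchronized (dq) {
            return dq.size();
        }
    }

    int freeBuffers() {
        return pool.availablePermits();
    }
}
```

The cost of this design is an occasional wasted allocation; the benefit is that a thread waiting for memory never blocks the threads that could free it.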

2.2 Sender

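　　The most important method in Sender is run(), and at its core is org.apache.kafka.clients.producer.internals.Sender#sendProducerData.

　　Read the comments on pollTimeout carefully: it is the longest time the next poll may block while waiting for at least one channel to become ready for a registered event. A return value of 0 means there is data ready and the send goes out right away.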

 private long sendProducerData(long now) {

     // fetch the current cluster metadata

     Cluster cluster = metadata.fetch();

     // get the list of partitions with data ready to send

     // ReadyCheckResult has three fields:

     // 1. Set<Node> readyNodes: nodes that are ready to send to

     // 2. long nextReadyCheckDelayMs: delay before the next readiness check

     // 3. Set<String> unknownLeaderTopics: topics whose leader is unknown

     RecordAccumulator.ReadyCheckResult result = this.accumulator.ready(cluster, now);

     // if there are any partitions whose leaders are not known yet, force metadata update

     // if the leader of some topic is unknown, request a metadata update

     if (!result.unknownLeaderTopics.isEmpty()) {

         // The set of topics with unknown leader contains topics with leader election pending as well as

         // topics which may have expired. Add the topic again to metadata to ensure it is included

         // and request metadata update, since there are messages to send to the topic.

         for (String topic : result.unknownLeaderTopics)

             this.metadata.add(topic);

         this.metadata.requestUpdate();

     }

     // drop the nodes we cannot send to right now

     // remove any nodes we aren't ready to send to

     Iterator<Node> iter = result.readyNodes.iterator();

     long notReadyTimeout = Long.MAX_VALUE;

     while (iter.hasNext()) {

         Node node = iter.next();

         if (!this.client.ready(node, now)) {

             iter.remove();

             notReadyTimeout = Math.min(notReadyTimeout, this.client.connectionDelay(node, now));

         }

     }

     // collect the batches that are about to be sent

     // create produce requests

     Map<Integer, List<ProducerBatch>> batches = this.accumulator.drain(cluster, result.readyNodes,

             this.maxRequestSize, now);

     // preserve message order

     if (guaranteeMessageOrder) {

         // Mute all the partitions drained

         for (List<ProducerBatch> batchList : batches.values()) {

             for (ProducerBatch batch : batchList)

                 this.accumulator.mutePartition(batch.topicPartition);

         }

     }

     // batches that have expired

     List<ProducerBatch> expiredBatches = this.accumulator.expiredBatches(this.requestTimeout, now);

     boolean needsTransactionStateReset = false;

     // Reset the producer id if an expired batch has previously been sent to the broker. Also update the metrics

     // for expired batches. see the documentation of @TransactionState.resetProducerId to understand why

     // we need to reset the producer id here.

     if (!expiredBatches.isEmpty())

         log.trace("Expired {} batches in accumulator", expiredBatches.size());

     for (ProducerBatch expiredBatch : expiredBatches) {

         failBatch(expiredBatch, -1, NO_TIMESTAMP, expiredBatch.timeoutException());

         if (transactionManager != null && expiredBatch.inRetry()) {

             needsTransactionStateReset = true;

         }

         this.sensors.recordErrors(expiredBatch.topicPartition.topic(), expiredBatch.recordCount);

     }

     if (needsTransactionStateReset) {

         transactionManager.resetProducerId();

         return 0;

     }

     sensors.updateProduceRequestMetrics(batches);

     // If we have any nodes that are ready to send + have sendable data, poll with 0 timeout so this can immediately

     // loop and try sending more data. Otherwise, the timeout is determined by nodes that have partitions with data

     // that isn't yet sendable (e.g. lingering, backing off). Note that this specifically does not include nodes

     // with sendable data that aren't ready to send since they would cause busy looping.

     // what exactly is the returned pollTimeout? The English descriptions are the clearest:

     // 1. The amount of time to block if there is nothing to do

     // 2. the longest time to wait for a channel to become ready; in Kafka's Selector a timeout of zero polls without blocking

     long pollTimeout = Math.min(result.nextReadyCheckDelayMs, notReadyTimeout);

     if (!result.readyNodes.isEmpty()) {

         log.trace("Nodes with data ready to send: {}", result.readyNodes);

         // if some partitions are already ready to be sent, the select time would be 0;

         // otherwise if some partition already has some data accumulated but not ready yet,

         // the select time will be the time difference between now and its linger expiry time;

         // otherwise the select time will be the time difference between now and the metadata expiry time;

         pollTimeout = 0;

     }

     // send the batches

     // this ends up calling client.send()

     sendProduceRequests(batches, now);

     return pollTimeout;

 }
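
Stripped of everything else, the way sendProducerData picks its return value comes down to the few lines below. This is a restatement for illustration, with a made-up class and parameter names, not code from Kafka.

```java
// Sketch of how the poll timeout is chosen at the end of sendProducerData.
class PollTimeoutSketch {
    static long pollTimeout(boolean anyNodeReady, long nextReadyCheckDelayMs, long notReadyTimeoutMs) {
        // Some node has sendable data: do not block in the next poll.
        if (anyNodeReady) {
            return 0L;
        }
        // Otherwise wake up when the first lingering or backing-off batch becomes
        // sendable, or when a not-yet-ready node may be contacted again.
        return Math.min(nextReadyCheckDelayMs, notReadyTimeoutMs);
    }

    public static void main(String[] args) {
        System.out.println(pollTimeout(true, 10L, 50L));
        System.out.println(pollTimeout(false, 10L, 50L));
    }
}
```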

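　　It also helps to understand org.apache.kafka.clients.producer.internals.RecordAccumulator#ready. The three key fields of the class it returns are explained in the comments. Where my explanation falls short, read the English original: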

 /**

  * Get a list of nodes whose partitions are ready to be sent, and the earliest time at which any non-sendable

  * partition will be ready; Also return the flag for whether there are any unknown leaders for the accumulated

  * partition batches.

  * <p>

  * A destination node is ready to send data if:

  * <ol>

  * <li>There is at least one partition that is not backing off its send

  * <li><b>and</b> those partitions are not muted (to prevent reordering if

  *   {@value org.apache.kafka.clients.producer.ProducerConfig#MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION}

  *   is set to one)</li>

  * <li><b>and <i>any</i></b> of the following are true</li>

  * <ul>

  *     <li>The record set is full</li>

  *     <li>The record set has sat in the accumulator for at least lingerMs milliseconds</li>

  *     <li>The accumulator is out of memory and threads are blocking waiting for data (in this case all partitions

  *     are immediately considered ready).</li>

  *     <li>The accumulator has been closed</li>

  * </ul>

  * </ol>

  */

 /**

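  * @return a ReadyCheckResult with three fields:

  * 1. Set<Node> readyNodes: nodes that are ready to send to

  * 2. long nextReadyCheckDelayMs: delay before the next readiness check

  * 3. Set<String> unknownLeaderTopics: topics whose leader is unknown

  *

  * A node can be sent to when any one of the following holds (and the batch is not backing off or muted):

  * 1. the batch is full

  * 2. the batch is not full but has waited lingerMs

  * 3. the accumulator is out of memory

  * 4. the accumulator has been closed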

  */

 public ReadyCheckResult ready(Cluster cluster, long nowMs) {

     Set<Node> readyNodes = new HashSet<>();

     long nextReadyCheckDelayMs = Long.MAX_VALUE;

     Set<String> unknownLeaderTopics = new HashSet<>();

     boolean exhausted = this.free.queued() > 0;

     for (Map.Entry<TopicPartition, Deque<ProducerBatch>> entry : this.batches.entrySet()) {

         TopicPartition part = entry.getKey();

         Deque<ProducerBatch> deque = entry.getValue();

         Node leader = cluster.leaderFor(part);

         synchronized (deque) {

             // no known leader but the queue is not empty: add the topic to unknownLeaderTopics

             if (leader == null && !deque.isEmpty()) {

                 // This is a partition for which leader is not known, but messages are available to send.

                 // Note that entries are currently not removed from batches when deque is empty.

                 unknownLeaderTopics.add(part.topic());

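                 // otherwise, continue only if readyNodes does not yet contain the leader and the partition is not muted

                 // muting is tied to the producer setting max.in.flight.requests.per.connection=1

                 // it prevents reordering within a partition; the setting limits how many unacknowledged requests the producer may have on one broker connection

                 // when it is 1, the producer cannot send another PRODUCE request to that broker until the response arrives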

             } else if (!readyNodes.contains(leader) && !muted.contains(part)) {

                 ProducerBatch batch = deque.peekFirst();

                 if (batch != null) {

                     long waitedTimeMs = batch.waitedTimeMs(nowMs);

                     boolean backingOff = batch.attempts() > 0 && waitedTimeMs < retryBackoffMs;

                     // how long to wait: the retry backoff when retrying, otherwise linger.ms

                     long timeToWaitMs = backingOff ? retryBackoffMs : lingerMs;

                     // the batch is full (or another batch is already queued behind it)

                     boolean full = deque.size() > 1 || batch.isFull();

                     // the batch has waited long enough (linger.ms or the retry backoff has elapsed)

                     boolean expired = waitedTimeMs >= timeToWaitMs;

                     boolean sendable = full || expired || exhausted || closed || flushInProgress();

                     if (sendable && !backingOff) {

                         readyNodes.add(leader);

                     } else {

                         long timeLeftMs = Math.max(timeToWaitMs - waitedTimeMs, 0);

                         // Note that this results in a conservative estimate since an un-sendable partition may have

                         // a leader that will later be found to have sendable data. However, this is good enough

                         // since we'll just wake up and then sleep again for the remaining time.

                         // not sendable yet: keep the shortest remaining wait for the next check

                         nextReadyCheckDelayMs = Math.min(timeLeftMs, nextReadyCheckDelayMs);

                     }

                 }

             }

         }

     }

     return new ReadyCheckResult(readyNodes, nextReadyCheckDelayMs, unknownLeaderTopics);

 }
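
The decision made for each partition's first batch can be written as a pure function, which makes the conditions easy to try out. This sketch follows the logic of ready() but leaves out flushInProgress(), muting, and leader lookup; the class and parameter names are illustrative.

```java
// Sketch of the per-batch readiness rule used in ready().
class ReadyCheck {
    static boolean ready(boolean full, long waitedMs, long lingerMs,
                         int attempts, long retryBackoffMs,
                         boolean exhausted, boolean closed) {
        // A batch that has been tried before must sit out the retry backoff.
        boolean backingOff = attempts > 0 && waitedMs < retryBackoffMs;
        long timeToWaitMs = backingOff ? retryBackoffMs : lingerMs;
        // "expired" means the batch has waited long enough, not that it timed out.
        boolean expired = waitedMs >= timeToWaitMs;
        boolean sendable = full || expired || exhausted || closed;
        return sendable && !backingOff;
    }

    public static void main(String[] args) {
        System.out.println("fresh batch, 5 ms of a 10 ms linger: " + ready(false, 5, 10, 0, 100, false, false));
        System.out.println("fresh batch, linger elapsed:        " + ready(false, 10, 10, 0, 100, false, false));
        System.out.println("full batch still backing off:       " + ready(true, 50, 10, 1, 100, false, false));
    }
}
```

Note that a full batch is still held back while it is in its retry backoff.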

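　　One more method is org.apache.kafka.clients.producer.internals.RecordAccumulator#drain. It takes the data to send out of the accumulator buffer, at most max.request.size bytes per node at a time (see the configuration parameters at the top):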

 /**

  * Drain all the data for the given nodes and collate them into a list of batches that will fit within the specified

  * size on a per-node basis. This method attempts to avoid choosing the same topic-node over and over.

  *

  * @param cluster The current cluster metadata

  * @param nodes The list of node to drain

  * @param maxSize The maximum number of bytes to drain

  * maxSize comes from the producer config max.request.size: the most that is sent in one request

  * @param now The current unix time in milliseconds

  * @return A list of {@link ProducerBatch} for each node specified with total size less than the requested maxSize.

  */

 public Map<Integer, List<ProducerBatch>> drain(Cluster cluster, Set<Node> nodes, int maxSize, long now) {

     if (nodes.isEmpty())

         return Collections.emptyMap();

     Map<Integer, List<ProducerBatch>> batches = new HashMap<>();

     for (Node node : nodes) {

         // collect the batches to send, node by node

         List<ProducerBatch> ready = drainBatchesForOneNode(cluster, node, maxSize, now);

         batches.put(node.id(), ready);

     }

     return batches;

 }

 private List<ProducerBatch> drainBatchesForOneNode(Cluster cluster, Node node, int maxSize, long now) {

     int size = 0;

     // the partitions that belong to this node

     List<PartitionInfo> parts = cluster.partitionsForNode(node.id());

     List<ProducerBatch> ready = new ArrayList<>();

     /* to make starvation less likely this loop doesn't start at 0 */

     // do not start from the same partition every time; spread the draining evenly

     int start = drainIndex = drainIndex % parts.size();

     do {

         PartitionInfo part = parts.get(drainIndex);

         TopicPartition tp = new TopicPartition(part.topic(), part.partition());

         this.drainIndex = (this.drainIndex + 1) % parts.size();

         // Only proceed if the partition has no in-flight batches.

         if (isMuted(tp, now))

             continue;

         Deque<ProducerBatch> deque = getDeque(tp);

         if (deque == null)

             continue;

         // lock the deque, as before

         synchronized (deque) {

             // invariant: !isMuted(tp,now) && deque != null

             ProducerBatch first = deque.peekFirst();

             if (first == null)

                 continue;

             // first != null

             // check whether the batch is in its retry backoff period

             boolean backoff = first.attempts() > 0 && first.waitedTimeMs(now) < retryBackoffMs;

             // Only drain the batch if it is not during backoff period.

             if (backoff)

                 continue;

             // adding this batch would exceed maxSize, and ready already holds something

             if (size + first.estimatedSizeInBytes() > maxSize && !ready.isEmpty()) {

                 // there is a rare case that a single batch size is larger than the request size due to

                 // compression; in this case we will still eventually send this batch in a single request

                 // special case: a single batch is larger than maxSize while ready is still empty

                 // in that case the batch is still sent eventually, on its own in a single request

                 break;

             } else {

                 if (shouldStopDrainBatchesForPartition(first, tp))

                     break;

                 // this part (producer id and sequence numbers) is discussed below

                 boolean isTransactional = transactionManager != null ? transactionManager.isTransactional() : false;

                 ProducerIdAndEpoch producerIdAndEpoch =

                     transactionManager != null ? transactionManager.producerIdAndEpoch() : null;

                 ProducerBatch batch = deque.pollFirst();

                 if (producerIdAndEpoch != null && !batch.hasSequence()) {

                     // If the batch already has an assigned sequence, then we should not change the producer id and

                     // sequence number, since this may introduce duplicates. In particular, the previous attempt

                     // may actually have been accepted, and if we change the producer id and sequence here, this

                     // attempt will also be accepted, causing a duplicate.

                     //

                     // Additionally, we update the next sequence number bound for the partition, and also have

                     // the transaction manager track the batch so as to ensure that sequence ordering is maintained

                     // even if we receive out of order responses.

                     batch.setProducerState(producerIdAndEpoch, transactionManager.sequenceNumber(batch.topicPartition), isTransactional);

                     transactionManager.incrementSequenceNumber(batch.topicPartition, batch.recordCount);

                     log.debug("Assigned producerId {} and producerEpoch {} to batch with base sequence " +

                             "{} being sent to partition {}", producerIdAndEpoch.producerId,

                         producerIdAndEpoch.epoch, batch.baseSequence(), tp);

                     transactionManager.addInFlightBatch(batch);

                 }

                 // close the batch and add it to the ready list

                 batch.close();

                 size += batch.records().sizeInBytes();

                 ready.add(batch);

                 batch.drained(now);

             }

         }

     } while (start != drainIndex);

     return ready;

 }
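
Two details of drainBatchesForOneNode are easy to miss: the loop resumes from where the previous drain stopped, and the size cap never blocks the first batch. The sketch below models only those two points. DrainSketch is a made-up class; each partition is reduced to the size of its head batch (0 meaning nothing queued), and batches are not actually removed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of the rotating start index and the size cap used when draining one node.
class DrainSketch {
    private int drainIndex = 0;

    // headBatchSizes.get(i) is the size in bytes of partition i's first batch, or 0 if none.
    // Returns the indexes of the partitions drained in this round.
    List<Integer> drain(List<Integer> headBatchSizes, int maxSize) {
        List<Integer> ready = new ArrayList<>();
        if (headBatchSizes.isEmpty()) {
            return ready;
        }
        int size = 0;
        int start = drainIndex = drainIndex % headBatchSizes.size();
        do {
            int partition = drainIndex;
            drainIndex = (drainIndex + 1) % headBatchSizes.size();
            int batchBytes = headBatchSizes.get(partition);
            if (batchBytes == 0) {
                continue; // nothing queued for this partition
            }
            // Stop at the cap, unless nothing has been taken yet: an oversized
            // batch must still go out on its own.
            if (size + batchBytes > maxSize && !ready.isEmpty()) {
                break;
            }
            size += batchBytes;
            ready.add(partition);
        } while (start != drainIndex);
        return ready;
    }

    public static void main(String[] args) {
        DrainSketch d = new DrainSketch();
        List<Integer> sizes = Arrays.asList(60, 60, 60);
        System.out.println(d.drain(sizes, 100));
        System.out.println(d.drain(sizes, 100));
    }
}
```

Because drainIndex persists between calls, successive drains begin at different partitions, which is what keeps a busy first partition from starving the others.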

3. The Idempotent Producer

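　　One of the parameters mentioned above is enable.idempotence. The idempotent producer introduced in version 0.11.0.0 makes send operations idempotent, meaning that errors no longer result in duplicate messages. (For example, a transient send error can make the producer retry, so the same message may be sent more than once.)

　　Every batch of messages the producer sends to the broker carries a sequence number that is used for deduplication. Kafka stores this sequence number in the underlying log; it takes only a few bytes, so the overhead is small. Each producer is assigned a PID. The relationship between PID, partition, and sequence number can be pictured as a hash table whose key is (PID, partition) and whose value is the sequence number. For example, if the first send to the broker is ((PID=1, partition=1), sequence=2) and a second send carries a value less than or equal to 2, the broker rejects the PRODUCE request, which achieves deduplication.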

　　This guarantees EOS semantics only for a single producer instance.
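
The hash-table picture of (PID, partition) mapped to a sequence number translates directly into code. The sketch below follows the simplified rule from the text (reject anything less than or equal to the last accepted sequence); SequenceDeduper is a made-up class, and the real broker tracks more state than this.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of sequence-based deduplication keyed by (PID, partition).
class SequenceDeduper {
    private final Map<String, Integer> lastSequence = new HashMap<>();

    // Returns true if the batch is accepted, false if it is rejected as a duplicate.
    boolean accept(long pid, int partition, int sequence) {
        String key = pid + "/" + partition;
        Integer last = lastSequence.get(key);
        if (last != null && sequence <= last) {
            return false; // already seen this sequence (or a later one)
        }
        lastSequence.put(key, sequence);
        return true;
    }

    public static void main(String[] args) {
        SequenceDeduper broker = new SequenceDeduper();
        System.out.println("first send:  " + broker.accept(1L, 1, 2));
        System.out.println("retry:       " + broker.accept(1L, 1, 2));
        System.out.println("next batch:  " + broker.accept(1L, 1, 3));
    }
}
```

Since the key includes the PID, a different producer instance starts with a clean slate, which is why the guarantee holds only per producer instance.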