Problem Description
In Spark 3 the backpressure options for the Kafka source were changed, and the behavior of the file source under the Trigger.Once scenario changed as well.
But I have a question: how can I configure backpressure for my job when I want to use Trigger.Once?
In Spark 2.4 I have a use case: backfill some data and then start the stream. So I use trigger once, but my backfill scenario can be very, very big, and it sometimes creates too big a load on my disks (because of shuffles) and on driver memory (because the FileIndex is cached there). So I use maxOffsetsPerTrigger and maxFilesPerTrigger to control how much data my Spark job can process. That's how I configure backpressure.
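For reference, a minimal Scala sketch of what this Spark 2.4-style configuration might look like; the broker address, topic name, schema, and all paths below are illustrative placeholders, not values from the question:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger
import org.apache.spark.sql.types._

val spark = SparkSession.builder().appName("backfill").getOrCreate()

// Placeholder schema for the file source.
val schema = StructType(Seq(
  StructField("id", LongType),
  StructField("payload", StringType)))

// Kafka source: maxOffsetsPerTrigger caps how many offsets one micro-batch reads.
val kafkaDf = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092") // placeholder brokers
  .option("subscribe", "events")                    // placeholder topic
  .option("maxOffsetsPerTrigger", "100000")
  .load()

// File source: maxFilesPerTrigger caps how many files one micro-batch reads.
val fileDf = spark.readStream
  .format("parquet")
  .schema(schema)
  .option("maxFilesPerTrigger", "100")
  .load("/data/backfill")                           // placeholder input path

// In Spark 2.4 these limits were honored under Trigger.Once, so a single run
// processed only a bounded slice of the backlog. Only the file stream is
// started here, for brevity.
fileDf.writeStream
  .format("parquet")
  .option("checkpointLocation", "/chk/backfill")    // placeholder checkpoint
  .trigger(Trigger.Once())
  .start("/data/out")                               // placeholder output path
  .awaitTermination()
```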
And now this ability has been removed, so can someone suggest a new way to go?
Recommended Answer

Trigger.Once now ignores these options (in Spark 3), so it will always read everything on the first load.
You can work around that. For example, you can start the stream with the trigger set to periodic, with some value like 1 hour, and not execute .awaitTermination(), but have a parallel loop that checks whether the first batch is done and then stops the stream. Or you can set it to continuous mode and check whether batches have 0 rows, and then terminate the stream. After that initial load you can switch the stream back to Trigger.Once.
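A minimal sketch of that workaround in Scala, assuming a Parquet file source; the schema, paths, trigger interval, and polling interval are all illustrative assumptions, not values from the answer:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger
import org.apache.spark.sql.types._

val spark = SparkSession.builder().appName("backfill-workaround").getOrCreate()

// Placeholder schema and paths.
val schema = StructType(Seq(
  StructField("id", LongType),
  StructField("payload", StringType)))

val df = spark.readStream
  .format("parquet")
  .schema(schema)
  .option("maxFilesPerTrigger", "100")           // still honored by periodic triggers
  .load("/data/backfill")

// Periodic trigger instead of Trigger.Once, and no awaitTermination().
val query = df.writeStream
  .format("parquet")
  .option("checkpointLocation", "/chk/backfill")
  .trigger(Trigger.ProcessingTime("1 hour"))
  .start("/data/out")

// Variant 1: stop as soon as the first batch completes, emulating the old
// Spark 2.4 Trigger.Once behavior of one bounded batch per run.
// query.lastProgress stays null until the first batch has finished.
while (query.lastProgress == null) {
  Thread.sleep(10000) // poll every 10 seconds; the interval is arbitrary
}
query.stop()

// Variant 2 (alternative): keep running until a batch reads no new rows,
// i.e. the backlog is drained, then stop:
//   while (query.lastProgress == null || query.lastProgress.numInputRows > 0) {
//     Thread.sleep(10000)
//   }
//   query.stop()

// Once this initial load is done, later runs can switch back to Trigger.Once.
```

The same polling pattern should work for a Kafka source; only the readStream options change.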