Forcing a SideInput update when updating a Dataflow pipeline

Problem description

I have a running Dataflow pipeline that fetches a configuration of active tenants (stored in GCS) and feeds it into an ActiveTenantFilter as a sideInput. The configuration is rarely updated, so I decided to re-deploy the pipeline with the --update flag whenever it changes.

However, when using the update flag, the file is not fetched again, i.e., the state is maintained. Is it possible to enforce that this PCollectionView is updated whenever the pipeline is re-deployed?

Recommended answer

You are correct: when you --update a pipeline, it will process new data but will not re-load old data. It sounds like what you want is slowly updating side inputs, which unfortunately have not been implemented yet. You could instead try draining and re-starting your pipeline.
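
As a rough illustration of the drain step, here is a minimal Python sketch that wraps the gcloud CLI command for draining a Dataflow job. The helper names build_drain_command and drain_job are hypothetical and not part of any library. The job ID and region are placeholders you would supply yourself, and the sketch assumes gcloud is installed and authenticated.

```python
import subprocess
from typing import List


def build_drain_command(job_id: str, region: str) -> List[str]:
    """Build the gcloud invocation that requests a drain of a Dataflow job.

    job_id and region are supplied by the caller; nothing here looks them up.
    """
    return ["gcloud", "dataflow", "jobs", "drain", job_id, f"--region={region}"]


def drain_job(job_id: str, region: str) -> None:
    """Run the drain command; raises CalledProcessError if gcloud fails.

    Assumes the gcloud CLI is on PATH and already authenticated.
    """
    subprocess.run(build_drain_command(job_id, region), check=True)
```

Calling drain_job("<your-job-id>", "<your-region>") only requests the drain. The job then finishes processing in-flight data before it stops. Once it has drained, launching the pipeline again as a new job (without --update) should cause the configuration file to be read afresh.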

07-31 15:02