This IBM paper points out that and I quote第三,Beam编程指南保证每个用户定义的函数实例一次只能由一个线程执行.这意味着运行程序必须同步整个函数调用,这可能会导致严重的性能瓶颈." "Beam向应用程序保证一次将只有一个线程执行其用户定义的功能.因此,如果下划线引擎产生了多个线程,则运行程序必须同步整个DoFn或GroupByKey调用." 由于Beam禁止多个线程进入同一个PTransform实例,因此引擎失去了使用运算符并行性的机会.""Third, the Beam programming guide guarantee that each user-defined function instance will only be executed by a single thread at a time. This means that the runner has to synchronize the entire function invocation, which could lead to significant performance bottlenecks.""Beam promises applications that there will only be a single thread executing their user-defined functions at a time. Therefore, if the underline engine spawns multiple threads, the runner has to synchronize the entire DoFn or GroupByKey invocation.""As Beam forbids multiple threads from entering the same PTransform instance, engines lose the opportunity to use operator parallelism."该论文似乎表明整个DoFn调用是同步的.The paper seems to indicate that the entire DoFn invocation is synchronized.推荐答案我知道这是个老问题,但是由于我正在研究同一件事-不,您不需要为processElement进行同步,因为如您所引用:您的函数(DoFn)对象的每个实例都可以通过一个单独的线程在worker实例上进行访问"I know this is old question but since I was researching the same thing - no, you don't need synchronized for your processElement because as you quoted: "Each instance of your function (DoFn) object is accessed by a single thread at a time on a worker instance"这里是Beam的官方类的实例,该实例改变了实例变量"> https://github.com/apache/beam/blob/0c01636fc8610414859d946cb93eabc904123fc8/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java#L1369Here is example of beam's official class that mutates instance variablehttps://github.com/apache/beam/blob/0c01636fc8610414859d946cb93eabc904123fc8/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java#L1369 这篇关于Apache Beam中DoFn的线程同步的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持! 上岸,阿里云!
08-20 06:05
查看更多