


I'm using Cloud Dataflow to import data from Pub/Sub messages into BigQuery tables. I'm using DynamicDestinations because the messages can be routed to different tables.


I recently noticed that the pipeline started consuming all of its resources, and log messages saying that processing is stuck began to appear:

Processing stuck in step Write Avros to BigQuery Table/StreamingInserts/StreamingWriteTables/StreamingWrite for at least 26h45m00s without outputting or completing in state finish
  at sun.misc.Unsafe.park(Native Method)
  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
  at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429)
  at java.util.concurrent.FutureTask.get(FutureTask.java:191)
  at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.insertAll(BigQueryServicesImpl.java:765)
  at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.insertAll(BigQueryServicesImpl.java:829)
  at org.apache.beam.sdk.io.gcp.bigquery.StreamingWriteFn.flushRows(StreamingWriteFn.java:131)
  at org.apache.beam.sdk.io.gcp.bigquery.StreamingWriteFn.finishBundle(StreamingWriteFn.java:103)
  at org.apache.beam.sdk.io.gcp.bigquery.StreamingWriteFn$DoFnInvoker.invokeFinishBundle(Unknown Source)


Currently, simply cancelling the pipeline and restarting it seems to temporarily solve the problem, but I can't seem to pinpoint the reason the process is getting stuck.


The pipeline uses beam-runners-google-cloud-dataflow-java version 2.8.0 and google-cloud-bigquery version 1.56.0.



This log message may seem scary, but on its own it is not indicative of a problem. All it conveys is that your pipeline has been performing the same operation for a while.


This is not necessarily a problem: your files may simply be large enough that they take a while to write. If you've arrived at this question because you're seeing these messages, consider what kind of pipeline you have and whether it could plausibly contain some slow steps.


In your case, the pipeline has been stuck in the same write for more than 26 hours, so this is certainly a problem. I believe it is related to a deadlock introduced by a library in older versions of Beam. It should not occur in more recent versions (e.g. 2.15.0), so upgrading from 2.8.0 is worth trying.
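To illustrate what the stack trace in the question shows: the worker thread is parked inside an untimed FutureTask.get(), which waits indefinitely if the underlying task never completes. The following is only a minimal sketch using plain java.util.concurrent, not Beam's actual code; the class name StuckFutureDemo, the method waitTimesOut, and the latch standing in for a hung insert call are all hypothetical. It shows how a bounded get() turns a silent hang into a visible timeout.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class StuckFutureDemo {

    /**
     * Submits a task that never finishes on its own and waits for it with a
     * bounded get(). Returns true if the wait timed out.
     */
    public static boolean waitTimesOut(long timeoutMillis) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        // Hypothetical stand-in for a call that never returns: nobody counts this latch down.
        CountDownLatch neverReleased = new CountDownLatch(1);
        try {
            Future<String> pending = pool.submit(() -> {
                neverReleased.await();
                return "done";
            });
            try {
                // An untimed pending.get() here would park the calling thread forever,
                // producing the same FutureTask.get/awaitDone/park frames as in the trace.
                pending.get(timeoutMillis, TimeUnit.MILLISECONDS);
                return false;
            } catch (TimeoutException e) {
                pending.cancel(true); // interrupt the hung task
                return true;
            } catch (ExecutionException e) {
                throw new IllegalStateException(e.getCause());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        } finally {
            pool.shutdownNow(); // release the worker thread so the JVM can exit
        }
    }

    public static void main(String[] args) {
        System.out.println("timed out: " + waitTimesOut(200)); // prints "timed out: true"
    }
}
```

In a Dataflow pipeline you do not control this wait yourself, since it lives inside the Beam SDK, which is why moving to a newer Beam release is the practical fix rather than patching the call.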


10-30 14:59