Google Cloud Dataflow - Batch Inserts in BigQuery

Problem Description

I was able to create a Dataflow pipeline which reads data from Pub/Sub and, after processing it, writes to BigQuery in streaming mode.

Now, instead of streaming mode, I would like to run my pipeline in batch mode to reduce costs.

Currently my pipeline performs streaming inserts into BigQuery with dynamic destinations. I would like to know if there is a way to perform a batch insert operation with dynamic destinations.

Below is the code:

import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;
import java.net.SocketTimeoutException;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.WriteDisposition;
import org.apache.beam.sdk.io.gcp.bigquery.DynamicDestinations;
import org.apache.beam.sdk.io.gcp.bigquery.InsertRetryPolicy;
import org.apache.beam.sdk.io.gcp.bigquery.TableDestination;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.options.Default;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.windowing.AfterProcessingTime;
import org.apache.beam.sdk.transforms.windowing.GlobalWindows;
import org.apache.beam.sdk.transforms.windowing.Repeatedly;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.ValueInSingleWindow;
import org.joda.time.Duration;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class StarterPipeline {

    private static final Logger log = LoggerFactory.getLogger(StarterPipeline.class);

    public interface StarterPipelineOption extends PipelineOptions {

        /** Set this required option to specify where to read the input. */
        @Description("Path of the file to read from")
        @Default.String(Constants.pubsub_event_pipeline_url)
        String getInputFile();

        void setInputFile(String value);
    }

    @SuppressWarnings("serial")
    public static void main(String[] args) throws SocketTimeoutException {

        StarterPipelineOption options = PipelineOptionsFactory.fromArgs(args).withValidation()
                .as(StarterPipelineOption.class);

        Pipeline p = Pipeline.create(options);

        PCollection<String> datastream = p.apply("Read Events From Pubsub",
                PubsubIO.readStrings().fromSubscription(Constants.pubsub_event_pipeline_url));

        // Group the unbounded stream into panes that fire every 300 seconds.
        PCollection<String> windowed_items = datastream.apply(Window.<String>into(new GlobalWindows())
                .triggering(Repeatedly.forever(
                        AfterProcessingTime.pastFirstElementInPane().plusDelayOf(Duration.standardSeconds(300))))
                .withAllowedLateness(Duration.standardDays(10)).discardingFiredPanes());

        // Write into BigQuery, routing each event to its own table and schema.
        windowed_items.apply("Read and make event table row", new ReadEventJson_bigquery())

                .apply("Write_events_to_BQ",
                        BigQueryIO.writeTableRows().to(new DynamicDestinations<TableRow, String>() {
                            @Override
                            public String getDestination(ValueInSingleWindow<TableRow> element) {
                                return EventSchemaBuilder
                                        .fetch_destination_based_on_event(element.getValue().get("event").toString());
                            }

                            @Override
                            public TableDestination getTable(String table) {
                                String destination = EventSchemaBuilder.fetch_table_name_based_on_event(table);
                                return new TableDestination(destination, destination);
                            }

                            @Override
                            public TableSchema getSchema(String table) {
                                return EventSchemaBuilder.fetch_table_schema_based_on_event(table);
                            }
                        }).withCreateDisposition(CreateDisposition.CREATE_NEVER)
                                .withWriteDisposition(WriteDisposition.WRITE_APPEND)
                                .withFailedInsertRetryPolicy(InsertRetryPolicy.retryTransientErrors()));

        p.run().waitUntilFinish();

        log.info("Events Pipeline Job Stopped");
    }
}

Recommended Answer

You can limit the costs by using file loads for streaming jobs. The Insertion Method section states that BigQueryIO.Write supports two methods of inserting data into BigQuery, specified via BigQueryIO.Write.withMethod(org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.Method). If no method is supplied, a default method is chosen based on the input PCollection. See BigQueryIO.Write.Method for more information about the methods.
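As a minimal sketch of how the write step above could switch to file loads (here, dynamicDestinations is a placeholder for the anonymous DynamicDestinations<TableRow, String> shown in the question, and the 5-minute frequency is an arbitrary choice): on an unbounded source, FILE_LOADS requires withTriggeringFrequency together with withNumFileShards, and withFailedInsertRetryPolicy applies only to streaming inserts, so it is dropped.

    // Sketch, not the asker's exact code: replace streaming inserts
    // with periodic batch load jobs.
    windowed_items.apply("Read and make event table row", new ReadEventJson_bigquery())
            .apply("Write_events_to_BQ",
                    BigQueryIO.writeTableRows().to(dynamicDestinations) // same destinations as above
                            .withMethod(BigQueryIO.Write.Method.FILE_LOADS)
                            // How often a load job is started for each destination.
                            .withTriggeringFrequency(Duration.standardMinutes(5))
                            // Required alongside withTriggeringFrequency on unbounded input.
                            .withNumFileShards(1)
                            .withCreateDisposition(CreateDisposition.CREATE_NEVER)
                            .withWriteDisposition(WriteDisposition.WRITE_APPEND));

Each trigger firing writes the pending rows to files and submits a BigQuery load job per destination table, which is billed as batch loading rather than streaming inserts.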

The different insertion methods provide different tradeoffs of cost, quota, and data consistency. Please see the BigQuery documentation for more information about these tradeoffs.
