Creating and writing to a partitioned BigQuery table via Google Cloud Dataflow

Problem description

I want to take advantage of BigQuery's new time-partitioned tables, but I am unsure whether this is currently possible in version 1.6 of the Dataflow SDK.

Looking at the BigQuery JSON API, creating a day-partitioned table requires passing in a

"timePartitioning": { "type": "DAY" }

option, but the com.google.cloud.dataflow.sdk.io.BigQueryIO interface only allows specifying a TableReference.

I thought that maybe I could pre-create the table and sneak in a partition decorator via a BigQueryIO.Write.toTableReference lambda. Has anyone else had success creating or writing to partitioned tables via Dataflow?

This seems similar to setting the table expiration time, which isn't currently available either.

Solution

As Pavan says, it is definitely possible to write to partitioned tables with Dataflow. Are you using the DataflowPipelineRunner in streaming mode or in batch mode?

The solution you proposed should work. Specifically, if you pre-create a table with date partitioning set up, then you can use a BigQueryIO.Write.toTableReference lambda to write to a date partition. For example:

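import org.joda.time.DateTimeZone;
import org.joda.time.Instant;
import org.joda.time.format.DateTimeFormat;
import org.joda.time.format.DateTimeFormatter;

/**
 * A Joda-Time formatter that prints a date in a format like {@code "20160101"}.
 * Thread-safe.
 */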
private static final DateTimeFormatter FORMATTER =
    DateTimeFormat.forPattern("yyyyMMdd").withZone(DateTimeZone.UTC);

// This code generates a valid BigQuery partition name:
Instant instant = Instant.now(); // any Joda instant in a reasonable time range
String baseTableName = "project:dataset.table"; // a valid BigQuery table name
String partitionName =
    String.format("%s$%s", baseTableName, FORMATTER.print(instant));
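For readers who want to try the naming logic without Joda-Time or the Dataflow SDK on the classpath, here is a minimal sketch of the same idea using only java.time (Java 8+). This is a substitution, not the code from the answer: the class name PartitionNames, the method partitionName, and the check for an existing "$" are hypothetical additions for illustration. The wiring into BigQueryIO.Write is deliberately left out, since it depends on the SDK version.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of the Dataflow SDK or of the original answer.
final class PartitionNames {
    // Same "yyyyMMdd" pattern as the Joda formatter above, pinned to UTC.
    private static final DateTimeFormatter FORMATTER =
        DateTimeFormatter.ofPattern("yyyyMMdd").withZone(ZoneOffset.UTC);

    private PartitionNames() {}

    /** Returns baseTableName, a "$", and the UTC date of the instant. */
    static String partitionName(String baseTableName, Instant instant) {
        // Assumption for this sketch: reject names that already carry a decorator.
        if (baseTableName.indexOf('$') >= 0) {
            throw new IllegalArgumentException(
                "table name already has a decorator: " + baseTableName);
        }
        return baseTableName + "$" + FORMATTER.format(instant);
    }

    public static void main(String[] args) {
        // One second before midnight UTC still lands in the 2016-01-01 partition.
        Instant lateOnNewYearsDay = Instant.parse("2016-01-01T23:59:59Z");
        System.out.println(partitionName("project:dataset.table", lateOnNewYearsDay));
    }
}
```

Pinning the formatter to UTC keeps the partition date independent of the worker's default time zone. In a pipeline, a function like this would be called from the lambda passed to BigQueryIO.Write with a timestamp taken from the window or element.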
