Problem description
We are using BigQuery streaming inserts with Dataflow, via the predefined Dataflow job template.
I ran into some peculiarities when using this with nullable and repeated fields.
For example, with the schema
name STRING, NULLABLE
trying to insert {name: null}
fails with the error:
generic::invalid_argument: This field is not a record.","location":"name","message":"This field is not a record."
This is not such a big deal since it's easy enough to simply drop null fields, and similarly for empty arrays.
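A minimal Python sketch of that workaround, assuming each row is built as a plain dict before it is handed to the insert step. The helper name `drop_nulls` is hypothetical and not part of Dataflow or BigQuery. It omits None-valued fields and empty lists, and recurses into nested records (including records inside repeated fields):

```python
from typing import Any, Dict


def drop_nulls(row: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `row` without None-valued fields or empty lists.

    Hypothetical helper: nested dicts (RECORD fields) and dicts inside lists
    (REPEATED RECORD fields) are cleaned recursively. Null *elements* inside
    a list are left untouched, because dropping them would change the array.
    """
    cleaned: Dict[str, Any] = {}
    for key, value in row.items():
        if value is None or (isinstance(value, list) and not value):
            continue  # omit the field instead of sending null or []
        if isinstance(value, dict):
            value = drop_nulls(value)
        elif isinstance(value, list):
            value = [drop_nulls(v) if isinstance(v, dict) else v for v in value]
        cleaned[key] = value
    return cleaned


print(drop_nulls({"name": None, "tags": [], "id": 1}))  # {'id': 1}
```

Note that `["a", "b", None, "c"]` passes through unchanged, so the repeated-field case below still fails with this approach alone.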
However, now if our schema is:
name STRING, REPEATED
and we want to insert ["a", "b", null, "c"]
we get a similar error referencing the third element.
Recommended answer
To provide a row with a null value for a NULLABLE field, simply omit the field from the row that you are inserting. For your second example, a REPEATED field (or an ARRAY in SQL terms) cannot have a null element. To model an array of NULLABLE STRING, you can use a REPEATED RECORD that contains a STRING field named value, for instance, or equivalently an ARRAY<STRUCT<value STRING>> in SQL terms.
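A rough Python sketch of that modelling, assuming the table schema is written in BigQuery's JSON schema form and rows are plain dicts. `NAME_FIELD_SCHEMA` and `wrap_nullable_array` are hypothetical names. It also assumes that a null element can be sent as an empty record `{}`, applying the same "omit the null field" rule inside the record; verify that against your own table before relying on it.

```python
from typing import Any, Dict, List, Optional

# `name` as ARRAY<STRUCT<value STRING>>: a REPEATED RECORD whose single
# NULLABLE STRING field is called `value` (hypothetical schema for illustration).
NAME_FIELD_SCHEMA: Dict[str, Any] = {
    "name": "name",
    "type": "RECORD",
    "mode": "REPEATED",
    "fields": [
        {"name": "value", "type": "STRING", "mode": "NULLABLE"},
    ],
}


def wrap_nullable_array(items: List[Optional[str]]) -> List[Dict[str, Any]]:
    """Wrap each element in a record; a None element becomes {}.

    Assumption: the null `value` is omitted rather than sent, mirroring how
    top-level NULLABLE fields are handled.
    """
    return [{} if item is None else {"value": item} for item in items]


row = {"name": wrap_nullable_array(["a", "b", None, "c"])}
print(row)  # {'name': [{'value': 'a'}, {'value': 'b'}, {}, {'value': 'c'}]}
```

The trade-off is that queries must reach through the wrapper (for example `UNNEST(name)` and then `.value`), but the array can now carry a null in any position without the insert being rejected.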