Problem description
I'm trying to create a table in BigQuery through the Python client. The docs use bigquery.SchemaField('name', 'TYPE')
to define a field, but that doesn't seem to work for ARRAYs or STRUCTs. This is the ARRAY of STRUCTs field I'm trying to create:
bigquery.SchemaField('owners', 'ARRAY<STRUCT<emailAddress STRING, displayName STRING>>', 'REPEATABLE'),
If I use the field definition above, I get the following API error:
400 POST https://www.googleapis.com/bigquery/v2/projects/import-sheet/datasets/sheetgo/tables: Invalid value for: ARRAY<STRUCT<emailAddress STRING, displayName STRING>> is not a valid value
The whole code:
schema = [
bigquery.SchemaField('user', 'STRING'),
bigquery.SchemaField('id', 'STRING'),
bigquery.SchemaField('service_origin', 'STRING'),
bigquery.SchemaField('name', 'STRING'),
bigquery.SchemaField('mimeType', 'STRING'),
bigquery.SchemaField('createdAt', 'DATETIME'),
bigquery.SchemaField('ownedByMe', 'BOOLEAN'),
bigquery.SchemaField('owners', 'ARRAY<STRUCT<emailAddress STRING, displayName STRING>>', 'REPEATABLE'),
bigquery.SchemaField('parents', 'ARRAY<STRING>', 'REPEATABLE'),
bigquery.SchemaField('teamDriveId', 'STRING'),
bigquery.SchemaField('permissions', 'STRING'),
bigquery.SchemaField('shared', 'BOOLEAN'),
bigquery.SchemaField('writersCanShare', 'BOOLEAN'),
bigquery.SchemaField('sharingUser', 'STRING'),
bigquery.SchemaField('version', 'STRING'),
bigquery.SchemaField('size', 'FLOAT'),
bigquery.SchemaField('data_properties', 'ARRAY<STRUCT<'
'rows INTEGER,'
'cells_with_importrange ARRAY<'
'STRUCT<'
'row_index INTEGER,'
'col_index INTEGER,'
'importrange STRING'
'>'
'>,'
'tab_name STRING,'
'cell_count FLOAT,'
'header_rows ARRAY<STRING>,'
'>>', 'REPEATABLE'),
bigquery.SchemaField('timezone', 'STRING'),
bigquery.SchemaField('locale', 'STRING'),
bigquery.SchemaField('last_scansheet', 'STRING'),
]
bigquery_client = bigquery.Client(PROJECT_ID)
dataset_ref = bigquery_client.dataset("eita")
table_ref = dataset_ref.table(table_id)
table = bigquery.Table(table_ref, schema=schema)
table = bigquery_client.create_table(table)
UPDATE
Thanks to Willian Fuks, I got this working. The schema ended up like this:
schema = [
    bigquery.SchemaField('user', 'STRING'),
    bigquery.SchemaField('id', 'STRING'),
    bigquery.SchemaField('service_origin', 'STRING'),
    bigquery.SchemaField('name', 'STRING'),
    bigquery.SchemaField('mimeType', 'STRING'),
    bigquery.SchemaField('createdAt', 'DATETIME'),
    bigquery.SchemaField('ownedByMe', 'BOOLEAN'),
    bigquery.SchemaField('owners', 'RECORD', mode='REPEATED',
        fields=(
            bigquery.SchemaField('emailAddress', 'STRING'),
            bigquery.SchemaField('displayName', 'STRING')
        )
    ),
    bigquery.SchemaField('parents', 'STRING', mode='REPEATED'),
    bigquery.SchemaField('teamDriveId', 'STRING'),
    bigquery.SchemaField('permissions', 'STRING'),
    bigquery.SchemaField('shared', 'BOOLEAN'),
    bigquery.SchemaField('writersCanShare', 'BOOLEAN'),
    bigquery.SchemaField('sharingUser', 'STRING'),
    bigquery.SchemaField('version', 'STRING'),
    bigquery.SchemaField('size', 'FLOAT'),
    bigquery.SchemaField('data_properties', 'RECORD', mode='REPEATED',
        fields=(
            bigquery.SchemaField('rows', 'INTEGER'),
            bigquery.SchemaField('cells_with_importrange', 'RECORD', mode='REPEATED',
                fields=(
                    bigquery.SchemaField('row_index', 'INTEGER'),
                    bigquery.SchemaField('col_index', 'INTEGER'),
                    bigquery.SchemaField('importrange', 'STRING'),
                )
            ),
            bigquery.SchemaField('tab_name', 'STRING'),
            bigquery.SchemaField('cell_count', 'FLOAT'),
            bigquery.SchemaField('header_rows', 'STRING', mode='REPEATED')
        )
    ),
    bigquery.SchemaField('timezone', 'STRING'),
    bigquery.SchemaField('locale', 'STRING'),
    bigquery.SchemaField('last_scansheet', 'STRING'),
]
Recommended answer
The SchemaField constructor expects different inputs from the ones you used.
Try the following:
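from google.cloud.bigquery import SchemaField

schema = [
    (...),
    # An ARRAY of STRUCTs is a RECORD field in REPEATED mode;
    # the struct members go in the nested `fields` tuple.
    SchemaField('owners', 'RECORD', mode='REPEATED',
        fields=(
            SchemaField('emailAddress', 'STRING'),
            SchemaField('displayName', 'STRING')
        )
    ),
    (...)
]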
The main idea is to define the fields inside a record field by using further SchemaField
definitions: a STRUCT becomes a field of type 'RECORD', and an ARRAY becomes mode='REPEATED'.
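To make the mapping concrete without needing the client library, here is a minimal sketch that builds the same two columns as plain dicts using the name/type/mode/fields keys that BigQuery table schemas use in JSON form. The `field` helper is hypothetical and written only for this illustration; it is not part of google-cloud-bigquery.

```python
import json

# Hypothetical helper (not a google-cloud-bigquery function): returns the
# dict form of one column with the name/type/mode/fields keys.
def field(name, field_type, mode="NULLABLE", fields=()):
    spec = {"name": name, "type": field_type, "mode": mode}
    if fields:
        # Nested columns are only meaningful for RECORD fields.
        spec["fields"] = list(fields)
    return spec

# ARRAY<STRUCT<emailAddress STRING, displayName STRING>>
# is expressed as a RECORD in REPEATED mode with nested fields.
owners = field("owners", "RECORD", mode="REPEATED", fields=(
    field("emailAddress", "STRING"),
    field("displayName", "STRING"),
))

# ARRAY<STRING> is expressed as a plain STRING in REPEATED mode.
parents = field("parents", "STRING", mode="REPEATED")

print(json.dumps([owners, parents], indent=2))
```

The point of the sketch is that the type string never contains 'ARRAY<...>' or 'STRUCT<...>': the array part lives in the mode and the struct part lives in the nested fields, which is the same shape the nested SchemaField calls above describe.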