Question
I am trying to update a table using batched update statements. The DML queries execute successfully in the BigQuery Web UI, but when batched, the first one succeeds while the others fail. Why is this?
Sample query:
query = '''
update `project.dataset.Table`
set my_fk = 1234
where other_fk = 222 and
received >= PARSE_TIMESTAMP("%Y-%m-%d %H:%M:%S", "2018-01-22 05:28:12") and
received <= PARSE_TIMESTAMP("%Y-%m-%d %H:%M:%S", "2018-01-26 02:31:51")
'''
Sample code:
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.QueryJobConfig()
job_config.priority = bigquery.QueryPriority.BATCH

queries = []  # list of DML strings
jobs = []
for query in queries:
    job = client.query(query, location='US', job_config=job_config)
    jobs.append(job)
Job output:
for job in jobs[1:]:
    print(job.state)
    # DONE
    print(job.error_result)
    # {'message': 'Cannot set destination table in jobs with DML statements',
    #  'reason': 'invalidQuery'}
    print(job.use_legacy_sql)
    # False
    print(job.job_type)
    # Query
Answer
I suspect that the problem is the shared job_config: after the first job is inserted, the BigQuery API populates some of its fields (destination in particular). The second job then fails because it is a DML statement whose job configuration carries a destination table. You can verify this with:
for query in queries:
    print(job_config.destination)
    job = client.query(query, location='US', job_config=job_config)
    print(job_config.destination)
    jobs.append(job)
To solve this, avoid reusing the same job_config for all jobs:
for query in queries:
    job_config = bigquery.QueryJobConfig()
    job_config.priority = bigquery.QueryPriority.BATCH
    job = client.query(query, location='US', job_config=job_config)
    jobs.append(job)
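The failure mode can be reproduced without BigQuery at all. The sketch below uses a hypothetical stand-in client (FakeClient is not a real library class) that, like the real API, writes a destination back into the config object it receives, so reusing one config poisons every job after the first:

```python
# Hypothetical stand-in for the BigQuery client, for illustration only.
class FakeClient:
    def query(self, sql, job_config):
        # A DML statement with a destination already set is rejected,
        # mirroring the 'invalidQuery' error seen above.
        if job_config.get("destination") and sql.lstrip().lower().startswith("update"):
            raise ValueError("Cannot set destination table in jobs with DML statements")
        # The API populates the destination on the config object it was handed.
        job_config["destination"] = "anon_table"
        return "job-ok"

client = FakeClient()
queries = ["update t set a = 1", "update t set a = 2"]

# Reusing one config: the first query succeeds, the second fails.
shared = {"priority": "BATCH", "destination": None}
results = []
for q in queries:
    try:
        results.append(client.query(q, job_config=shared))
    except ValueError as e:
        results.append(str(e))

# A fresh config per query avoids the mutation, so every job succeeds.
fresh_results = []
for q in queries:
    fresh_results.append(client.query(q, job_config={"priority": "BATCH", "destination": None}))

print(results)
print(fresh_results)
```

The same reasoning explains why constructing a new QueryJobConfig inside the loop, as in the fix above, makes all the batched DML statements go through.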