Problem description
I've been using BigQuery for about 2 months. During that time I've used streaming insertion to add thousands of entries every minute. I've been able to then query over that data within a few minutes, if not practically instantly.
Starting a few days ago, though, one of my tables suddenly started showing delays in data availability ranging from 20 to 60 minutes. This only occurs with one of my tables. Data inserted into other tables remains available nearly instantly.
Is this kind of delay in data availability normal for BigQuery?
The table experiencing this problem is accuAudience.trackPlays. I will gladly provide the project ID and other info to a Google team member.
The results of the streaming inserts into the problematic table are:
{'kind': 'bigquery#tableDataInsertAllResponse'}
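For readers wondering how to interpret that response: in the tabledata.insertAll API, rejected rows are reported under an `insertErrors` key, so a response containing only `kind` suggests every row was accepted. A minimal sketch of that check follows; the helper name `failed_rows` is hypothetical, and the code only inspects a response dictionary rather than calling BigQuery.

```python
def failed_rows(response):
    """Return (row_index, reasons) for each row the insertAll call rejected.

    `response` is the parsed JSON body of a tabledata.insertAll call.
    An absent or empty "insertErrors" list means no row was rejected.
    """
    failures = []
    for entry in response.get("insertErrors", []):
        # Each entry names the index of the row in the request and its errors.
        reasons = [err.get("reason", "unknown") for err in entry.get("errors", [])]
        failures.append((entry.get("index"), reasons))
    return failures


# The response shown in the question: no "insertErrors" key at all.
response = {"kind": "bigquery#tableDataInsertAllResponse"}
print(failed_rows(response))  # prints []
```

An empty list here means the inserts themselves succeeded, which is consistent with the delay being on the read side rather than the write side.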
Example query from the problematic table, accuAudience.trackPlays (ordered by date desc):
Row  date                     count
1    2015-03-30 12:35:32 UTC  67
2    2015-03-30 12:35:31 UTC  65
3    2015-03-30 12:35:30 UTC  56
4    2015-03-30 12:35:29 UTC  45
5    2015-03-30 12:35:28 UTC  60
The same query made seconds later against a different table (accuAudience.trackSkips). Note that the date field is about 30 minutes ahead of the earlier query.
Row  date                     count
1    2015-03-30 13:04:03 UTC  1
2    2015-03-30 13:04:02 UTC  1
3    2015-03-30 13:04:01 UTC  3
4    2015-03-30 13:04:00 UTC  3
5    2015-03-30 13:03:59 UTC  6
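To put a number on the gap, the newest timestamp visible in each table can be compared directly. The sketch below does only that arithmetic on the two top rows shown above; the function names `parse_utc` and `visibility_gap` are hypothetical, and it assumes both queries ran at roughly the same moment, as described.

```python
from datetime import datetime, timezone


def parse_utc(text):
    """Parse a timestamp in the form printed by the query results above."""
    parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S UTC")
    return parsed.replace(tzinfo=timezone.utc)


def visibility_gap(newest_lagging, newest_reference):
    """How far the lagging table's newest row trails the reference table's."""
    return parse_utc(newest_reference) - parse_utc(newest_lagging)


# Newest row in trackPlays versus newest row in trackSkips.
gap = visibility_gap("2015-03-30 12:35:32 UTC", "2015-03-30 13:04:03 UTC")
print(gap)  # prints 0:28:31
```

That is 28 minutes 31 seconds, which falls inside the 20 to 60 minute range reported for the problematic table.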
If there's other information needed, please let me know!
Recommended answer
BigQuery periodically runs background maintenance tasks to optimize your tables for querying. One of these background tasks caused a hiccup with the streaming process. As a result, we could not read from the streaming buffer until it was flushed. Note that because you were continually streaming to the table, you might have seen this as an ongoing issue.
It is fixed now. If you continue to see the problem, please let us know what table & project you are seeing the issue with.