Question
How many records can we get from Google App Engine in a single query, so that we can display a count to the user? And can we increase the timeout limit from 3 seconds to 5 seconds?
Accepted answer
In my experience, ndb cannot pull more than 1,000 records at a time. Here is an example of what happens if I try to use .count() on a kind that contains ~500,000 records:
s~project-id> models.Transaction.query().count()
WARNING:root:suspended generator _count_async(query.py:1330) raised AssertionError()
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/utils.py", line 160, in positional_wrapper
    return wrapped(*args, **kwds)
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/query.py", line 1287, in count
    return self.count_async(limit, **q_options).get_result()
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/tasklets.py", line 383, in get_result
    self.check_success()
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/tasklets.py", line 427, in _help_tasklet_along
    value = gen.throw(exc.__class__, exc, tb)
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/query.py", line 1330, in _count_async
    batch = yield rpc
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/ext/ndb/tasklets.py", line 513, in _on_rpc_completion
    result = rpc.get_result()
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/api/apiproxy_stub_map.py", line 614, in get_result
    return self.__get_result_hook(self)
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/datastore/datastore_query.py", line 2910, in __query_result_hook
    self._batch_shared.conn.check_rpc_success(rpc)
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/datastore/datastore_rpc.py", line 1377, in check_rpc_success
    rpc.check_success()
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/api/apiproxy_stub_map.py", line 580, in check_success
    self.__rpc.CheckSuccess()
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/api/apiproxy_rpc.py", line 157, in _WaitImpl
    self.request, self.response)
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/ext/remote_api/remote_api_stub.py", line 308, in MakeSyncCall
    handler(request, response)
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/google_appengine/google/appengine/ext/remote_api/remote_api_stub.py", line 362, in _Dynamic_Next
    assert next_request.offset() == 0
AssertionError
To bypass this, you can do something like:
objs = []
q = None
more = True
while more:
    _objs, q, more = models.Transaction.query().fetch_page(300, start_cursor=q)
    objs.extend(_objs)
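The loop above is the standard cursor-pagination pattern: each fetch_page call returns a page of results, a cursor marking where it stopped, and a flag saying whether more pages remain. As a minimal self-contained sketch (not part of the original answer), the pattern can be wrapped in a generator; fake_fetch_page is a hypothetical stub standing in for ndb's Query.fetch_page so the sketch runs without App Engine:

```python
def fetch_all(fetch_page, page_size=300):
    """Drain a cursor-paginated query one page at a time.

    fetch_page(page_size, start_cursor) must return the same triple
    ndb's Query.fetch_page() does: (items, next_cursor, more).
    """
    cursor = None
    more = True
    while more:
        items, cursor, more = fetch_page(page_size, cursor)
        yield from items


# Hypothetical stub backed by a plain list, in place of
# models.Transaction.query().fetch_page.
DATA = list(range(1000))

def fake_fetch_page(page_size, start_cursor):
    start = start_cursor or 0          # None means "start from the beginning"
    end = start + page_size
    return DATA[start:end], end, end < len(DATA)


print(sum(1 for _ in fetch_all(fake_fetch_page)))  # → 1000
```

Streaming with a generator avoids holding every page in one list at once, though against a real Datastore query the total request deadline still applies.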
But even that will eventually hit memory/timeout limits.
Currently I use Google Dataflow to pre-compute these values and store the results in Datastore as the models DaySummaries & StatsPerUser.
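The idea behind that approach is to move the expensive count out of the request path entirely: a batch job counts once and writes one small summary entity, and request handlers just read that entity. Here is a minimal sketch of the pattern under stated assumptions (a plain dict stands in for Datastore, and the key and field names are made up for illustration):

```python
# Stands in for Datastore: maps an entity key to its properties.
STORE = {}

def batch_update_count(records):
    """Run offline (e.g. by a Dataflow job): count once, store the result
    as a small summary entity instead of re-counting per request."""
    STORE[("DaySummary", "2020-01-01")] = {"count": sum(1 for _ in records)}

def get_count_for_user():
    """Cheap per-request read: a single entity get instead of a full scan."""
    return STORE[("DaySummary", "2020-01-01")]["count"]

batch_update_count(range(330_000))
print(get_count_for_user())  # → 330000
```

The trade-off is freshness: the displayed count is only as current as the last batch run, which is usually acceptable for dashboard-style totals.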
snakecharmerb is correct. I was able to use .count() in the production environment, but the more entities it has to count, the longer it seems to take. Here's a screenshot of my logs viewer where it took ~15 seconds to count ~330,000 records.
When I tried adding a filter to that query which returned a count of ~4500, it took about a second to run instead.
OK, I had another App Engine project with a kind containing ~8,000,000 records. I tried to do .count() on that in my HTTP request handler, and the request timed out after running for 60 seconds.