Relation Index Entities and Projection Queries


Problem description


I am designing a Google Datastore schema for the classic 'User Posts' and 'Tags' problem.

This page suggests the Relation Index Entity model. Basically, it puts searchable tags or keywords as a list property in a child entity used for filtering, and the necessary properties in the parent entity. To my understanding, this approach reduces serialization overhead at query time.

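    from google.appengine.ext import db

    class Post(db.Model):
      title = db.StringProperty()
      post_date = db.DateTimeProperty()

    class Tags(db.Model):
      tags = db.StringListProperty()

    # post is an existing Post entity; many_tags is a list of tag strings.
    mytags = Tags(parent=post, tags=many_tags)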
    
1. Given that projection queries can fetch a subset of properties, are Relation Index Entities still necessary to reduce the serialization overhead of list properties?

   Note: projection queries have limits; Relation Index Entities don't.

2. Do Relation Index Entities make queries more difficult? Say I want to find the posts tagged 'cars' that were created within the last 7 days. tags and post_date are in different kinds; is there an easy way to do that?

3. Regarding exploding indexes, do Relation Index Entities reduce the chance of exploding indexes, since they put list properties in a different kind?

Thanks in advance for your answers.

Solution
1. The Relation Index Entity solution reduces the serialization overhead for any type of access to the Post entities, including operations like key.get(), entity.put(), or fetching the results of non-projection queries. Projection queries reduce it only when fetching the respective query results.

2. Yes, queries are a bit more difficult. For your example you'll need separate queries, one for each entity kind.
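To make point 1 concrete, here is a toy sketch in plain Python. It does not use the Datastore API: the dicts stand in for entities, JSON stands in for the real serialization format, and the names (`combined_post`, `split_post`, `split_tags`, `serialized_size`) are made up for illustration. It only shows that an entity carrying a large list costs more to serialize on every read or write than one that keeps the list in a separate entity.

```python
import json

# Hypothetical large tag list; real posts may have far fewer tags.
tags = ["tag%d" % i for i in range(1000)]

# One entity holding everything (no Relation Index Entity).
combined_post = {"title": "My car", "post_date": "2020-01-01T00:00:00", "tags": tags}

# Relation Index Entity layout: the list lives in a separate child entity.
split_post = {"title": "My car", "post_date": "2020-01-01T00:00:00"}
split_tags = {"tags": tags}


def serialized_size(entity):
    """Bytes needed to serialize the entity (JSON as a stand-in format)."""
    return len(json.dumps(entity).encode("utf-8"))


combined_size = serialized_size(combined_post)
post_only_size = serialized_size(split_post)
tags_only_size = serialized_size(split_tags)

print(combined_size, post_only_size, tags_only_size)
```

In the split layout, a get or put of the post alone touches only the small entity; the cost of the list is paid only when the tags entity itself is read or written.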

The query example below assumes ndb, not db:

    from google.appengine.ext import ndb
    
    class Post(ndb.Model):
      title = ndb.StringProperty()
      post_date = ndb.DateTimeProperty()
    
    class Tags(ndb.Model):
      tags = ndb.StringProperty(repeated=True)
    

I'd use keys-only queries as they're cheaper and faster:

    from datetime import datetime, timedelta

    # Keys of the posts created within the last 7 days.
    cutoff = datetime.utcnow() - timedelta(days=7)
    post_keys = Post.query(Post.post_date > cutoff).fetch(keys_only=True)

    car_post_keys = []
    for post_key in post_keys:
        # fetch() returns a list (possibly empty), never None, so test truthiness.
        if Tags.query(Tags.tags == 'car', ancestor=post_key).fetch(1, keys_only=True):
            car_post_keys.append(post_key)

    car_posts = ndb.get_multi(car_post_keys) if car_post_keys else []
    
3. In general the answer would be yes, exactly for the reason you mention. In your particular example there is only one property with multiple values (tags) and a small number of other Post properties, all with single values, so the difference in exploding index impact would probably be negligible.
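As a rough illustration of why a single list property is harmless here, the sketch below counts the entries one entity would contribute to a composite index, assuming one entry per combination of indexed values. The function name `composite_index_entries` and the `categories` property are hypothetical, and the model ignores details such as the same property appearing more than once in an index.

```python
from itertools import product


def composite_index_entries(entity, indexed_properties):
    """Count the value combinations one entity adds to a composite index."""
    value_lists = []
    for name in indexed_properties:
        value = entity[name]
        # Treat a single-valued property as a list with one element.
        value_lists.append(value if isinstance(value, list) else [value])
    return sum(1 for _ in product(*value_lists))


# One list property plus a single-valued property: growth is linear in the tags.
post = {"tags": ["car", "red", "fast"], "post_date": "2020-01-01"}
print(composite_index_entries(post, ["tags", "post_date"]))

# Two list properties in the same index: the counts multiply.
post_with_categories = {
    "tags": ["car", "red", "fast"],
    "categories": ["news", "reviews", "diy", "racing"],
}
print(composite_index_entries(post_with_categories, ["tags", "categories"]))
```

The multiplication in the second case is what moving a list property into its own kind avoids.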

Splitting an entity into several smaller ones is a common technique that is used for other reasons as well; see, for example, the question "re-using an entity's ID for other entities of different kinds - sane idea?".

Here's an example of applying this idea, with each Tags entity reusing the ID of its Post:

    # IDs of the posts created within the last 7 days.
    cutoff = datetime.utcnow() - timedelta(days=7)
    post_keys = Post.query(Post.post_date > cutoff).fetch(keys_only=True)
    post_key_ids = [key.id() for key in post_keys]

    # IDs of the Tags entities containing 'car'; fetch() returns a list, never None.
    car_tag_keys = Tags.query(Tags.tags == 'car').fetch(keys_only=True)
    car_tag_key_ids = [key.id() for key in car_tag_keys]

    # IDs present in both result sets identify recent posts tagged 'car'.
    car_post_key_ids = list(set(post_key_ids) & set(car_tag_key_ids))

    car_posts = [Post.get_by_id(id) for id in car_post_key_ids]
    

The examples are rather simplistic. They can be optimized using ndb async calls and tasks/tasklets, and cursors may be needed when there are many results.
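To compare the two query strategies above without a Datastore, here is an in-memory sketch. The sample data, the fixed `now`, and the helper names (`recent_post_ids`, `per_post_lookup`, `intersect_lookup`) are all hypothetical; dict lookups stand in for queries, and the second return value counts how many queries each strategy would issue.

```python
from datetime import datetime, timedelta

now = datetime(2020, 1, 10)  # fixed "now" so the sketch is deterministic
cutoff = now - timedelta(days=7)

# Hypothetical sample data: post ID -> post_date, and post ID -> tags.
post_dates = {
    1: now - timedelta(days=1),
    2: now - timedelta(days=30),
    3: now - timedelta(days=3),
}
post_tags = {1: ["car", "red"], 2: ["car"], 3: ["bike"]}


def recent_post_ids():
    """Stand-in for the keys-only query on Post.post_date."""
    return [pid for pid, date in post_dates.items() if date > cutoff]


def per_post_lookup(tag):
    """First strategy: one tag lookup per recent post (N + 1 queries)."""
    queries = 1  # the query for recent posts
    matches = []
    for pid in recent_post_ids():
        queries += 1  # one ancestor-style lookup per recent post
        if tag in post_tags.get(pid, []):
            matches.append(pid)
    return sorted(matches), queries


def intersect_lookup(tag):
    """Second strategy: two queries, intersected in memory."""
    recent = set(recent_post_ids())
    tagged = {pid for pid, tags in post_tags.items() if tag in tags}
    return sorted(recent & tagged), 2


print(per_post_lookup("car"))
print(intersect_lookup("car"))
```

Both strategies return the same posts. The first issues one query per recent post, while the second always issues two but fetches every key tagged 'car' regardless of date, so which one is cheaper depends on how many recent posts and how many tagged posts exist.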


09-05 01:36