我有一个PostgreSQL数据库,它具有多对多用户标记关系,具有以下表:
社交用户:用户信息
标记:标记信息
用户标签:社交用户和标签之间的多对多关系
我正在尝试构建一个简单的API来使用Flask、Peewee和Marshmallow访问数据库中的数据。我们现在可以忽略Flask,但我正在尝试为social_用户创建一个模式,该模式允许我转储一个查询,该查询返回一个或多个具有各自标记的用户。我正在寻找类似于以下内容的响应:

{
    "id": "[ID]",
    "handle": "[HANDLE]",
    "local_id": "[LOCAL_ID]",
    "platform_slug": "[PLATFORM_SLUG]",
    "tags": [
        {
            "id": "[ID]",
            "title": "[TITLE]",
            "tag_type": "[TAG_TYPE]"
        },
        {
            "id": "[ID]",
            "title": "[TITLE]",
            "tag_type": "[TAG_TYPE]"
        }
    ]
}

我通过在@post_dump wrapped函数中包含第二个查询来实现这一点,该查询会将social_用户模式中的标记拉入到@post_dump wrapped函数中,然而,这感觉像是一次黑客攻击,而且对于大量用户来说似乎会很慢(更新:这非常慢,我在369个用户上进行了测试)。我想我可以用棉花糖做点什么。有没有更好的方法可以用一个Peewee查询序列化这个关系?我的代码如下:
# just so you are aware of my namespaces
import marshmallow as marsh
import peewee as pw

皮威模型
db = postgres_ext.PostgresqlExtDatabase(
    register_hstore = False,
    **json.load(open('postgres.json'))
)

class Base_Model(pw.Model):
    class Meta:
        database = db

class Tag(Base_Model):
    title = pw.CharField()
    tag_type = pw.CharField(db_column = 'type')

    class Meta:
        db_table = 'tag'

class Social_User(Base_Model):
    handle = pw.CharField(null = True)
    local_id = pw.CharField()
    platform_slug = pw.CharField()

    class Meta:
        db_table = 'social_user'

class User_Tag(Base_Model):
    social_user_id = pw.ForeignKeyField(Social_User)
    tag_id = pw.ForeignKeyField(Tag)

    class Meta:
        primary_key = pw.CompositeKey('social_user_id', 'tag_id')
        db_table = 'user_tag'

棉花糖模式
class Tag_Schema(marsh.Schema):
    id = marsh.fields.Int(dump_only = True)
    title = marsh.fields.Str(required = True)
    tag_type = marsh.fields.Str(required = True, default = 'descriptive')

class Social_User_Schema(marsh.Schema):
    id = marsh.fields.Int(dump_only = True)
    local_id = marsh.fields.Str(required = True)
    handle = marsh.fields.Str()
    platform_slug = marsh.fields.Str(required = True)
    tags = marsh.fields.Nested(Tag_Schema, many = True, dump_only = True)

    def _get_tags(self, user_id):
        query = Tag.select().join(User_Tag).where(User_Tag.social_user_id == user_id)
        tags, errors = tags_schema.dump(query)
        return tags

    @marsh.post_dump(pass_many = True)
    def post_dump(self, data, many):
        if many:
            for datum in data:
                datum['tags'] = self._get_tags(datum['id']) if datum['id'] else []
        else:
            data['tags'] = self._get_tags(data['id'])
        return data

user_schema = Social_User_Schema()
users_schema = Social_User_Schema(many = True)
tags_schema = Tag_Schema(many = True)

下面是一些测试来演示功能:
db.connect()
query = Social_User.get(Social_User.id == 825)
result, errors = user_schema.dump(query)
db.close()
pprint(result)

{'handle': 'test',
 'id': 825,
 'local_id': 'test',
 'platform_slug': 'tw',
 'tags': [{'id': 20, 'tag_type': 'descriptive', 'title': 'this'},
          {'id': 21, 'tag_type': 'descriptive', 'title': 'that'}]}

db.connect()
query = Social_User.select().where(Social_User.platform_slug == 'tw')
result, errors = users_schema.dump(query)
db.close()
pprint(result)

[{'handle': 'test',
  'id': 825,
  'local_id': 'test',
  'platform_slug': 'tw',
  'tags': [{'id': 20, 'tag_type': 'descriptive', 'title': 'this'},
           {'id': 21, 'tag_type': 'descriptive', 'title': 'that'}]},
 {'handle': 'test2',
  'id': 826,
  'local_id': 'test2',
  'platform_slug': 'tw',
  'tags': []}]

最佳答案

看起来这可以使用Peewee模型中的ManyToMany field来完成,并手动设置through_modelManyToMany字段允许您向模型中添加一个将两个表相互关联的字段,通常它会自动创建关系表(through_model)本身,但您可以手动设置它。
我正在使用3.0 alpha of Peewee,但我确信许多人正在使用当前的稳定版本,所以我将包括这两个版本。我们将使用一个DeferredThroughModel对象和ManyToMany字段,在Peewee 2.x中,它们位于3.x中的“playhouse”中,它们是Peewee主版本的一部分。我们还将删除@post_dump包装函数:
皮威模型

# Peewee 2.x
# from playhouse import fields
# User_Tag_Proxy = fields.DeferredThroughModel()

# Peewee 3.x
User_Tag_Proxy = pw.DeferredThroughModel()

class Tag(Base_Model):
    title = pw.CharField()
    tag_type = pw.CharField(db_column = 'type')

    class Meta:
        db_table = 'tag'

class Social_User(Base_Model):
    handle = pw.CharField(null = True)
    local_id = pw.CharField()
    platform_slug = pw.CharField()
    # Peewee 2.x
    # tags = fields.ManyToManyField(Tag, related_name = 'users', through_model = User_Tag_Proxy)

    # Peewee 3.x
    tags = pw.ManyToManyField(Tag, backref = 'users', through_model = User_Tag_Proxy)

    class Meta:
        db_table = 'social_user'

class User_Tag(Base_Model):
    social_user = pw.ForeignKeyField(Social_User, db_column = 'social_user_id')
    tag = pw.ForeignKeyField(Tag, db_column = 'tag_id')

    class Meta:
        primary_key = pw.CompositeKey('social_user', 'tag')
        db_table = 'user_tag'

User_Tag_Proxy.set_model(User_Tag)

棉花糖模式
class Social_User_Schema(marsh.Schema):
    id = marsh.fields.Int(dump_only = True)
    local_id = marsh.fields.Str(required = True)
    handle = marsh.fields.Str()
    platform_slug = marsh.fields.Str(required = True)
    tags = marsh.fields.Nested(Tag_Schema, many = True, dump_only = True)

user_schema = Social_User_Schema()
users_schema = Social_User_Schema(many = True)

实际上,它的工作原理与使用@post_dump包装函数完全相同。不幸的是,虽然这看起来是解决这个问题的“正确”方法,但实际上速度稍慢。
--更新--
我已经设法在100分之一的时间内完成了同样的事情。这是一个小黑客,可能需要一些清理,但它的工作!在将数据传递给模式进行序列化之前,我没有对模型进行更改,而是调整了收集和处理数据的方式。
皮威模型
class Tag(Base_Model):
    title = pw.CharField()
    tag_type = pw.CharField(db_column = 'type')

    class Meta:
        db_table = 'tag'

class Social_User(Base_Model):
    handle = pw.CharField(null = True)
    local_id = pw.CharField()
    platform_slug = pw.CharField()

    class Meta:
        db_table = 'social_user'

class User_Tag(Base_Model):
    social_user = pw.ForeignKeyField(Social_User, db_column = 'social_user_id')
    tag = pw.ForeignKeyField(Tag, db_column = 'tag_id')

    class Meta:
        primary_key = pw.CompositeKey('social_user', 'tag')
        db_table = 'user_tag'

棉花糖模式
class Social_User_Schema(marsh.Schema):
    id = marsh.fields.Int(dump_only = True)
    local_id = marsh.fields.Str(required = True)
    handle = marsh.fields.Str()
    platform_slug = marsh.fields.Str(required = True)
    tags = marsh.fields.Nested(Tag_Schema, many = True, dump_only = True)

user_schema = Social_User_Schema()
users_schema = Social_User_Schema(many = True)

查询
对于新的查询,我们将加入(LEFT_OUTER)三个表(Social_User、Tag和User_Tag),其中Social_User是我们的真实来源。我们要确保我们得到每个用户,无论他们是否有标签。这将根据用户拥有的标记数多次返回用户,因此我们需要通过遍历每个标记并使用字典存储对象来减少返回次数。在这些新的Social_User对象中,每个对象都将添加一个tags列表,我们将在其中附加Tag对象。
db.connect()
query = (Social_User.select(User_Tag, Social_User, Tag)
    .join(User_Tag, pw.JOIN.LEFT_OUTER)
    .join(Tag, pw.JOIN.LEFT_OUTER)
    .order_by(Social_User.id))

users = {}
last = None
for result in query:
    user_id = result.id
    if (user_id not in users):
        # creates a new Social_User object matching the user data
        users[user_id] = Social_User(**result.__data__)
        users[user_id].tags = []
    try:
        # extracts the associated tag
        users[user_id].tags.append(result.user_tag.tag)
    except AttributeError:
        pass

result, errors = users_schema.dump(users.values())
db.close()
pprint(result)

关于python - 序列化带有Peewee和棉花糖的多对多关系,我们在Stack Overflow上找到一个类似的问题:https://stackoverflow.com/questions/48452604/

10-13 01:59