官方文档:http://docs.sqlalchemy.org/en/rel_1_0/orm/basic_relationships.html#relationship-patterns

最近在学习到Flask中的Sqlalchemy, 不过在看到数据库关系db.relations()时对lazy这个参数一直很模糊。主要是看到Flask Web开发这本书中对关注与被关注的关系建模中,被lazy的使用绕晕了。

看官方文档,也得不到多少信息,于是就自己实践,从lazy参数的不同值所执行的sql语句出发,结合one-to-manymany-to-many的关系,分析lazy参数取不同值(dynamic, joined, select)在不同场景下的选择,因为涉及到数据库性能问题,选择不同差别很大,尤其在数据量比较大时。
以下的实例均是基于如下的模型和表:主要侧重对relationship中的backref的lazy属性做修改。

registrations = db.Table('registrations',
db.Column('student_id', db.Integer, db.ForeignKey('students.id')),
db.Column('class_id', db.Integer, db.ForeignKey('classes.id')))
class Student(db.Model):
__tablename__ = 'students'
id = db.Column(db.Integer, primary_key=True)
name = db.Column(db.String(64))
class_id = db.Column(db.Integer, db.ForeignKey('classes.id'))
def __repr__(self):
return '<Student: %r>' %self.name
class Class(db.Model):
__tablename__ = 'classes'
id = db.Column(db.Integer, primary_key=True)
students = db.relationship('Student', backref='_class', lazy="dynamic")
name = db.Column(db.String(64))
def __repr__(self):
return '<Class: %r>' %self.name

基本介绍

首先看官网的关于lazy的说明:

通俗了说,select就是访问到属性的时候,就会全部加载该属性的数据。joined则是在对关联的两个表进行join操作,从而获取到所有相关的对象。dynamic则不一样,在访问属性的时候,并没有在内存中加载数据,而是返回一个query对象, 需要执行相应方法才可以获取对象,比如.all().下面结合实例解释这几个的使用场景。

实例

首先是最开始一对多关系中,改动如下:将的lazy改为select:

students = db.relationship('Student', backref='_class', lazy="select")

这样的话, class.students会直接返回结果列表:

>>> from app.models import Student as S, Class as C
>>> c1=C.query.first()
>>> c1.students
[<Student: u'test'>, <Student: u'test2'>, <Student: u'test3'>]

这种情况下,在数据量较大或者想做进一步操作时候,不太方便,因此这个时候, dynamic就用上了:

students = db.relationship('Student', backref='_class', lazy="dynamic")

同样看看结果:

>>> from app.models import Student as S, Class as C
>>> s1=S.query.first()
>>> c1=C.query.first()
>>> c1.students
<sqlalchemy.orm.dynamic.AppenderBaseQuery object at 0x7f007d2e8ed0>
>>> print c1.students
SELECT students.id AS students_id, students.name AS students_name
FROM students, registrations
WHERE :param_1 = registrations.class_id AND students.id = registrations.student_id
>>> c1.students.all()
[<Student: u'test'>, <Student: u'test2'>, <Student: u'test3'>]

可以看到, 执行c1.student返回的是是一个 query对象,并且该对象的sql语句也可以看到,就是简单查询了Student。而如果lazy=select 或者 joined均是直接返回结果。 需要注意的是, lazy="dynamic"只可以用在一对多和多对对关系中,不可以用在一对一和多对一中,如果返回结果只有一个的话,也就无需要延迟加载数据了。
前面说的都是给当前属性加lazy属性,backref的lazy默认都是select,如果给反向引用backref加lazy属性呢? 直接使用backref=db.backref('students', lazy='dynamic' 即可。这个在多对多关系需要进行考量。
先看一个最基本的多对多关系:

registrations = db.Table('registrations',
db.Column('student_id', db.Integer, db.ForeignKey('students.id')),
db.Column('class_id', db.Integer, db.ForeignKey('classes.id')))
class Student(db.Model):
__tablename__ = 'students'
id = db.Column(db.Integer, primary_key=True)
name = db.Column(db.String(64))
# class_id = db.Column(db.Integer, db.ForeignKey('classes.id')) 这里需要注释,不需要外键了
def __repr__(self):
return '<Student: %r>' %self.name
class Class(db.Model):
__tablename__ = 'classes'
id = db.Column(db.Integer, primary_key=True)
students = db.relationship('Student', secondary=registrations, backref='_class', lazy="dynamic") #这里指定关联表
name = db.Column(db.String(64))
def __repr__(self):
return '<Class: %r>' %self.name

同样执行结果可以看到:

>>> s1=S.query.first()
>>> c1=C.query.first()
>>> s1._class
[<Class: u'class1'>, <Class: u'class2'>]
>>> c1.students
<sqlalchemy.orm.dynamic.AppenderBaseQuery object at 0x7ff8691a8610>
>>> c1.students.all()
[<Student: u'test'>, <Student: u'test2'>, <Student: u'test3'>]
>>> print c1.students
SELECT students.id AS students_id, students.name AS students_name
FROM students, registrations
WHERE :param_1 = registrations.class_id AND students.id = registrations.student_id

可以看到这个跟一对多关系中的很类似,只不过s1._class成为了集合形式, 因为backref="_class"默认仍然是select,所以直接返回结果,而c1.students的sql语句也仅仅是查询了students。但是如果修改反向引用的lazyjoined:

students = db.relationship('Student', secondary=registrations,
backref=db.backref('_class', lazy="joined"), lazy="dynamic")

然后看看结果:

....
>>> print c1.students
SELECT students.id AS students_id, students.name AS students_name, classes_1.id AS classes_1_id, classes_1.name AS classes_1_name
FROM registrations, students LEFT OUTER JOIN (registrations AS registrations_1 JOIN classes AS classes_1 ON classes_1.id = registrations_1.class_id) ON students.id = registrations_1.student_id
WHERE :param_1 = registrations.class_id AND students.id = registrations.student_id
>>> c1.students.all()
[<Student: u'test'>, <Student: u'test2'>, <Student: u'test3'>]
>>> s1._class
[<Class: u'class1'>, <Class: u'class2'>]

首先不变的还是s1._class还是直接返回数据。有变化的是c1.students的sql语句, 不仅仅是查询Student对象, 而且还通过与关联表做join操作,把相关联的Class也查询了。相关联的意思是什么呢?看下直接执行sql语句的结果就知道了:

mysql> SELECT students.id AS students_id, students.name AS students_name, classes_1.id AS classes_1_id, classes_1.name AS classes_1_name  FROM registrations, students LEFT OUTER JOIN (registrations AS registrations_1 JOIN classes AS classes_1 ON classes_1.id = registrations_1.class_id) ON students.id = registrations_1.student_id  WHERE 1 = registrations.class_id AND students.id = registrations.student_id;
+-------------+---------------+--------------+----------------+
| students_id | students_name | classes_1_id | classes_1_name |
+-------------+---------------+--------------+----------------+
| 1 | test | 1 | class1 |
| 1 | test | 2 | class2 |
| 2 | test2 | 1 | class1 |
| 3 | test3 | 1 | class1 |
+-------------+---------------+--------------+----------------+
4 rows in set (0.00 sec)

也就是说把查询得到的students的对应的class实体也都查询出来了。 但是貌似在这个例子中没有意义,因为这种多对多的关系比较简单,关联表甚至都不是模型,只有两个外键的id, 上述代码中的registrations是直接被sqlalchemy接管的,程序无法直接访问的。
在下面的多对多例子中,我们可以看到上述的lazy方式的优势,我们把关联表改为实体model,并且额外增加一个时间信息。模型代码如下:

class Registration(db.Model):
'''关联表'''
__tablename__ = 'registrations'
student_id = db.Column(db.Integer, db.ForeignKey('students.id'), primary_key=True)
class_id = db.Column(db.Integer, db.ForeignKey('classes.id'), primary_key=True)
create_at = db.Column(db.DateTime, default=datetime.utcnow)
class Student(db.Model):
__tablename__ = 'students'
id = db.Column(db.Integer, primary_key=True)
name = db.Column(db.String(64))
_class = db.relationship('Registration', foreign_keys=[Registration.student_id],
backref=db.backref('student', lazy="joined"), lazy="dynamic")
def __repr__(self):
return '<Student: %r>' %self.name
class Class(db.Model):
__tablename__ = 'classes'
id = db.Column(db.Integer, primary_key=True)
students = db.relationship('Registration', foreign_keys=[Registration.class_id],
backref=db.backref('_class', lazy="joined"), lazy="dynamic")
name = db.Column(db.String(64))
def __repr__(self):
return '<Class: %r>' %self.name

提前准备数据:

mysql> select * from classes;
+----+--------+
| id | name |
+----+--------+
| 1 | class1 |
| 2 | class2 |
+----+--------+
2 rows in set (0.00 sec)
mysql> select * from students;
+----+-------+
| id | name |
+----+-------+
| 1 | test |
| 2 | test2 |
| 3 | test3 |
+----+-------+
3 rows in set (0.00 sec)
mysql> select * from registrations;
+------------+----------+-----------+
| student_id | class_id | create_at |
+------------+----------+-----------+
| 1 | 1 | NULL |
| 2 | 1 | NULL |
| 3 | 1 | NULL |
| 1 | 2 | NULL |
+------------+----------+-----------+
4 rows in set (0.00 sec)

之后看看结果:

>>> s1._class.all()
[<app.models.Registration object at 0x7f0348018ed0>, <app.models.Registration object at 0x7f0348018f50>]
>>> c1.students.all()
[<app.models.Registration object at 0x7f0348018ed0>, <app.models.Registration object at 0x7f03480412d0>, <app.models.Registration object at 0x7f034c32f250>]

可以看到返回值是Registration两个对象, 不再直接返回StudentClass对象了。如果想要获取的话,可以使用给Registration加的反向引用:

>>> map(lambda x: x._class, s1._class.all())
[<Class: u'class1'>, <Class: u'class2'>]
>>> map(lambda x: x.student, c1.students.all())
[<Student: u'test'>, <Student: u'test2'>, <Student: u'test3'>]

那么问题就来了,这里在调用Registration的_classstudent时候, 还需不需要再查询一遍数据库呢?

下面通过查看执行的sql语句来看看:

>>> print s1._class
SELECT registrations.student_id AS registrations_student_id, registrations.class_id AS registrations_class_id, registrations.create_at AS registrations_create_at, classes_1.id AS classes_1_id, classes_1.name AS classes_1_name, students_1.id AS students_1_id, students_1.name AS students_1_name
FROM registrations LEFT OUTER JOIN classes AS classes_1 ON classes_1.id = registrations.class_id LEFT OUTER JOIN students AS students_1 ON students_1.id = registrations.student_id
WHERE :param_1 = registrations.student_id

我们可以发现: 跟上一个例子一样,s1._class不仅查询了对应的class信息,而且通过join操作,获取到了相应的StudentClass对象,换句话说,把Registration的student_class两个回引属性均指向了对应的对象, 也就是说,s1._class这一条查询语句就可以把上述操作都完成。这个就是backref=db.backref('_class', lazy='joined')的作用。 
下面再看看把lazy改为select的情况:

###
_class = db.relationship('Registration', foreign_keys=[Registration.student_id],
backref=db.backref('student', lazy="select"), lazy="dynamic")
###
students = db.relationship('Registration', foreign_keys=[Registration.class_id],
backref=db.backref('_class', lazy="select"), lazy="dynamic")

这样看看查询语句:

>>> s1=S.query.first()
>>> print s1._class
SELECT registrations.student_id AS registrations_student_id, registrations.class_id AS registrations_class_id, registrations.create_at AS registrations_create_at
FROM registrations
WHERE :param_1 = registrations.student_id
>>> map(lambda x : x._class , s1._class)
[<Class: u'class1'>, <Class: u'class2'>]

十分简单的sql语句,仅仅查询返回了 Registration对象, 虽然结果一样,但是每一个Registration对象访问_class属性时,又各自都查询了一遍数据库! 这是很重的! 比如一个class有100个student, 那么获取class.students需要额外查询100次数据库! 每一次数据库的查询代价很大,因此这就是joined的作用了。

05-21 05:19
查看更多