问题描述
在构建带有多个批注的复合物 QuerySet
时,我遇到了一个问题,该问题可以通过以下简单设置重现。
While constructing a complexe QuerySet
with several annotations, I ran into an issue that I could reproduce with the following simple setup.
以下是模型:
class Player(models.Model):
name = models.CharField(max_length=200)
class Unit(models.Model):
player = models.ForeignKey(Player, on_delete=models.CASCADE,
related_name='unit_set')
rarity = models.IntegerField()
class Weapon(models.Model):
unit = models.ForeignKey(Unit, on_delete=models.CASCADE,
related_name='weapon_set')
通过测试数据库,我获得了以下内容(正确)结果:
With my test database, I obtain the following (correct) results:
Player.objects.annotate(weapon_count=Count('unit_set__weapon_set'))
[{'id': 1, 'name': 'James', 'weapon_count': 23},
{'id': 2, 'name': 'Max', 'weapon_count': 41},
{'id': 3, 'name': 'Bob', 'weapon_count': 26}]
Player.objects.annotate(rarity_sum=Sum('unit_set__rarity'))
[{'id': 1, 'name': 'James', 'rarity_sum': 42},
{'id': 2, 'name': 'Max', 'rarity_sum': 89},
{'id': 3, 'name': 'Bob', 'rarity_sum': 67}]
如果我现在将两个注释合并在同一个 QuerySet
,我得到了不同的(不准确的)结果:
If I now combine both annotations in the same QuerySet
, I obtain a different (inaccurate) results:
Player.objects.annotate(
weapon_count=Count('unit_set__weapon_set', distinct=True),
rarity_sum=Sum('unit_set__rarity'))
[{'id': 1, 'name': 'James', 'weapon_count': 23, 'rarity_sum': 99},
{'id': 2, 'name': 'Max', 'weapon_count': 41, 'rarity_sum': 183},
{'id': 3, 'name': 'Bob', 'weapon_count': 26, 'rarity_sum': 113}]
请注意 rarity_sum
现在如何具有不同的值 比以前。删除 distinct = True
不会影响结果。我还尝试使用DistinctSum 函数-values / 54278155#54278155>此答案,在这种情况下,所有 rarity_sum
都设置为 18
(也不准确)。
Notice how rarity_sum
have now different values than before. Removing distinct=True
does not affect the result. I also tried to use the DistinctSum
function from this answer, in which case all rarity_sum
are set to 18
(also inaccurate).
为什么?如何在同一个 QuerySet
中合并这两个注释?
Why is this? How can I combine both annotations in the same QuerySet
?
编辑:这是组合QuerySet生成的sqlite查询:
Edit: here is the sqlite query generated by the combined QuerySet:
SELECT "sandbox_player"."id",
"sandbox_player"."name",
COUNT(DISTINCT "sandbox_weapon"."id") AS "weapon_count",
SUM("sandbox_unit"."rarity") AS "rarity_sum"
FROM "sandbox_player"
LEFT OUTER JOIN "sandbox_unit" ON ("sandbox_player"."id" = "sandbox_unit"."player_id")
LEFT OUTER JOIN "sandbox_weapon" ON ("sandbox_unit"."id" = "sandbox_weapon"."unit_id")
GROUP BY "sandbox_player"."id", "sandbox_player"."name"
用于以上结果的数据是。
The data used for the results above is available here.
推荐答案
这不是Django ORM的问题,这只是关系型数据库的方式ses工作。当您构建简单的查询集时,例如
This isn't the problem with Django ORM, this is just the way relational databases work. When you're constructing simple querysets like
Player.objects.annotate(weapon_count=Count('unit_set__weapon_set'))
或
Player.objects.annotate(rarity_sum=Sum('unit_set__rarity'))
ORM确实可以满足您的要求希望它能做到-将 Player
与武器
ORM does exactly what you expect it to do - join Player
with Weapon
SELECT "sandbox_player"."id", "sandbox_player"."name", COUNT("sandbox_weapon"."id") AS "weapon_count"
FROM "sandbox_player"
LEFT OUTER JOIN "sandbox_unit"
ON ("sandbox_player"."id" = "sandbox_unit"."player_id")
LEFT OUTER JOIN "sandbox_weapon"
ON ("sandbox_unit"."id" = "sandbox_weapon"."unit_id")
GROUP BY "sandbox_player"."id", "sandbox_player"."name"
或 Player
和 Unit
SELECT "sandbox_player"."id", "sandbox_player"."name", SUM("sandbox_unit"."rarity") AS "rarity_sum"
FROM "sandbox_player"
LEFT OUTER JOIN "sandbox_unit" ON ("sandbox_player"."id" = "sandbox_unit"."player_id")
GROUP BY "sandbox_player"."id", "sandbox_player"."name"
并执行 COUNT
或 SUM
聚合。
请注意,尽管第一个查询在三个表之间有两个联接,中间表 Unit
既不在 SELECT
引用的列中,也不在 GROUP BY
子句。 部队
在这里扮演的唯一角色是通过武器$ c $加入
玩家
c>。
Note that although the first query has two joins between three tables, the intermediate table Unit
is neither in columns referenced in SELECT
, nor in the GROUP BY
clause. The only role that Unit
plays here is to join Player
with Weapon
.
现在,如果您查看第三个查询集,情况将变得更加复杂。同样,在第一个查询中,联接位于三个表之间,但是现在在 SELECT
中引用了 Unit
SUM
用于 Unit.rarity
的聚合:
Now if you look at your third queryset, things get more complicated. Again, as in the first query the joins are between three tables, but now Unit
is referenced in SELECT
as there is SUM
aggregation for Unit.rarity
:
SELECT "sandbox_player"."id",
"sandbox_player"."name",
COUNT(DISTINCT "sandbox_weapon"."id") AS "weapon_count",
SUM("sandbox_unit"."rarity") AS "rarity_sum"
FROM "sandbox_player"
LEFT OUTER JOIN "sandbox_unit" ON ("sandbox_player"."id" = "sandbox_unit"."player_id")
LEFT OUTER JOIN "sandbox_weapon" ON ("sandbox_unit"."id" = "sandbox_weapon"."unit_id")
GROUP BY "sandbox_player"."id", "sandbox_player"."name"
这是第二和第三查询之间的关键区别。在第二个查询中,您将 Player
加入 Unit
,因此只有一个 Unit
会针对其引用的每个玩家列出一次。
And this is the crucial difference between the second and the third queries. In the second query, you're joining Player
to Unit
, so a single Unit
will be listed once for each player that it references.
但是在第三个查询中,您要加入 Player
转换为 Unit
,然后将 Unit
转换为武器
,因此,不仅会为引用的每个玩家列出一个 Unit
一次,而且还会为引用 Unit
。
But in the third query you're joining Player
to Unit
and then Unit
to Weapon
, so not only a single Unit
will be listed once for each player that it references, but also for each weapon that references Unit
.
让我们看一个简单的示例:
Let's take a look at the simple example:
insert into sandbox_player values (1, "player_1");
insert into sandbox_unit values(1, 10, 1);
insert into sandbox_weapon values (1, 1), (2, 1);
一个玩家,一个单位和两个引用相同单位的武器。
One player, one unit and two weapons that reference the same unit.
确认问题存在:
>>> from sandbox.models import Player
>>> from django.db.models import Count, Sum
>>> Player.objects.annotate(weapon_count=Count('unit_set__weapon_set')).values()
<QuerySet [{'id': 1, 'name': 'player_1', 'weapon_count': 2}]>
>>> Player.objects.annotate(rarity_sum=Sum('unit_set__rarity')).values()
<QuerySet [{'id': 1, 'name': 'player_1', 'rarity_sum': 10}]>
>>> Player.objects.annotate(
... weapon_count=Count('unit_set__weapon_set', distinct=True),
... rarity_sum=Sum('unit_set__rarity')).values()
<QuerySet [{'id': 1, 'name': 'player_1', 'weapon_count': 2, 'rarity_sum': 20}]>
从此示例中可以很容易地看出问题是在组合查询中单位将被列出两次,每种武器都将被列出一次。
From this example it's easy to see that the problem is that in the combined query the unit will be listed twice, one time for each of the weapons referencing it:
sqlite> SELECT "sandbox_player"."id",
...> "sandbox_player"."name",
...> "sandbox_weapon"."id",
...> "sandbox_unit"."rarity"
...> FROM "sandbox_player"
...> LEFT OUTER JOIN "sandbox_unit" ON ("sandbox_player"."id" = "sandbox_unit"."player_id")
...> LEFT OUTER JOIN "sandbox_weapon" ON ("sandbox_unit"."id" = "sandbox_weapon"."unit_id");
id name id rarity
---------- ---------- ---------- ----------
1 player_1 1 10
1 player_1 2 10
如@ivissani所述,最简单的解决方案之一是为每个聚合编写子查询:
What should you do?
As @ivissani mentioned, one of the easiest solutions would be to write subqueries for each of the aggregations:
>>> from django.db.models import Count, IntegerField, OuterRef, Subquery, Sum
>>> weapon_count = Player.objects.annotate(weapon_count=Count('unit_set__weapon_set')).filter(pk=OuterRef('pk'))
>>> rarity_sum = Player.objects.annotate(rarity_sum=Sum('unit_set__rarity')).filter(pk=OuterRef('pk'))
>>> qs = Player.objects.annotate(
... weapon_count=Subquery(weapon_count.values('weapon_count'), output_field=IntegerField()),
... rarity_sum=Subquery(rarity_sum.values('rarity_sum'), output_field=IntegerField())
... )
>>> qs.values()
<QuerySet [{'id': 1, 'name': 'player_1', 'weapon_count': 2, 'rarity_sum': 10}]>
会产生以下SQL
SELECT "sandbox_player"."id", "sandbox_player"."name",
(
SELECT COUNT(U2."id") AS "weapon_count"
FROM "sandbox_player" U0
LEFT OUTER JOIN "sandbox_unit" U1
ON (U0."id" = U1."player_id")
LEFT OUTER JOIN "sandbox_weapon" U2
ON (U1."id" = U2."unit_id")
WHERE U0."id" = ("sandbox_player"."id")
GROUP BY U0."id", U0."name"
) AS "weapon_count",
(
SELECT SUM(U1."rarity") AS "rarity_sum"
FROM "sandbox_player" U0
LEFT OUTER JOIN "sandbox_unit" U1
ON (U0."id" = U1."player_id")
WHERE U0."id" = ("sandbox_player"."id")
GROUP BY U0."id", U0."name") AS "rarity_sum"
FROM "sandbox_player"
这篇关于Django Count和Sum批注相互干扰的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!