Problem description
Given the following tables:
--- player ---
id SERIAL
name VARCHAR(100)
birthday DATE
country VARCHAR(3)
PRIMARY KEY id
--- club ---
id SERIAL
name VARCHAR(100)
country VARCHAR(3)
PRIMARY KEY id
--- playersinclubs ---
id SERIAL
player_id INTEGER (with INDEX)
club_id INTEGER (with INDEX)
joined DATE
left DATE
PRIMARY KEY id
Every player has a row in table player (with his attributes). Likewise, every club has an entry in table club. For every station in his career, a player has an entry in table playersinclubs (n:m) with the date the player joined and, optionally, the date the player left the club.
My main problem is the performance of these tables. Table player has over 10 million entries. If I want to display the history of a club with all players who played for it, my SELECT looks like the following:
SELECT * FROM player
JOIN playersinclubs ON player.id = playersinclubs.player_id
JOIN club ON club.id = playersinclubs.club_id
WHERE club.id = 3;
But given the huge number of players, a sequential scan on table player is executed, and the query takes a long time.
Before I implemented some new features in my app, every player had exactly one team (only today's teams and players). So I didn't have the table playersinclubs. Instead, I had a team_id column in table player, and I could select the players of a team directly from table player with the WHERE clause team_id = 3.
Does anyone have performance tips for my database structure to speed up these queries?
Recommended answer
Most importantly, you need an index on playersinclubs(club_id, player_id). The rest is details (that may still make quite a difference).
You need to be precise about your actual goals. You write that you want to display the history of a club with all players who played for it.
You don't need to join to club at all:
SELECT p.*
FROM playersinclubs pc
JOIN player p ON p.id = pc.player_id
WHERE pc.club_id = 3;
And you don't need the columns of playersinclubs in the output either, which is a small gain for performance, unless it allows an index-only scan on playersinclubs, in which case the gain may be substantial.
- How does PostgreSQL perform ORDER BY if a b-tree index is built on that field?
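One way to check whether an index-only scan is actually used is to look at the query plan. The following is a minimal sketch, assuming the index on playersinclubs(club_id, player_id) recommended in this answer already exists:

```sql
-- Assumes an index on playersinclubs (club_id, player_id) exists.
-- Only indexed columns are referenced, so an index-only scan is possible.
EXPLAIN (ANALYZE, BUFFERS)
SELECT player_id
FROM   playersinclubs
WHERE  club_id = 3;
```

Whether the planner picks an index-only scan also depends on the visibility map, so a recently vacuumed table is more likely to show "Index Only Scan" in the plan.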
You probably don't need all columns of player in the result, either. Only SELECT the columns you actually need.
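For illustration, a version of the query with an explicit column list might look like this. The chosen columns are an assumption; which ones you need depends on what your page displays:

```sql
-- Column list is an assumption; select whatever the application actually shows.
SELECT p.id, p.name, p.country
FROM   playersinclubs pc
JOIN   player p ON p.id = pc.player_id
WHERE  pc.club_id = 3;
```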
The PK on player provides the index you need on that table.
You need an index on playersinclubs(club_id, player_id), but do not make it unique unless players are not allowed to join the same club a second time.
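A minimal sketch of such an index definition in PostgreSQL. The index name is made up for illustration; club_id comes first because the query filters on it:

```sql
-- Hypothetical index name. Deliberately not UNIQUE,
-- so a player can join the same club more than once.
CREATE INDEX playersinclubs_club_player_idx
    ON playersinclubs (club_id, player_id);
```

On a busy production table, CREATE INDEX CONCURRENTLY builds the index without blocking writes, at the cost of a slower build.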
If players can join multiple times and you just want a list of "all players", you also need to add a DISTINCT step to fold duplicate entries. You could just:
SELECT DISTINCT p.* ...
But since you are trying to optimize performance: it's cheaper to eliminate dupes early:
SELECT p.*
FROM  (
   SELECT DISTINCT player_id
   FROM   playersinclubs
   WHERE  club_id = 3
   ) pc
JOIN   player p ON p.id = pc.player_id;
Maybe you really want all entries in playersinclubs and all columns of the table, too. But your description says otherwise. Query and indexes would be different.
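If that is what you want, the query might look roughly like the sketch below, which returns one row per stint instead of one row per player. The column list and sort order are assumptions. Note that left is a reserved word in PostgreSQL, so the column is double-quoted here:

```sql
-- One row per membership period; no DISTINCT, since repeat stints are wanted.
SELECT p.id, p.name, pc.joined, pc."left"
FROM   playersinclubs pc
JOIN   player p ON p.id = pc.player_id
WHERE  pc.club_id = 3
ORDER  BY pc.joined;
```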
Closely related answer:
- Find overlapping date ranges in PostgreSQL