问题描述
就像许多Postgres n00bs
一样,我们有很多带有外键约束的表未编入索引.在某些情况下,这应该不会对性能造成重大影响-但这有待进一步分析.
Like many Postgres n00bs
we have a lot of tables with foreign key constraints that are not indexed. I some cases this should not be a big performance hit - but this would be subject for further analysis.
我已阅读以下文章: https://www.cybertec-postgresql.com/en/index-your-foreign-key/
并使用以下查询查找没有索引的所有外键:
And used the following query to find all foreign keys without an index:
SELECT c.conrelid::regclass AS "table",
/* list of key column names in order */
string_agg(a.attname, ',' ORDER BY x.n) AS columns,
pg_catalog.pg_size_pretty(
pg_catalog.pg_relation_size(c.conrelid)
) AS size,
c.conname AS constraint,
c.confrelid::regclass AS referenced_table
FROM pg_catalog.pg_constraint c
/* enumerated key column numbers per foreign key */
CROSS JOIN LATERAL
unnest(c.conkey) WITH ORDINALITY AS x(attnum, n)
/* name for each key column */
JOIN pg_catalog.pg_attribute a
ON a.attnum = x.attnum
AND a.attrelid = c.conrelid
WHERE NOT EXISTS
/* is there a matching index for the constraint? */
(SELECT 1 FROM pg_catalog.pg_index i
WHERE i.indrelid = c.conrelid
/* the first index columns must be the same as the
key columns, but order doesn't matter */
AND (i.indkey::smallint[])[0:cardinality(c.conkey)-1]
@> c.conkey::int[])
AND c.contype = 'f'
GROUP BY c.conrelid, c.conname, c.confrelid
ORDER BY pg_catalog.pg_relation_size(c.conrelid) DESC;
这向我显示了具有复合唯一约束的表,该表仅在唯一索引中的列中一个":
This shows me for tables with composite unique constraints only "one" of the columns in the unique index:
\d topics_items;
-----------------+---------+--------------+---------------+------------------------------
topics_items_id | integer | | not null | generated always as identity
topic_id | integer | | not null |
item_id | integer | | not null |
Index:
"topics_items_pkey" PRIMARY KEY, btree (topics_items_id)
"topic_id_item_id_unique" UNIQUE CONSTRAINT, btree (topic_id, item_id)
Foreign Keys:
"topics_items_item_id_fkey" FOREIGN KEY (item_id) REFERENCES items(item_id) ON DELETE CASCADE
"topics_items_topic_id_fkey" FOREIGN KEY (topic_id) REFERENCES topics(topic_id) ON DELETE CASCADE
在这种情况下,检查查询仅将 item_id
而不是 topic_id
查找为未索引字段.
In this case the check query only finds the item_id
and not the topic_id
as an not indexed field.
可以公平地说,这只是所使用查询的一个问题,我必须分别对两个字段(topic_id和item_id)进行索引-还是涉及一些黑色巫术,而仅涉及 item_id
需要索引吗?
Is it fair to say, that this is just an issue of the query used, and I have to separately index both fields (topic_id and item_id) - or is there some black sorcery involved and only the item_id
needs an index?
推荐答案
tl; dr 您需要在 item_id
上添加索引. 11中介绍了Postgres索引编制的黑魔法".索引.
您在(topic_id,item_id)
上有一个复合索引,列的顺序很重要.Postgres可以使用它为 topic_id
上的查询建立索引,对 topic_id
和 item_id
上的查询建立索引,但不能(或效率较低) item_id
.
You have a composite index on (topic_id, item_id)
and column order is important. Postgres can use this to index queries on topic_id
, queries on both topic_id
and item_id
, but not (or less efficiently) item_id
alone.
来自 11.3.多列索引 ...
-- indexed
select *
from topics_items
where topic_id = ?
-- also indexed
select *
from topics_items
where topic_id = ?
and item_id = ?
-- probably not indexed
select *
from topics_items
where item_id = ?
这是因为像(topic_id,item_id)
这样的复合索引会先存储主题ID,然后存储同样具有该主题ID的项目ID.为了有效地在此索引中查找商品ID,Postgres必须首先使用主题ID缩小搜索范围.
This is because a composite index like (topic_id, item_id)
stores the topic ID first, then an item IDs which also have that topic ID. In order to look up an item ID efficiently in this index, Postgres must first narrow the search with a topic ID.
Postgres 认为值得这样做,则可以将其反转.如果可能的主题ID数量很少,而可能的索引ID数量很多,则会在每个主题ID中搜索索引ID.
Postgres can reverse an index if it thinks it's worth the effort. If there's a small number of possible topic IDs, and a large number of possible index IDs, it will search for the index ID in each topic ID.
例如,假设您有10个可能的主题ID和1000个可能的商品ID,并且索引为(topic_id,index_id)
.这就像有10个带有清晰标签的主题ID桶,每个桶中有1000个带有清晰标签的商品ID桶.要进入商品ID桶,它必须在每个主题ID桶内查看.要在 where item_id = 23
上使用此索引,Postgres必须在10个主题ID桶中搜索每个ID为23的桶.
For example, let's say you have 10 possible topic IDs and 1000 possible item IDs and your index (topic_id, index_id)
. This is like having 10 clearly labelled topic ID buckets each with 1000 clearly labelled item ID buckets inside. To get to the item ID buckets, it must look inside each topic ID bucket. To use this index on where item_id = 23
Postgres must search each of the 10 topic ID buckets for all the buckets with item ID 23.
但是,如果您有1000个可能的主题ID和10个可能的商品ID,则Postgres将不得不搜索1000个主题ID存储桶.很有可能它将进行全表扫描.在这种情况下,您需要反转索引并将其设置为(item_id,topic_id)
.
But if you have 1000 possible topic IDs and 10 possible item IDs, Postgres would have to search 1000 topic IDs buckets. Most likely it will do a full table scan instead. In this case you'd want to reverse your index and make it (item_id, topic_id)
.
这在很大程度上取决于是否具有良好的表统计信息,这意味着确保自动清理工作正常.
This depends heavily on having good table statistics, which means making sure autovacuum is working properly.
因此,如果一列的可变性远低于另一列,那么您就可以摆脱针对两列的单个索引.
So you can get away with a single index for two columns, if one column has far less variability than another.
Postgres也可以使用多索引,如果它认为可以查询运行速度更快.例如,如果您在 topic_id
上有一个索引,在 item_id
上有一个索引,则它可以使用这两个索引并将结果合并.例如,其中topic_id = 23或item_id = 42
可以使用topic_id索引搜索主题ID 23,并使用item_id索引搜索项目ID 42,然后将结果合并.
Postgres can also use mulitple indexes if it thinks it will make the query run faster. For example, if you had an index on topic_id
and an index on item_id
, it can use both indexes and combine the results. For example where topic_id = 23 or item_id = 42
could use the topic_id index to search for topic ID 23, and the item_id index to search for item ID 42, then combine the results.
通常比具有复合(topic_id,item_id)
索引要慢.它也可能比使用单个索引慢,因此如果Postgres决定不使用多个索引也不要感到惊讶.
This is generally slower than having a composite (topic_id, item_id)
index. It can also be slower than using a single index, so don't be surprised if Postgres decides not to use multiple indexes.
通常,对于b树索引,当您有两列时,将具有三种可能的组合.
In general, for b-tree indexes, when you have two columns you have three possible combinations.
- a + b
- a
- b
您需要两个索引.
- (a,b)-a和a + b
- (b)-b
(a,b)
涵盖了对a和a + b的搜索.(b)
涵盖了搜索 b
的情况.
(a, b)
covers both searches for a and a + b. (b)
covers searching for b
.
当您拥有三列时,就有七个可能的组合.
When you have three columns, you have seven possible combinations.
- a + b + c
- a + b
- a + c
- a
- b + c
- b
- c
但是您只需要三个索引.
But you only need three indexes.
- (a,b,c)-a,a + b,a + b + c
- (b,c)-b,b + c
- (c,a)-c,c + a
但是,您实际上可能希望避免在三列上建立索引.通常慢些.你真正想要的是这个.
However, you probably actually want to avoid having an index on three columns. It's often slower. What you actually want is this.
- (a,b)
- (b,c)
- (c,a)
从索引中读取要比从表中读取慢.您希望索引减少必须读取的行数,但是您不希望Postgres进行不必要的索引扫描.
Reading from an index is slower than reading from the table. You want your indexes to reduce the number of rows which must be read, but you don't want Postgres to have to do any more index scanning than necessary.
这篇关于在PostgreSQL中索引外键的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!