PostgreSQL index not used for IP range queries

Problem description

I'm using PostgreSQL 9.2 and have a table of IP ranges. Here's the SQL:

CREATE TABLE ips (
  id serial NOT NULL,
  begin_ip_num bigint,
  end_ip_num bigint,
  country_name character varying(255),
  CONSTRAINT ips_pkey PRIMARY KEY (id)
);

I've added plain B-tree indices on both begin_ip_num and end_ip_num:

CREATE INDEX index_ips_on_begin_ip_num ON ips (begin_ip_num);
CREATE INDEX index_ips_on_end_ip_num ON ips (end_ip_num);

The query being used is:

SELECT ips.* FROM ips
WHERE 3065106743 BETWEEN begin_ip_num AND end_ip_num;

The problem is that my BETWEEN query is only using the index on begin_ip_num. After using the index, it filters the result using end_ip_num. Here's the EXPLAIN ANALYZE result:

Index Scan using index_ips_on_begin_ip_num on ips  (cost=0.00..2173.83 rows=27136 width=76) (actual time=16.349..16.350 rows=1 loops=1)
Index Cond: (3065106743::bigint >= begin_ip_num)
Filter: (3065106743::bigint <= end_ip_num)
Rows Removed by Filter: 47596
Total runtime: 16.425 ms

I've already tried various combinations of indices including adding a composite index on both begin_ip_num and end_ip_num.

Recommended answer

Try a multicolumn index, but with reversed order on the second column:

CREATE INDEX index_ips_begin_end_ip_num ON ips (begin_ip_num, end_ip_num DESC);

Ordering is mostly irrelevant for a single-column index, since it can be scanned backwards almost as fast. But it is important for multicolumn indexes.

With the index I propose, Postgres can scan the first column and locate the position from which the rest of the index fulfills the first condition. Then, for each value of the first column, it can return all rows that fulfill the second condition until the first one fails, and then jump to the next value of the first column, and so on.
This is still not very efficient, and Postgres may be faster just scanning the first index column and filtering on the second. It very much depends on your data distribution.
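
One way to see which of these strategies the planner actually picks is to re-run the query under EXPLAIN once the multicolumn index exists. This is only a sketch for comparing plans; the BUFFERS option is an addition not used in the question, and the output will depend on your data:

```sql
-- Show the chosen plan, actual timings, and how many pages were touched.
EXPLAIN (ANALYZE, BUFFERS)
SELECT ips.* FROM ips
WHERE 3065106743 BETWEEN begin_ip_num AND end_ip_num;
```

Comparing "Rows Removed by Filter" and the buffer counts before and after creating the index gives a rough idea of whether it helps for your distribution.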

Either way, CLUSTER using the multicolumn index from above can help performance:

CLUSTER ips USING index_ips_begin_end_ip_num;

This way, candidates fulfilling your first condition are packed onto the same or adjacent data pages. That can help performance a lot if you have many rows per value of the first column. Otherwise it is hardly effective.
(There are also non-blocking external tools for the purpose: pg_repack or pg_squeeze.)

Also, is autovacuum running and configured properly or have you run ANALYZE on the table? You need current statistics for Postgres to pick appropriate query plans.
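
A quick way to check this, sketched here against the ips table from the question, is to refresh the statistics by hand and then look at when they were last gathered:

```sql
-- Refresh planner statistics for the table manually.
ANALYZE ips;

-- Check when statistics were last gathered, manually or by autovacuum.
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'ips';
```

If both timestamps are NULL or very old, the row estimates in the plan are probably based on stale statistics.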

What would really help here is a GiST index on an int8range column, available since PostgreSQL 9.2.
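
As a minimal sketch of that idea: instead of adding a real int8range column, the example below assumes an expression index built over the two existing bigint columns. The index name index_ips_on_ip_range is hypothetical, and the '[]' bounds are chosen to mirror the inclusive BETWEEN from the question:

```sql
-- Expression index: one closed range [begin_ip_num, end_ip_num] per row.
-- Building it fails if any row has begin_ip_num > end_ip_num.
CREATE INDEX index_ips_on_ip_range ON ips
USING gist (int8range(begin_ip_num, end_ip_num, '[]'));

-- The WHERE expression must match the indexed expression for the index to apply.
-- @> tests whether the range contains the given element.
SELECT ips.* FROM ips
WHERE int8range(begin_ip_num, end_ip_num, '[]') @> 3065106743::bigint;
```

Note that a NULL bound makes the range unbounded on that side, which differs from BETWEEN, where a NULL bound never matches. Declaring both columns NOT NULL avoids the discrepancy.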

Further reading:

If your IP ranges can be covered with one of the built-in network types inet or cidr, consider replacing your two bigint columns. Or, better yet, look at the additional module ip4r by Andrew Gierth (not in the standard distribution). The indexing strategy changes accordingly.

Barring that, you can check out this related answer on dba.SE, which uses a sophisticated regime with partial indexes. Advanced stuff, but it delivers great performance.
