performance - 为什么我会在这个 PostgreSQL 查询中得到 'Hash Join' 和 FTS？

我正在尝试优化以下场景:措辞格式:我有 2 个表， alerts 和 user_devices ；在 user_devices 中，我们跟踪耦合到 user_id 的设备是否想要获得通知，在 alerts 表中，我们跟踪用户到通知者的关系。基本上，任务是选择每个具有任何警报的 user_id，并允许向注册到它的任何设备发出通知。表“警报”，大约 90 万条记录: Table "public.alerts" Column | Type | Modifiers-------------+--------------------------+----------- id | uuid | not null user_id | uuid | target_id | uuid | target_type | text | added_on | timestamp with time zone | old_id | text |Indexes: "alerts_pkey" PRIMARY KEY, btree (id) "one_alert_per_business_per_user" UNIQUE CONSTRAINT, btree (user_id, target_id) "addedon" btree (added_on) "targetid" btree (target_id) "userid" btree (user_id) "userid_targetid" btree (user_id, target_id)Foreign-key constraints: "alerts_user_id_fkey" FOREIGN KEY (user_id) REFERENCES users(id)表 'user_devices'，大约 12k 条记录: Table "public.user_devices" Column | Type | Modifiers---------------------+--------------------------+----------- id | uuid | not null user_id | uuid | device_id | text | device_token | text | push_notify_enabled | boolean | device_type | integer | device_name | text | badge_count | integer | added_on | timestamp with time zone |Indexes: "user_devices_pkey" PRIMARY KEY, btree (id) "push_notification" btree (push_notify_enabled) "user_id" btree (user_id) "user_id_push_notification" btree (user_id, push_notify_enabled)Foreign-key constraints: "user_devices_user_id_fkey" FOREIGN KEY (user_id) REFERENCES users(id)以下查询:select COUNT(DISTINCT a.user_id)from alerts a inner join user_devices ud on a.user_id = ud.user_idWHERE ud.push_notify_enabled = true;大约需要 3 秒并生成以下计划:explain select COUNT(DISTINCT a.user_id) from alerts a inner join user_devices ud on a.user_id = ud.user_id WHERE ud.push_notify_enabled = true; QUERY PLAN------------------------------------------------------------------------------------ Aggregate (cost=49777.32..49777.33 rows=1 width=16) -> Hash Join (cost=34508.97..48239.63 rows=615074 width=16) Hash Cond: (ud.user_id = a.user_id) -> Seq Scan on user_devices ud (cost=0.00..480.75 rows=9202 width=16) Filter: push_notify_enabled -> Hash (cost=20572.32..20572.32 rows=801732 width=16) -> Seq Scan on alerts a (cost=0.00..20572.32 rows=801732 width=16)我错过了什么，有没有办法加快速度？谢谢你。== 编辑 ==根据建议，尝试在连接内移动条件，没有区别:=> explain select COUNT(DISTINCT a.user_id) from alerts a inner join user_devices ud on a.user_id = ud.user_id and ud.push_notify_enabled; QUERY PLAN------------------------------------------------------------------------------------ Aggregate (cost=49777.32..49777.33 rows=1 width=16) -> Hash Join (cost=34508.97..48239.63 rows=615074 width=16) Hash Cond: (ud.user_id = a.user_id) -> Seq Scan on user_devices ud (cost=0.00..480.75 rows=9202 width=16) Filter: push_notify_enabled -> Hash (cost=20572.32..20572.32 rows=801732 width=16) -> Seq Scan on alerts a (cost=0.00..20572.32 rows=801732 width=16)那么，没有办法摆脱 2 FTS 吗？如果我至少可以让它以某种方式使用“警报”表上的索引，那就太好了..== 编辑 ==添加`解释分析。=> explain ANALYZE select COUNT(DISTINCT a.user_id) from alerts a inner join user_devices ud on a.user_id = ud.user_id and ud.push_notify_enabled; QUERY PLAN------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=49777.32..49777.33 rows=1 width=16) (actual time=5254.355..5254.356 rows=1 loops=1) -> Hash Join (cost=34508.97..48239.63 rows=615074 width=16) (actual time=1824.607..2863.635 rows=614768 loops=1) Hash Cond: (ud.user_id = a.user_id) -> Seq Scan on user_devices ud (cost=0.00..480.75 rows=9202 width=16) (actual time=0.048..16.784 rows=9186 loops=1) Filter: push_notify_enabled -> Hash (cost=20572.32..20572.32 rows=801732 width=16) (actual time=1824.229..1824.229 rows=801765 loops=1) Buckets: 4096 Batches: 32 Memory Usage: 990kB -> Seq Scan on alerts a (cost=0.00..20572.32 rows=801732 width=16) (actual time=0.047..878.429 rows=801765 loops=1) Total runtime: 5255.427 ms(9 rows)=== 编辑 ===添加请求的配置。大部分是 Ubuntu PG9.1 默认值:/etc/postgresql/9.1/main# cat postgresql.conf | grep -e "work_mem" -e "effective_cache" -e "shared_buff" -e "random_page_c"shared_buffers = 24MB # min 128kB#work_mem = 1MB # min 64kB#maintenance_work_mem = 16MB # min 1MB#wal_buffers = -1 # min 32kB, -1 sets based on shared_buffers#random_page_cost = 4.0 # same scale as above#effective_cache_size = 128MB 最佳答案用部分索引替换索引:DROP INDEX user_id_push_notification ;CREATE INDEX user_id_push_notification ON user_devices (user_id) WHERE push_notify_enabled =True ;，并将 random_page_cost 设置为较低的值:设置 random_page_cost = 1.1;给我造成了一个 Index Scan using push_notification on user_devices ud (警报上的 seqscan 似乎或多或少不可避免，因为您期望 800K/900K := 88%) 行。恕我直言，索引扫描仅在行大小非常大时才有效。更新:将用户表添加到查询似乎会强制执行三重索引扫描。 (但大约在同一时间)explain ANALYZEselect COUNT(DISTINCT a.user_id)from alerts ajoin user_devices ud on a.user_id = ud.user_idjoin users us on a.user_id = us.idWHERE ud.push_notify_enabled = true;关于performance - 为什么我会在这个 PostgreSQL 查询中得到 'Hash Join' 和 FTS？，我们在Stack Overflow上找到一个类似的问题：https://stackoverflow.com/questions/14404209/

UD

performance - 为什么我会在这个 PostgreSQL 查询中得到 'Hash Join' 和 FTS？