Postgresql使用left join 优化 not in

场景：
查看在t1表中的数据，而这些数据不能出现在t2表。初学者容易想到 not in。这个问题可以使用left join 解决，如果两个表的结构一致，也可以采用差集来解决

使用例子

建立测试数据
新建两个表t1, t2, 每个表有100万的数据。有999990是相同的。现在要找到在t1的数据，而不再t2的数据。

create table t1(id int, info text);
create table t2(id int, info text);

insert into t1(id, info) select i, md5(i::text) from generate_series(1, 1000000) t(i);
insert into t2(id, info) select i, md5(i::text) from generate_series(11, 1000000 + 10) t(i);

create index on t1(id);
create index on t2(id);

postgres=# select count(*) from t1;
  count
---------
 1000000
(1 row)

postgres=# select count(*) from t2;
  count
---------
 1000000
(1 row)

使用not in 效果

postgres=# explain  select * from t1 where id not in (select id from t2);
                               QUERY PLAN
--------------------------------------------------------------------------------
 Gather  (cost=1000.00..6196106209.00 rows=500000 width=37)
   Workers Planned: 2
   ->  Parallel Seq Scan on t1  (cost=0.00..6196055209.00 rows=208333 width=37)
         Filter: (NOT (SubPlan 1))
         SubPlan 1
           ->  Materialize  (cost=0.00..27241.00 rows=1000000 width=4)
                 ->  Seq Scan on t2  (cost=0.00..18334.00 rows=1000000 width=4)
(7 rows)

使用left join 的效果

postgres=# explain  select t1.* from t1 left join t2 on t1.id = t2.id where t2.* is  null;
                                  QUERY PLAN
---------------------------------------------------------------------------------------
 Gather  (cost=23592.00..51286.28 rows=5000 width=37)
   Workers Planned: 2
   ->  Parallel Hash Left Join  (cost=22592.00..49786.28 rows=2083 width=37)
         Hash Cond: (t1.id = t2.id)
         Filter: (t2.* IS NULL)
         ->  Parallel Seq Scan on t1  (cost=0.00..12500.67 rows=416667 width=37)
         ->  Parallel Hash  (cost=12500.67..12500.67 rows=416667 width=65)
               ->  Parallel Seq Scan on t2  (cost=0.00..12500.67 rows=416667 width=65)
(8 rows)

对比结果
6196106209 / 51286 = 120814
基本上相差12万倍

轨迹16

Postgresql使用left join 优化 not in

Postgresql使用left join 优化 not in

使用例子