在 PostgreSQL 中,表膨胀是一个常见的问题,它会导致数据库性能下降,甚至在极端情况下会耗尽磁盘空间。了解表膨胀的原因及其解决方案,对于维护数据库性能和稳定性至关重要。
接下来,给大家介绍一下,常用的三种表膨胀处理的PostgreSQL插件工具
1.pg_repack
pg_repack 是 PostgreSQL 数据库生态的一款第三方插件,通过pg_repack插件对表空间进行重新“包装”,回收碎片空间,有效解决因对表大量更新、删除等操作引起的空间膨胀问题。pg_repack获取排它锁的时间较短,多数时间不阻塞读写,相比CLUSTER或VACUUM FULL操作更加轻量化。
pg_repack的用法
create table pgtest(id int primary key, col1 text);
insert into pgtest select n, md5(random()::text) || n from generate_series(1, 500000) as n;
postgres=# select pg_size_pretty(pg_total_relation_size('pgtest'));
pg_size_pretty
----------------
47 MB
(1 row)
postgres=# delete from pgtest where id between 200001 and 400000;
DELETE 200000
postgres=# select pg_size_pretty(pg_total_relation_size('pgtest'));
pg_size_pretty
----------------
47 MB
(1 row)
删完200000条,仍然是47MB,使用pg_repack清理
pg_repack -h localhost -p 8522 -U <admin user> -W -d Hhjjkfldasjk -t public.test -j 2 -k -D
[postgres@pghost01 ~]$pg_repack -h localhost -p 5432 -U postgres -W -d postgres -t public.pgtest -j 2 -k -D
Password:
NOTICE: Setting up workers.conns
INFO: repacking table "public.pgtest"
postgres=# select pg_size_pretty(pg_total_relation_size('pgtest'));
pg_size_pretty
----------------
28 MB
(1 row)
2.pgcompacttable
pgcompacttable利用了PostgreSQL的一个有趣特性:在执行INSERT和UPDATE操作时,会将所有新版本的行移到表最开始的可用空间。此为pgcompacttable工具的关键,因为如果从末端反向开始更新所有行,最终所有可用空间被这些行填充,并将表尾部的空间全部释放以便让定期vacuum进行truncate。这样一来,pgcompacttable通过批量更新和vacuum强制移动,最终整个表被重新整理,达到压缩的效果。此工具对磁盘空间要求低,且性能影响可控。
pgcompacttable使用方法:
pgcompacttable可以对database级别、schema级别、table级别进行压缩
cd /home/postgres/pgcompacttable/bin
./pgcompacttable -h localhost -U postgres -d old
[Mon Sep 30 07:34:35 2024] (old) Connecting to database
[Mon Sep 30 07:34:35 2024] (old) Postgres backend pid: 6068
[Mon Sep 30 07:34:35 2024] (old) Handling tables. Attempt 1
[Mon Sep 30 07:34:35 2024] (old:public.pgtest01) Statistics: 4672 pages (6055 pages including toasts and indexes), it is expected that ~40.090% (1872 pages) can be compacted with the estimated space saving being 14.632MB.
[Mon Sep 30 07:34:44 2024] (old:public.pgtest01) Reindex: public.pgtest01_pkey, initial size 1374 pages(10.734MB), has been reduced by 39% (4.289MB), duration 0 seconds.
[Mon Sep 30 07:34:44 2024] (old:public.pgtest01) Processing results: 2803 pages left (3633 pages including toasts and indexes), size reduced by 14.602MB (18.898MB including toasts and indexes) in total.
[Mon Sep 30 07:34:44 2024] (old) Processing complete.
[Mon Sep 30 07:34:44 2024] (old) Processing results: size reduced by 14.602MB (18.898MB including toasts and indexes) in total.
[Mon Sep 30 07:34:44 2024] (old) Disconnecting from database
[Mon Sep 30 07:34:44 2024] Processing complete: 1 retries to process has been done
[Mon Sep 30 07:34:44 2024] Processing results: size reduced by 14.602MB (18.898MB including toasts and indexes) in total, 14.602MB (18.898MB) old.
old=# SELECT * FROM pgstattuple('public.pgtest01');
-[ RECORD 1 ]------+---------
table_len | 22962176
tuple_count | 300000
tuple_len | 19988895
tuple_percent | 87.05
dead_tuple_count | 0
dead_tuple_len | 0
dead_tuple_percent | 0
free_space | 91684
free_percent | 0.4
3.pg_squeeze
pg_squeeze 是一个针对 PostgreSQL 的扩展插件,其主要功能是高效清理表中的无用空间,并能按特定索引对数据进行排序,类似于并行的 CLUSTER 命令。它旨在替代 pg_repack 扩展,提供更简单和高效的数据库维护方案。
下载路径如下:
https://github.com/cybertec-postgresql/pg_squeeze
设置PG_CONFIG环境变量,安装时,我们必须确保路径中的pg_config 版本正确
wget https://github.com/cybertec-postgresql/pg_squeeze/archive/refs/tags/REL1_7_0.zip
--解压
unzip REL1_7_0.zip
编译安装
cd pg_squeeze-REL1_7_0/
make install
修改postgresql.conf 参数文件并重启生效
#将pg_squeeze添加到现有库中。
shared_preload_libraries = 'pg_squeeze'
--安装插件
CREATE EXTENSION pg_squeeze;
postgres=# CREATE EXTENSION pg_squeeze;
手动收缩表,而无需注册,跳过任何时间和膨胀检查。
SELECT
squeeze.squeeze_table('public', 'test', null, null, null);
4.总结
总结来说,pg_repack和pg_squeeze都适用于处理表膨胀问题,但pg_repack通过触发器实现,可能对DML操作有一定影响,而pg_squeeze则通过逻辑复制槽实现,影响较小。pgcompacttable则侧重于在原地压缩表数据,但不适用于频繁更新的表。