Question
We have a product using a PostgreSQL database server that is deployed at a couple of hundred clients. Some of them have gathered tens of gigabytes of data over the years. So in the next version we will introduce automated cleanup procedures which will gradually archive and DELETE old records during nightly batch jobs.
If I understand correctly, autovacuum will kick in and analyze and reorganize the tuples, so performance will be like it was when fewer records existed.
If I understand correctly, the actual disk space will not be released, since that only happens with a VACUUM FULL, and that is not triggered by autovacuum.
So I was thinking about an automated process that would do this.
I found the bloat view used by nagios check_postgres at http://wiki.postgresql.org/wiki/Show_database_bloat.
Is this view any good? Am I correct in understanding that if tbloat is > 2, the table could use a VACUUM FULL? And if ibloat is too high, the index could use a REINDEX?
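To make the threshold concrete, the check I have in mind looks something like this; a sketch that only composes the SQL, assuming the wiki's query has been saved as a view named bloatview (the wiki page itself only gives the raw query, not a view name):

```shell
#!/bin/sh
# Compose the candidate-selection query: the single most bloated table
# above the threshold would be the VACUUM FULL candidate for tonight.
TBLOAT_THRESHOLD=2
SQL="select tablename from bloatview where tbloat > $TBLOAT_THRESHOLD order by tbloat desc limit 1"
# A real job would feed this to psql, e.g.:
#   psql -At mydatabase -c "$SQL"
echo "$SQL"
```

The same pattern, ordered by ibloat and also selecting iname, would pick the REINDEX candidate.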
Any comments on the following job, to run as a daily batch job?
- vacuumdb -Z mydatabase   # vacuum with analyze-only
- select tablename from bloatview order by tbloat desc limit 1
- vacuumdb -f -t tablename mydatabase
- select tablename, iname from bloatview order by ibloat desc limit 1
- reindexdb -t tablename -i iname mydatabase
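Sketched as a dry-run shell script, the job would compose commands like this (the table and index names are placeholders; in a real run they would come from the bloatview queries):

```shell
#!/bin/sh
# Dry-run sketch of the nightly job: compose the maintenance commands
# for one database. Placeholder names stand in for the bloatview lookups.
nightly_cmds() {
    db=$1; table=$2; index=$3
    echo "vacuumdb -Z $db"                     # step 1: analyze-only pass
    echo "vacuumdb -f -t $table $db"           # step 3: VACUUM FULL the worst table
    echo "reindexdb -t $table -i $index $db"   # step 5: REINDEX the worst index
}

# Steps 2 and 4 (finding the worst offenders) would look like:
#   table=$(psql -At mydatabase -c "select tablename from bloatview order by tbloat desc limit 1")
nightly_cmds mydatabase some_table some_index
```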
Of course, I still need to wrap it in a nice perl script in crontab (we're using Ubuntu 12), or does PostgreSQL have some sort of scheduler I could do this with?
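Concretely, the cron entry I have in mind would look something like this (the script path is a placeholder for wherever the wrapper ends up):

```
# crontab -e for the postgres user: run the maintenance job nightly at 03:30
30 3 * * * /usr/local/bin/pg_maintenance.sh >> /var/log/pg_maintenance.log 2>&1
```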
Or is this total overkill, and is there a much simpler procedure?
Answer
You probably don't need it. It is good to do this once, after the first archiving job, so you get your disk space back, but after that your daily archiving job and autovacuum will prevent dead-tuple bloat.
Also, instead of vacuum full it is often better to run cluster table_name using index_name; analyze table_name;. This will reorder rows according to an index, so related table rows are stored physically close together on disk, which limits disk seeks (important on classic disk drives, largely irrelevant on SSDs) and the number of reads for your typical queries.
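For example, with made-up table and index names (a real job would take them from the bloat view instead), the replacement for a VACUUM FULL of one table could be composed like this:

```shell
#!/bin/sh
# Sketch: compose the CLUSTER + ANALYZE statement for one table.
# "orders" and "orders_created_at_idx" are hypothetical names.
cluster_sql() {
    echo "CLUSTER $1 USING $2; ANALYZE $1;"
}
# Would be run as: psql mydatabase -c "$(cluster_sql orders orders_created_at_idx)"
cluster_sql orders orders_created_at_idx
```

Once a table has been clustered, a plain CLUSTER with no arguments re-clusters it on the same index, so later runs can reuse the remembered ordering.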
And remember that both vacuum full and cluster take an exclusive lock, making your tables unusable while they run.