了解Postgres行大小

了解Postgres行大小

本文介绍了了解Postgres行大小的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个大型(> 100M行)Postgres表格,结构为{integer,integer,integer,timestamp without time zone}。我预计一行的大小为3 * integer + 1 * timestamp = 3 * 4 + 1 * 8 = 20个字节。



实际上,行大小是 pg_relation_size(tbl)/ count(*) = 52字节。为什么?



(没有对表进行删除: pg_relation_size(tbl,'fsm')〜= 0 )

解决方案

计算行大小比这更复杂。



存储通常在8 kb 数据页中分区。每页的固定开销很小,可能的余数不够大,不能适合另一个元组,更重要的是死行或最初使用 FILLFACTOR 设置保留的百分比。



更重要的是,每行开销 (元组)。 HeapTupleHeader 23个字节和 对齐填充 。元组标题的开始以及元组数据的开始在一个典型的64位机器上以8字节为单位的 MAXALIGN 的倍数排列。一些数据类型需要与2,4或8个字节的下一个倍数进行对齐。



阅读手册中的基本知识。



您的示例



这将导致4个字节的填充, code> integer 列,因为时间戳列需要 double 对齐和需要以8个字节的下一个倍数开始。



所以,一行占用:

 code> 23  -  heaptupheader 
+ 1 - 填充或NULL位图
+ 12 - 3 *整数(无对齐填充)
+ 4 - 第3整数之后的填充
+ 8 - 时间戳
+ 0 - 没有padding,因为元组以MAXALIGN的倍数结束

最后,页眉中每个元组有一个 ItemData 指针(项目指针)(如)占用4个字节:

  + 4  - 页眉中的项目指针
------
= 52个字节

所以我们到达观察到的 52字节



计算 pg_relation_size(tbl)/ count(*)是一个悲观的估计。 pg_relation_size(tbl)包括由 fillfactor 保留的膨胀(死行)和空格,以及每个数据页的开销,每张桌子(而且我们甚至没有在,因为它不适用于此。)



您可以安装附加模块,并调用 SELECT * FROM pgstattuple('tbl_name'); 了解更多信息在表和元组大小上。



相关答案:






I got a large (>100M rows) Postgres table with structure {integer, integer, integer, timestamp without time zone}. I expected the size of a row to be 3*integer + 1*timestamp = 3*4 + 1*8 = 20 bytes.

In reality the row size is pg_relation_size(tbl) / count(*) = 52 bytes. Why?

(No deletes are done against the table: pg_relation_size(tbl, 'fsm') ~= 0)

解决方案

Calculation of row size is much more complex than that.

Storage is typically partitioned in 8 kb data pages. There is a small fixed overhead per page, possible remainders not big enough to fit another tuple, and more importantly dead rows or a percentage initially reserved with the FILLFACTOR setting.

More importantly, there is overhead per row (tuple). The HeapTupleHeader of 23 bytes and alignment padding. The start of the tuple header as well as the start of tuple data are aligned at a multiple of MAXALIGN, which is 8 bytes on a typical 64-bit machine. Some data types require alignment to the next multiple of 2, 4 or 8 bytes.

Quoting the manual on the system table pg_tpye:

Read about the basics in the manual here.

Your example

This results in 4 bytes of padding after your 3 integer columns, because the timestamp column requires double alignment and needs to start at the next multiple of 8 bytes.

So, one row occupies:

   23   -- heaptupleheader
 +  1   -- padding or NULL bitmap
 + 12   -- 3 * integer (no alignment padding here)
 +  4   -- padding after 3rd integer
 +  8   -- timestamp
 +  0   -- no padding since tuple ends at multiple of MAXALIGN

Finally, there is an ItemData pointer (item pointer) per tuple in the page header (as pointed out by @A.H. in the comment) that occupies 4 bytes:

 +  4   -- item pointer in page header
------
 = 52 bytes

So we arrive at the observed 52 bytes.

The calculation pg_relation_size(tbl) / count(*) is a pessimistic estimation. pg_relation_size(tbl) includes bloat (dead rows) and space reserved by fillfactor, as well as overhead per data page and per table. (And we didn't even mention compression for long varlena data in TOAST tables, since it doesn't apply here.)

You can install the additional module pgstattuple and call SELECT * FROM pgstattuple('tbl_name'); for more information on table and tuple size.

Related answer:

这篇关于了解Postgres行大小的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

08-29 04:24