Question
I have a table with 7 columns, and 5 of them will be null. I will have null columns of int, text, date, boolean, and money data types. This table will contain millions of rows with many, many nulls. I am afraid that the null values will occupy space.
Also, do you know if Postgres indexes null values? I would like to prevent it from indexing nulls.
Answer
Basically, NULL values occupy 1 bit in the NULL bitmap. But it's not that simple.
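To make the "1 bit per column" idea concrete, here is a small Python sketch (not PostgreSQL code) that packs a row's null flags into bitmap bytes. It assumes PostgreSQL's on-disk convention for the bitmap (`t_bits`): bit `i % 8` of byte `i // 8` describes column `i`, and a set bit means the column is NOT null. The function name `null_bitmap` and the sample row are hypothetical.

```python
def null_bitmap(values):
    """Pack per-column null flags into bytes, 1 bit per column.

    Bit (i % 8) of byte (i // 8) describes column i; a SET bit means the
    column is NOT null. Returns None when no column is NULL, because then
    no bitmap is stored for the row at all.
    """
    if all(v is not None for v in values):
        return None
    bitmap = bytearray((len(values) + 7) // 8)
    for i, v in enumerate(values):
        if v is not None:
            bitmap[i // 8] |= 1 << (i % 8)
    return bytes(bitmap)


# Hypothetical row shaped like the question: 7 columns, 5 of them NULL.
row = [42, "x", None, None, None, None, None]
print(null_bitmap(row))        # b'\x03': one byte covers all 7 columns
print(null_bitmap([1, 2, 3]))  # None: no NULLs, no bitmap
```

However many of the 7 columns are NULL, the bitmap stays a single byte; only the column count decides its size.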
The null bitmap (per row) is only allocated if at least one column in that row holds a NULL value. This can lead to a seemingly paradoxical effect in tables with 9 or more columns: assigning the first NULL value to a column can take up more space on disk than writing a value to it. Conversely, removing the last NULL value from the row also removes the NULL bitmap.
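A rough Python model of that paradox, under stated assumptions: a 64-bit build (MAXALIGN = 8), no OID column, and a hypothetical table made only of `int4` columns (4 bytes each, 4-byte aligned, so no padding between them). It rounds the whole row up to MAXALIGN, as rows are placed on the page, but ignores the 4-byte line pointer each row also needs. `row_size` is an illustrative helper, not a PostgreSQL function.

```python
MAXALIGN = 8   # assumption: typical 64-bit build
HEADER = 23    # fixed size of HeapTupleHeader in bytes


def maxalign(n):
    """Round n up to the next multiple of MAXALIGN."""
    return (n + MAXALIGN - 1) // MAXALIGN * MAXALIGN


def row_size(natts, n_null):
    """Bytes used by a row of `natts` int4 columns, `n_null` of them NULL.

    NULL columns store no data; the null bitmap (1 bit per column) exists
    only if at least one column is NULL. The data offset and the space the
    whole row takes on the page are both padded to MAXALIGN.
    """
    bitmap = (natts + 7) // 8 if n_null > 0 else 0
    data_offset = maxalign(HEADER + bitmap)
    data = 4 * (natts - n_null)
    return maxalign(data_offset + data)


print(row_size(10, 0))                 # 64: header 24 + 40 bytes of data
print(row_size(10, 1))                 # 72: header 32 + 36 bytes = 68, padded to 72
print(row_size(8, 0), row_size(8, 1))  # 56 56: with 8 columns the bitmap is free
```

With 10 columns, setting one value to NULL saves 4 bytes of data but costs 8 bytes of header, so the row grows.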
Physically, the initial null bitmap occupies 1 byte between the HeapTupleHeader (23 bytes) and the actual column data or the row OID (if you should still be using that), which always start at a multiple of MAXALIGN (typically 8 bytes). Rounding the 23-byte header up to 24 leaves 1 byte of padding, and that byte is what the initial null bitmap uses.
In effect, NULL storage is absolutely free for tables of 8 columns or fewer (including dropped, but not yet purged, columns). After that, another MAXALIGN bytes (typically 8) are allocated for each further MAXALIGN * 8 columns (typically 64), and so on.
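The thresholds above follow from simple arithmetic. This Python sketch computes where column data starts (the header size, `t_hoff`) with and without a null bitmap, assuming MAXALIGN = 8 and no OID column; `data_offset` is a hypothetical helper name, and `natts` is meant to include dropped-but-not-purged columns.

```python
MAXALIGN = 8   # assumption: typical 64-bit build
HEADER = 23    # HeapTupleHeader size in bytes


def data_offset(natts, has_nulls):
    """Offset where column data starts (t_hoff), assuming no OID column.

    natts counts every attribute, including dropped-but-not-purged ones.
    """
    bitmap = (natts + 7) // 8 if has_nulls else 0
    size = HEADER + bitmap
    return (size + MAXALIGN - 1) // MAXALIGN * MAXALIGN


for natts in (7, 8, 9, 72, 73, 136, 137):
    print(natts, data_offset(natts, False), data_offset(natts, True))
```

Without NULLs the header is always 24 bytes. With at least one NULL it stays 24 up to 8 columns, becomes 32 for 9 to 72 columns, 40 for 73 to 136, and 48 from 137 on: 8 more bytes per 64 extra columns.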
More details in the manual and under these related questions:
How much disk space is needed to store a NULL value using a PostgreSQL DB?
Does not using NULL in PostgreSQL still use a NULL bitmap in the header?
How many records can I store in 5 MB of PostgreSQL on Heroku?
Once you understand alignment padding of data types, you can further optimize storage:
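As a rough illustration of alignment padding, this Python sketch sizes the column-data area for fixed-width columns in a given order. The type lengths and alignments are assumptions based on `pg_type` on a typical 64-bit build (`int4` and `date`: 4 bytes, 4-byte aligned; `boolean`: 1 byte, unaligned; `money`: 8 bytes, 8-byte aligned). Variable-length `text` is left out because its storage depends on the value. `TYPES` and `data_size` are hypothetical names.

```python
# (length in bytes, alignment in bytes): assumed from pg_type on a 64-bit build
TYPES = {
    "int4": (4, 4),
    "date": (4, 4),
    "boolean": (1, 1),
    "money": (8, 8),
}


def data_size(columns, nulls=()):
    """Size of the column-data area for fixed-width columns in this order.

    Each non-NULL value starts at the next multiple of its type's alignment;
    NULL columns (indexes in `nulls`) take no space, not even padding.
    Offsets are relative to the start of the data area, which is MAXALIGN-ed.
    """
    offset = 0
    for i, typ in enumerate(columns):
        if i in nulls:
            continue
        length, align = TYPES[typ]
        offset = (offset + align - 1) // align * align  # alignment padding
        offset += length
    return offset


print(data_size(["boolean", "money", "boolean", "int4", "date"]))  # 28
print(data_size(["money", "int4", "date", "boolean", "boolean"]))  # 18
```

Putting the widest alignment first removes 10 bytes of padding here (8 bytes once the whole row is rounded up to MAXALIGN).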
But cases where you can save substantial amounts of space are rare. Normally it's not worth the effort.
@Daniel already covers effects on index size.
Note that dropped columns (though now invisible) are kept in the system catalogs until the table is recreated. Those zombie columns can force the allocation of an (enlarged) NULL bitmap. See:
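A small Python sketch of how a dropped column forces the bitmap. It relies on documented `ALTER TABLE ... DROP COLUMN` behavior (later inserts and updates store a null value for the dropped column) and assumes MAXALIGN = 8 and no OID column. `header_size` is a hypothetical helper.

```python
MAXALIGN = 8   # assumption: typical 64-bit build
HEADER = 23    # HeapTupleHeader size in bytes


def header_size(live_values, n_dropped):
    """Header size (t_hoff) for a newly written row, assuming no OID column.

    Dropped-but-not-purged attributes still count toward the row's column
    count, and new rows store them as NULL, so their presence alone makes
    the null bitmap necessary.
    """
    natts = len(live_values) + n_dropped
    has_nulls = n_dropped > 0 or any(v is None for v in live_values)
    bitmap = (natts + 7) // 8 if has_nulls else 0
    return (HEADER + bitmap + MAXALIGN - 1) // MAXALIGN * MAXALIGN


row = [1, 2, 3, 4, 5, 6, 7, 8]  # 8 live columns, none NULL
print(header_size(row, 0))      # 24: no NULLs, no bitmap
print(header_size(row, 1))      # 32: the dropped column is a NULL; 9 bits need 2 bytes
```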