问题描述
我有一个关于数据库的理论问题.为了使它更具体,我想出了一个例子.
I have a kind of theoretical question about databases. To make it more concrete I have thought up an example case.
假设我有一家销售产品的商店.我有很多不同的产品.并非每种产品都具有相同的适用特性.例如,我可以以 GB 为单位定义硬盘的大小,但不能在 CPU 上使用相同的属性,因为它不适用.我想要的是一个数据库,我可以在其中动态地向产品添加属性.我唯一能想到的是:
Suppose I have a store with products. I have lots of different products. Not every product has the same applicable properties. For instance, I could define the size of a harddisk in gigabytes, but cannot use that same property on a CPU, simply because it does not apply. What I want is a database where I can add properties to products dynamically. The only thing I can come up with is the following:
一个带有 ID、名称和描述的产品表.
One product table with an ID, a Name and a Description.
一个带有 ID、Product_ID、Property 和 Value 的属性表.
One properties table with an ID, Product_ID, Property and a Value.
这样我就有可能得到一个巨大的、我认为效率不高的属性表.这已经困扰我很长时间了.有谁知道我的问题的更好解决方案?
This way I would potentially get a gigantic, I-suppose-not-so-efficient, table of properties. This has been bugging me for a long time now. Does anyone know a better solution to my problem?
推荐答案
这其实是在走向第六范式,只是像你这样没有学术或经验背景的人不知道(a)名字(b) 规则和注意事项.这些人已经实现了众所周知的实体-属性-值或 EAV.如果操作得当,那就没问题了,并且有成千上万的医疗系统在这些表中携带诊断和剂量信息.如果不是,那就是一只狗的早餐使用和维护.
This is actually moving towards Sixth Normal Form, it is just that people like you who do not have the academic or experiential background do not know the (a) name for it and (b) the rules and the caveats. Such people have implemented what is commonly know as Entity-Attribute-Value or EAV. If it is done properly, it is fine, and there are many thousands of medical system out there carrying diagnostic and dosage info in such tables. If it is not, then it is one dog's breakfast to use and maintain.
首先确保您的
Product
符合真实且完整的 5NF.
First make sure you have
Product
in true and full 5NF.
始终使用完整的声明式引用完整性;CHECK
约束和 RULES
.
Always use full Declarative Referential Integrity; CHECK
constraints and RULES
.
永远不要把所有的东西都放在一张表中,用 VARCHAR()
作为值.始终使用正确(适用)的数据类型.这意味着您将拥有多个表,每个数据类型一个,并且不会丢失控制或完整性.
Never put all that into one table with a VARCHAR()
for Value. Always use the correct (applicable) DataTypes. That means you will have several tables, one each per DataType, and there is no loss of control or integrity.
同样,任何关联表(其中有对另一个表的多重引用[例如供应商])必须是分开的.
Likewise any Associative tables (where there is a multiple reference to another table [eg. Vendor] ) must be separate.
- 我提供了一个数据模型,其中讨论了完全控制;它包括一个简单的目录,可用于验证和导航.您需要添加每个
CHECK
约束和RULE
以确保数据和引用完整性不会丢失.这意味着,例如:- 对于存储在
ProductDecimal
中的CPUSpeed
列,CHECK
表示它在适当的值范围内 - 对于每个子
Product
表CHECK
,ProductType-ColumnNo
组合的DataType是正确的
- I am providing a Data Model which has the full control discussed; it includes a simple catalogue which can be used for verification as well as navigation. You need to add every
CHECK
Constraint andRULE
to ensure that the data and referential Integrity is not lost. That means, eg:- for the
CPUSpeed
column, which is stored inProductDecimal
,CHECK
that it is in the proper range of values - for each sub-
Product
tableCHECK
that the DataType is correct for theProductType-ColumnNo
combination
保留
Product
中的所有必填列;仅将sub-Product
表用于可选列.Keep all the mandatory columns in
Product
; use thesub-Product
tables for optional columns only.对于每个这样的(例如
Product
)表,您需要创建一个视图(虚线),它将从EAV/6NF 表中构造5NF 行.您可能有多个视图:Product_CPU
、Product_Disk
.For each such (eg
Product
) table, you need to create a View (dotted line), which will construct the 5NF rows from the EAV/6NF tables. You may have several Views:Product_CPU
,Product_Disk
.不要通过视图更新.将所有更新保存在存储过程中,并插入或更新每一列(即适用于每个特定的
Product
和sub-Product
表ProductType
) 在一起.Do not update via the View. Keep all your updates transactional, in a stored proc, and insert or update each of the columns (ie. the
Product
andsub-Product
tables which are applicable, for each particularProductType
) together.巨大的?商业数据库(不是免费软件)在处理大表或连接时没有问题.这实际上是一个非常有效的结构,并且允许非常快速的搜索,因为表实际上是面向列的(而不是面向行的).如果人口是巨大的,那就是巨大的,你自己算算吧.
Gigantic ? Commercial databases (not the freeware) have no problems with large tables or joins. This is actually a very efficient structure, and allows very fast searches, because the tables are in fact column-oriented (not row-oriented). If the population is gigantic, then it is gigantic, do your own arithmetic.
您还需要一个表,一个用于
Property
(或属性)的查找表.这是目录的一部分,基于ProductType
You need one more table, a Lookup table for
Property
(or Attribute). That is part of the catalogue, and based onProductType
更好的解决方案是采用完整、正式的第六范式.如果您只有一张或几张需要可选列的表,则不需要.
The better solution is to go for full, formal Sixth Normal Form. Not necessary if you have only one or a few tables that require optional columns.
要清楚:
第六范式是行由主键和最多一个属性组成.
这是 6NF(至少对于 Product 表集群),然后通过 DataType 再次归一化(不是在范式意义上),以减少表的数量(否则每个属性将有一个表).
This is 6NF (for at least the Product table cluster), then Normalised again (Not in the Normal Form sense) by DataType, to reduce the no of tables (otherwise you would have one table per Attribute).
这保留了完整的 Rdb 控制(FK、约束等);而常见的 EAV 类型则不需要 DRI 和控件.
This retains full Rdb control (FKs, constraints, etc); whereas the common EAV types don't bother with the DRI and control.
这也有目录的雏形.
IDEF1X 符号链接适用于不熟悉关系建模标准的人.
Link to IDEF1X Notation for those who are unfamiliar with the Relational Modelling Standard.
您可能对此感兴趣▶5NF 6NF 讨论◀.我会在某个时候写下来.
You might be interested in this ▶5NF 6NF Discussion◀. I will write it up at some point.
这篇关于数据库模式相关问题的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!
- for the
- 对于存储在