问题描述
关系模型的核心规则之一是元组所需的唯一性(行):
One of the core rules for the relational model is the required uniqueness for tuples (rows):
数据库中每个单独的标量值必须通过指定包含表的名称、包含列的名称和主键值在逻辑上可寻址包含行.
在 SQL 世界中,这意味着表中永远不会存在所有列值都相等的两行.如果没有有意义的方法来保证唯一性,则可以向表提供代理键.
In a SQL world, that would mean that there could never exist two rows in a table for which all the column values were equal. If there was no meaningful way to guarantee uniqueness, a surrogate key could be presented to the table.
当第一个 SQL 标准发布时,它没有定义这样的限制,从那时起就一直如此.这似乎是万恶之源.
When the first SQL standard was released, it defined no such restriction and it has been like this ever since. This seems like a root for all kind of evil.
有什么有意义的理由决定这样吗?在现实世界中,哪里可以证明没有这种限制是有用的?利大于弊吗?
Is there any meaningful reason why it was decided to be that way? In a practical world, where could an absence of such restriction prove to be useful? Does it outweigh the cons?
推荐答案
您假设数据库仅用于存储关系数据;这当然不是它们的用途,因为实际考虑总是会获胜.
You're assuming that databases are there solely for storing relational data; that's certainly not what they're used for because practical considerations will always win.
一个不需要主键的明显例子是一些描述(天气/数据库/任何东西)的状态"日志.如果您永远不会从该表中查询单个值,您可能不希望有一个主键以避免必须等待插入到键中.如果您有一个用例可以从该表中获取单个值,那么可以肯定,这将是一个糟糕的解决方案,但有些人并不需要它.如果绝对有必要,您可以随时添加代理键.
A obvious example where there's no need for a primary key would be a "state" log of some description (weather/database/whatever). If you're never going to query a single value from this table you may not want to have a primary key in order to avoid having to wait for an insert into the key. If you have a use-case to pick up a single value from this table then sure, this would be a bad solution, but some people just don't need that. You can always add a surrogate key afterwards if it becomes absolutely necessary.
另一个例子是写密集型应用程序需要告诉另一个进程做某事.这个辅助进程每 N 分钟/小时/任何时间运行一次.一次性对 N 百万条记录执行重复数据删除比检查表中每次插入的唯一性要快(相信我).
Another example would be a write intensive application needs to tell another process to do something. This secondary process runs every N minutes/hours/whatever. Doing the de-duplication on N million records as a one off is quicker than checking for uniqueness on every insert into the table (trust me).
作为关系数据库出售的内容并不仅仅用作关系数据库.它们被用作日志、键值存储、图形数据库等.它们可能没有竞争对手的所有功能,但有些有,而且拥有一个不适合您的关系模型的表通常比创建更简单一个完整的其他数据库并遭受数据传输性能损失.
What are sold as relational databases are not being used solely as relational databases. They're being used as logs, key-value stores, graph databases etc. They may not have all the functionality of the competition but some do and it's often simpler to have a single table that doesn't fit your relational model than to create a whole other database and suffer the data-transfer performance penalties.
tl;dr 人们在数学上并不完美,因此不会总是使用数学上完美的方法来做某事.委员会由人组成,有时可以意识到这一点.
tl;dr People aren't mathematically perfect and so won't always use the mathematically perfect method of doing something. Committees are made up of people and can realise this, sometimes.
这篇关于为什么 SQL 标准允许重复行?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!