Problem Description
Let's say you're modeling an entity that has many attributes (2400+), far more than the physical column limit of a given database engine (e.g., roughly 1000 for SQL Server). Knowing nothing about the relative importance of these data points (which ones are hot/used most often) beyond the domain/candidate keys, how would you implement it?
A) EAV (entity-attribute-value). (Boo... native relational tools thrown out the window.)
B) Go straight across. The first table has a primary key and 1000 columns, right up to the limit. The next table has another 1000 columns, foreign-keyed to the first. The last table holds the remaining 400, also foreign-keyed.
C) Stripe evenly across ceil(n / limit) tables. Each table gets an equal share of the columns and is foreign-keyed to the first: 800, 800, 800 (see the DDL sketch below).
D) Something else...
And why?
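For concreteness, here is a minimal T-SQL sketch of what options B and C amount to: a "home" table holding the key plus the first slice of columns, and overflow tables keyed 1:1 back to it. All table and column names (Entity_1, Attr0001, ...) are hypothetical placeholders, and the real column lists would of course be far longer.

    -- Option B/C sketch: vertical 1:1 split of one logical entity.
    -- Entity_1 is the "home" table; Entity_2 and Entity_3 hold the overflow
    -- columns and share the primary key as a foreign key back to Entity_1.
    CREATE TABLE Entity_1 (
        EntityId INT NOT NULL PRIMARY KEY,
        Attr0001 VARCHAR(50) NULL,
        -- ... more columns, up to the chosen per-table limit ...
        Attr0800 VARCHAR(50) NULL
    );

    CREATE TABLE Entity_2 (
        EntityId INT NOT NULL PRIMARY KEY
            REFERENCES Entity_1 (EntityId),
        Attr0801 VARCHAR(50) NULL,
        -- ...
        Attr1600 VARCHAR(50) NULL
    );

    CREATE TABLE Entity_3 (
        EntityId INT NOT NULL PRIMARY KEY
            REFERENCES Entity_1 (EntityId),
        Attr1601 VARCHAR(50) NULL,
        -- ...
        Attr2400 VARCHAR(50) NULL
    );

    -- Reassembling the whole entity then costs two joins:
    -- SELECT ...
    -- FROM Entity_1 e1
    -- LEFT JOIN Entity_2 e2 ON e2.EntityId = e1.EntityId
    -- LEFT JOIN Entity_3 e3 ON e3.EntityId = e1.EntityId;

The only difference between B and C is how the columns are dealt out: B fills each table to the limit (1000/1000/400), while C spreads them evenly (800/800/800).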
Edit: This is more of a philosophical/generic question, not tied to any specific limits or engines.
Edit^2: As many have pointed out, the data was probably not normalized. As usual, business constraints at the time made any deeper investigation impossible.
My solution: investigate further. Specifically, establish whether the table is truly normalised (at 2400 columns this seems highly unlikely).
If not, restructure until it is fully normalised (at which point there are likely to be fewer than 1000 columns per table).
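As an illustration of the kind of restructuring meant here (the real attributes are unknown, so the names below are invented): very wide tables usually hide repeating groups, and pulling one such group into a child table can remove hundreds of columns at a stroke.

    -- A denormalised repeating group (Phone1, Phone2, Phone3, ...) inflates
    -- the column count; normalising turns it into rows in a child table.
    CREATE TABLE Customer (
        CustomerId INT NOT NULL PRIMARY KEY
    );

    CREATE TABLE CustomerPhone (
        CustomerId INT NOT NULL REFERENCES Customer (CustomerId),
        PhoneSeq   INT NOT NULL,          -- 1, 2, 3, ...
        Phone      VARCHAR(20) NOT NULL,
        PRIMARY KEY (CustomerId, PhoneSeq)
    );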
If it is already fully normalised, establish (as far as possible) approximate frequencies of population for each attribute. Place the most commonly occurring attributes on the "home" table for the entity, and use 2 or 3 additional tables for the less frequently populated attributes. (Try to make frequency of occurrence the criterion for determining which fields should go on which tables.)
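One way to get those frequencies (a sketch, assuming the raw data has already been loaded into a staging table, here called dbo.StagingEntity, with every attribute nullable) is simply to count non-NULL values per column:

    -- Approximate population frequency: percentage of rows in which each
    -- attribute is non-NULL. Column names are placeholders; for 2400 columns
    -- the SELECT list would in practice be generated from sys.columns.
    SELECT
        COUNT(*)                           AS TotalRows,
        100.0 * COUNT(Attr0001) / COUNT(*) AS Attr0001_PctPopulated,
        100.0 * COUNT(Attr0002) / COUNT(*) AS Attr0002_PctPopulated
        -- ... one expression per remaining attribute ...
    FROM dbo.StagingEntity;

Attributes populated in nearly every row belong on the home table; the rarely populated ones go to the additional tables.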
Only consider EAV for extremely sparsely populated attributes (preferably, not at all).
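If EAV does end up being used for that sparse tail, a minimal sketch of the side table might look like the following (reusing the hypothetical Entity_1 home table from the sketch above; values are stored as strings, which is exactly the loss of typing and constraints that makes EAV a last resort):

    -- EAV side table for rarely populated attributes only.
    CREATE TABLE EntitySparseAttribute (
        EntityId       INT          NOT NULL REFERENCES Entity_1 (EntityId),
        AttributeName  VARCHAR(100) NOT NULL,
        AttributeValue VARCHAR(400) NULL,
        PRIMARY KEY (EntityId, AttributeName)
    );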