问题描述
最近,我一直在重新思考我几个月前做的数据库设计。主要原因是昨天晚上我读了vBulletin的数据库模式,看到他们使用了许多,多个表。
我使用的当前想法我的模式,例如我的日志表,是通过用整数区分日志类型来将所有内容保存在一个表中:
id,type,type_id,action,message
1,1,305,2,'用户说明Ban'
2,2,1045,1,'删除文章'
其中 type 1 = user,type 2 = article
code> type_id =用户的ID,article或w / e 和动作2 =禁止,动作1 =删除
/ p>
我应该将设计更改为两个表 logBans
, logSomething
等等?
这里的问题是子类型。
- 将每个记录类型放入一个完全独立的表中;
- 将记录放在父表中,然后在子类型表中记录;和
- 将所有记录放在一个表中,并为可选数据(即不适用于该类型的数据)设置可空列。
每个策略都有其优点。
例如,不同亚型之间没有差异。在你的case,不同的日志记录有额外的列,如果他们是一个特定的类型?如果他们没有,或者很少的情况下,他们把它们全部放在一个表是完全有道理。
(2)是常用的Party表。这是CRM中的一个常见模型,涉及一个父类对象,它具有Person和Organization的子类型(Organization也可以有公司,Association等子类型)。个人和组织具有不同的属性(例如称呼为人的名字,出生日期等),因此将其拆分而不使用可空列是有意义的。
(2)可能更具空间效率(尽管现代DBMS中的NULL列的开销非常低)。更大的问题是,(2)可能更混乱的开发人员。你会得到一个情况,有人需要在某个地方存储一个额外的字段,并将它在一个空列的类型只是因为它更容易做,而不是批准DBA添加一个列(不,我不开玩笑)。
(1)可能是我经验中最不常用的3个方案。
,必须考虑可扩展性,并且可能是(1)的最佳情况。在某些点,JOIN不能有效地扩展,你需要使用某种分区方案来减少你的表大小。 (1)是这样做的一种方法(但是一个粗略的方法)。
我不会担心太多。你通常需要在成为问题之前获取数亿或数十亿条记录(除非你的记录真的很大,在这种情况下它会更快)。
Lately I've been rethinking a database design I made a couple of months ago. The main reason is that last night I read the databse schema of vBulletin and saw that they use many, MANY, tables.
The current "idea" I'm using for my schema, for instance my log table, is to keep everything in one table by differencing the type of Log with an integer:
id, type, type_id, action, message
1 , 1, 305, 2, 'Explanation for user Ban'
2, 2, 1045, 1, 'Reason for deletion of Article'
Where type 1 = user, type 2 = article
, type_id = the ID of the user, article or w/e
and action 2 = ban, action 1 = deletion
.
Should I change the design to two tables logBans
, logSomething
and so on? or is it better to keep the method I'm currently using?
The issue here is subtyping. There are three basic approaches to dealing with subtypes.
- Put each record type into a completely separate table;
- Put a record in a parent table and then a record in a subtype table; and
- Put all the records in one table, having nullable columns for the "optional" data (ie things that don't apply to that type).
Each strategy has its merits.
For example, (3) is particularly applicable if there is little to no difference between different subtypes. In your case, do different log records have extra columns if they're of a particular type? If they don't or there are few cases when they do putting them all in one table makes perfect sense.
(2) is common used for a Party table. This is a common model in CRMs that involves a parent Party object which has subtypes for Person and Organization (Organization may also have subtypes like Company, Association, etc). Person and Organization have different properties (eg salutation, given names, date of birth, etc for Person) so it makes sense to split this up rather than using nullable columns.
(2) is potentially more space efficient (although the overhead of NULL columns in modern DBMSs is very low). The bigger issue is that (2) might be more confusing to developers. You will get a situation where someone needs to store an extra field somewhere and will whack it in a column that's empty for that type simply because it's easier doing that than getting approval for the DBAs to add a column (no, I'm not kidding).
(1) is probably the least frequently used scheme of the 3 in my experience.
Lastly, scalability has to be considered and is probably the best case for (1). At a certain points JOINs don't scale effectively and you'll need to use some kind of partitioning scheme to cut down your table sizes. (1) is one method of doing that (but a crude method).
I wouldn't worry too much about that though. You'll typically need to get to hundreds of millions or billions of records before that becomes an issue (unless your records are really really large, in which case it'll happen sooner).
这篇关于数据库表,越多越好?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!