Problem description
I have some data that I need to put in a PostgreSQL database. The data relates to schools, so there are a LOT of attributes for each school, mostly small integers, floats or short texts, and all of it changes yearly. So I'm creating an entity called YearlyData and putting the attributes there. The thing is, there are around 50-60 attributes. They can't be normalized because they are plain attributes of the school itself, so I ethically cannot separate them into tables. But I'm not sure whether this would harm my performance.
I can try to categorize the data, put it in separate tables, and point to those from the YearlyData table. But then searching for schools by 20-30+ parameters would cause an insane number of joins, I'm guessing. I'm not sure whether that would harm my performance either.
Any expert advice?
Recommended answer
There are a few things to consider here:
- Does the list of attributes change significantly over time?
- Does the list of attributes need to include custom, user-defined attributes?
- Are there different attributes for different schools (i.e. many attributes only apply to one or a few schools)?
If any of these are true, you might think about a properties store approach like EAV, hstore, json fields, xml fields, etc.
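As a rough sketch of the properties-store idea, here is what a json-based variant might look like. It uses jsonb rather than plain json, which assumes PostgreSQL 9.4 or later. The table name yearly_data_props, the column names and the attribute keys are all made up for illustration.

```sql
-- Hypothetical table: a few fixed key columns plus a jsonb bag
-- holding the attributes that vary between schools or over time.
CREATE TABLE yearly_data_props (
    yearly_data_id serial  PRIMARY KEY,
    school_id      integer NOT NULL,
    school_year    integer NOT NULL,
    attrs          jsonb   NOT NULL DEFAULT '{}',
    UNIQUE (school_id, school_year)
);

-- A GIN index lets containment (@>) searches on attrs use an index.
CREATE INDEX yearly_data_props_attrs_idx
    ON yearly_data_props USING gin (attrs);

-- Attribute keys here are invented examples.
INSERT INTO yearly_data_props (school_id, school_year, attrs)
VALUES (1, 2013, '{"has_library": true, "teacher_count": 42}');

-- Find rows whose attributes contain the given key/value pair.
SELECT school_id, school_year
FROM yearly_data_props
WHERE attrs @> '{"has_library": true}';
```

The trade-off is that the database no longer enforces types or constraints per attribute, which is why this only tends to pay off when the attribute list really is open-ended.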
If not - if you have a fairly static list of properties where most of them make sense for most of the rows - then there's not really a problem with having them as 60 individual columns. It'll be easier to add indexes for commonly searched sets of attributes, including partial and composite indexes, and searches - particularly those on many different attributes - will be much faster.
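To make the plain-columns option concrete, here is a minimal sketch with one composite index and one partial index. The table, column and index names are hypothetical, and only a handful of the 50-60 attributes are shown.

```sql
-- Hypothetical wide table: one row per school per year.
CREATE TABLE yearly_data (
    yearly_data_id serial  PRIMARY KEY,
    school_id      integer NOT NULL,
    school_year    integer NOT NULL,
    total_students integer,
    teacher_count  integer,
    avg_class_size real,
    has_library    boolean,
    -- the remaining attribute columns would follow here
    UNIQUE (school_id, school_year)
);

-- Composite index for two attributes that are often filtered together.
CREATE INDEX yearly_data_year_students_idx
    ON yearly_data (school_year, total_students);

-- Partial index covering only the rows that match a common predicate.
CREATE INDEX yearly_data_library_idx
    ON yearly_data (school_year)
    WHERE has_library;

-- A search on several attributes needs no joins at all;
-- the planner chooses among whatever indexes are available.
SELECT school_id
FROM yearly_data
WHERE school_year = 2013
  AND total_students > 500
  AND has_library
  AND avg_class_size < 25;
```

Which indexes are worth creating depends on the searches you actually run, so it is probably best to add them after looking at real queries with EXPLAIN.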
See also: Database design - should I use 30 columns or 1 column with all data in form of JSON/XML?
There's also a compromise option available to you: A main table for the most important details you look up a lot, plus side-tables for logical groupings of attributes. Say:
yearly_summary (
    yearly_summary_id serial primary key,
    school_id integer,
    total_students integer,
    ...
)

plus

yearly_student_stats (
    yearly_summary_id integer primary key references yearly_summary(yearly_summary_id) on delete cascade,
    ...
)
etc. The integer primary key that's also a foreign key means you have an enforced 1:1 (optional) relationship to the other table: each summary row can have at most one matching side-table row. This approach can be useful if you have a few logical groupings of attributes that you can cluster into side-tables.
I'd also be surprised if a little more thought didn't reveal things that do make sense to normalize. Do you have year7_blah, year8_blah, year9_blah etc. columns? If so: great candidate for normalization.
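For instance, a set of per-grade columns like those could be folded into one row per grade. This is only a sketch: the yearly_grade_stats name, the grade column and the integer type for blah are assumptions, and yearly_summary is reduced to its key columns here.

```sql
CREATE TABLE yearly_summary (
    yearly_summary_id serial  PRIMARY KEY,
    school_id         integer NOT NULL
);

-- One row per (summary, grade) replaces the repeated
-- year7_blah, year8_blah, year9_blah columns.
CREATE TABLE yearly_grade_stats (
    yearly_summary_id integer NOT NULL
        REFERENCES yearly_summary (yearly_summary_id) ON DELETE CASCADE,
    grade             integer NOT NULL,  -- 7, 8, 9, ...
    blah              integer,           -- type assumed for illustration
    PRIMARY KEY (yearly_summary_id, grade)
);

-- Aggregating across grades becomes a plain GROUP BY
-- instead of adding up a fixed list of columns.
SELECT yearly_summary_id, sum(blah) AS total_blah
FROM yearly_grade_stats
GROUP BY yearly_summary_id;
```

Adding another grade then means inserting rows rather than altering the table.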