Problem description
We want to extend our database to support multiple languages, but we are unsure how to do this. Our database looks like this:
ID – Name – Description – (a lot of irrelevant columns)
Option 1 is to add an XML column to the table. In this column we can store the information we need, like this:
<translation>
  <language value='en'>
    <Name value=''/>
    <Description value=''/>
  </language>
  <language value='fr'>
    <Name value=''/>
    <Description value=''/>
  </language>
</translation>
This does the trick, and the advantage is that when I delete the row, I also delete the translations.
Option 2 is to add an extra table. It's easy to create a table to store the information in, but it requires inner joins when getting the information, and more effort to delete the translation rows when the original row is deleted.
What is the preferred option in this case? Or are there other good solutions for this?
I'd recommend the "relational" approach, i.e. separate translation table(s). Consider doing it like this:
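Such a model could be sketched roughly as follows, using Python's built-in sqlite3 module so the DDL can be run as-is. The table names `table1`, `table1_translation`, and `language`, the sample data, and the choice of SQLite are assumptions for illustration; only the composite key {LANGUAGE_ID, TABLEx_ID} and the NAME and DESCRIPTION fields come from the answer itself.

```python
import sqlite3

# Hypothetical names: table1 stands in for the existing table,
# table1_translation holds its translatable fields.
SCHEMA = """
CREATE TABLE language (
    language_id TEXT PRIMARY KEY  -- e.g. 'en', 'fr'
);

CREATE TABLE table1 (
    table1_id INTEGER PRIMARY KEY  -- language-independent columns would follow
);

CREATE TABLE table1_translation (
    language_id TEXT    NOT NULL REFERENCES language (language_id) ON DELETE CASCADE,
    table1_id   INTEGER NOT NULL REFERENCES table1 (table1_id)     ON DELETE CASCADE,
    name        TEXT    NOT NULL,
    description TEXT,
    PRIMARY KEY (language_id, table1_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript(SCHEMA)

conn.executemany("INSERT INTO language VALUES (?)", [("en",), ("fr",)])
conn.execute("INSERT INTO table1 (table1_id) VALUES (1)")
conn.executemany(
    "INSERT INTO table1_translation VALUES (?, ?, ?, ?)",
    [
        ("en", 1, "Chair", "A seat with a back"),
        ("fr", 1, "Chaise", "Un siège avec dossier"),
    ],
)
conn.commit()
```

Each additional multi-lingual table would get its own translation table of the same shape, with the fields that table needs.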
This model has some nice properties:
- For each multi-lingual table, create a separate translation table. This way, you can use the fields appropriate for that particular table, and the translation cannot be "misconnected" to the wrong table.
- The LANGUAGE table and the associated FOREIGN KEYs ensure that a translation cannot exist for a non-existent language, unlike the XML.
- The ON DELETE CASCADE referential action ensures that no "orphaned" translation is left behind when a language is removed, unlike the XML.
- While XML may be faster in simpler cases, I suspect JOIN is more scalable when the number of languages grows. In any case, measure the difference and decide for yourself if it's significant enough.
- Separate fields such as NAME and DESCRIPTION may be easier to index. With XML, you'd probably need a DBMS with special support for XML, or possibly some sort of full-text index.
- Fields such as NAME and DESCRIPTION will likely be just regular VARCHARs. OTOH, putting them together may produce XML too large for a regular VARCHAR, forcing you to use a CLOB/BLOB, which may have its own performance complications.
- If your DBMS supports clustering (see below), the whole translation table can be stored in a single B-Tree. XML has a lot of redundant data (opening and closing tags), likely making it larger and less cache-friendly than the B-Tree (even when we count in all the associated overheads).
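The foreign key and cascade behaviour described in the list can be checked with a small experiment. This is a minimal sketch using Python's sqlite3 module; the table names and sample rows are hypothetical, and other DBMSs may report the violation differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
CREATE TABLE language (language_id TEXT PRIMARY KEY);
CREATE TABLE table1 (table1_id INTEGER PRIMARY KEY);
CREATE TABLE table1_translation (
    language_id TEXT    NOT NULL REFERENCES language (language_id) ON DELETE CASCADE,
    table1_id   INTEGER NOT NULL REFERENCES table1 (table1_id)     ON DELETE CASCADE,
    name        TEXT    NOT NULL,
    PRIMARY KEY (language_id, table1_id)
);
INSERT INTO language VALUES ('en'), ('fr');
INSERT INTO table1 VALUES (1);
INSERT INTO table1_translation VALUES ('en', 1, 'Chair'), ('fr', 1, 'Chaise');
""")


def count_translations(conn):
    return conn.execute("SELECT COUNT(*) FROM table1_translation").fetchone()[0]


# A translation for a language that is not in LANGUAGE is rejected.
try:
    conn.execute("INSERT INTO table1_translation VALUES ('xx', 1, 'Stuhl')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

before = count_translations(conn)

# Removing a language removes its translations.
conn.execute("DELETE FROM language WHERE language_id = 'fr'")
after_language_delete = count_translations(conn)

# Removing the original row removes its remaining translations as well.
conn.execute("DELETE FROM table1 WHERE table1_id = 1")
after_row_delete = count_translations(conn)
```

The last step also covers the concern raised in the question: with a cascading foreign key on the parent row, deleting the original row needs no extra cleanup code.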
You'll notice that the model above uses identifying relationships, and the resulting PK {LANGUAGE_ID, TABLEx_ID} can be used for clustering (so the translations that belong to the same language are stored physically close together in the database). As long as you have a few predominant (or "hot") languages, this should be OK - caching is done at the database page level, so keeping "hot" and "cold" data out of the same page avoids caching "cold" data (which would make the cache effectively "smaller").
OTOH, if you routinely need to query for many languages, consider flipping the clustering key order to {TABLEx_ID, LANGUAGE_ID}, so all the translations of the same row are stored physically close together in the database. Once you retrieve one translation, other translations of the same row are probably already cached. Or, if you want to extract multiple translations in a single query, you could do it with less I/O.
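How clustering is declared depends on the DBMS. As one illustration, SQLite stores a `WITHOUT ROWID` table as a B-tree keyed on its PRIMARY KEY, so the column order of the key determines which rows end up adjacent. The sketch below declares both orders side by side; the table names and data are hypothetical, and the query plan text it collects varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Language-first key: all rows of one language sit next to each other.
CREATE TABLE translation_by_language (
    language_id TEXT    NOT NULL,
    table1_id   INTEGER NOT NULL,
    name        TEXT    NOT NULL,
    PRIMARY KEY (language_id, table1_id)
) WITHOUT ROWID;

-- Row-first key: all translations of one row sit next to each other.
CREATE TABLE translation_by_row (
    table1_id   INTEGER NOT NULL,
    language_id TEXT    NOT NULL,
    name        TEXT    NOT NULL,
    PRIMARY KEY (table1_id, language_id)
) WITHOUT ROWID;
""")

rows = [("en", 1, "Chair"), ("fr", 1, "Chaise"), ("en", 2, "Table"), ("fr", 2, "Table")]
conn.executemany("INSERT INTO translation_by_language VALUES (?, ?, ?)", rows)
conn.executemany(
    "INSERT INTO translation_by_row VALUES (?, ?, ?)",
    [(row_id, lang, name) for lang, row_id, name in rows],
)


def query_plan(sql):
    # The last column of EXPLAIN QUERY PLAN output is a human-readable description.
    return [r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Each query filters on the leading key column of its table.
one_language = query_plan(
    "SELECT name FROM translation_by_language WHERE language_id = 'en'"
)
one_row = query_plan("SELECT name FROM translation_by_row WHERE table1_id = 1")
```

Printing `one_language` and `one_row` shows how SQLite plans each lookup. Whether the difference matters in practice is something to measure on real data.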
We can JOIN just to the translation in the desired language. With XML, you must load (and parse) the whole XML before deciding to use only the small portion of it that pertains to the desired language. Whenever you add a new language (and the associated translations) to the XML, it slows down the processing of existing rows even if you rarely use the new language.
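A JOIN restricted to one language might look like the following sketch. The `price` column, the table names, the helper `fetch_localized`, and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (table1_id INTEGER PRIMARY KEY, price REAL);
CREATE TABLE table1_translation (
    language_id TEXT    NOT NULL,
    table1_id   INTEGER NOT NULL REFERENCES table1 (table1_id),
    name        TEXT    NOT NULL,
    description TEXT,
    PRIMARY KEY (language_id, table1_id)
);
INSERT INTO table1 VALUES (1, 49.0), (2, 120.0);
INSERT INTO table1_translation VALUES
    ('en', 1, 'Chair', 'A seat'),
    ('fr', 1, 'Chaise', 'Un siège'),
    ('en', 2, 'Table', 'A table');
""")


def fetch_localized(conn, language_id):
    """Return (id, name, description) for rows translated into language_id."""
    return conn.execute(
        """
        SELECT t.table1_id, tr.name, tr.description
        FROM table1 AS t
        JOIN table1_translation AS tr
          ON tr.table1_id = t.table1_id
         AND tr.language_id = ?
        ORDER BY t.table1_id
        """,
        (language_id,),
    ).fetchall()


french = fetch_localized(conn, "fr")
english = fetch_localized(conn, "en")
```

Only the rows of the requested language are read from the translation table. Note that an inner join drops rows that have no translation in that language (row 2 has no French entry here); a LEFT JOIN would keep them with NULL name and description.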