Problem description
I am faced with the dilemma of changing my primary keys from int identities to Guid. I'll put my problem straight up. It's a typical retail management app, with POS and back office functionality. It has about 100 tables. The database synchronizes with other databases and receives/sends new data.
Most tables don't have frequent inserts, updates or select statements executing on them. However, some do have frequent inserts and selects, e.g. the products and orders tables.
Some tables have up to 4 foreign keys in them. If I changed my primary keys from 'int' to 'Guid', would there be a performance issue when inserting or querying data from tables that have many foreign keys? I know people have said that indexes will be fragmented and that 16 bytes is an issue.
Space wouldn't be an issue in my case, and apparently index fragmentation can also be taken care of using the 'NEWSEQUENTIALID()' function. Can someone tell me, from their experience, whether Guid will be problematic in tables with many foreign keys?
I'd much appreciate your thoughts on it...
GUIDs may seem to be a natural choice for your primary key - and if you really must, you could probably argue to use it for the PRIMARY KEY of the table. What I'd strongly recommend not to do is use the GUID column as the clustering key, which SQL Server does by default, unless you specifically tell it not to.
You really need to keep two issues apart:
1) the primary key is a logical construct - one of the candidate keys that uniquely and reliably identifies every row in your table. This can be anything, really - an INT, a GUID, a string - pick what makes most sense for your scenario.
2) the clustering key (the column or columns that define the "clustered index" on the table) - this is a physical storage-related thing, and here, a small, stable, ever-increasing data type is your best pick - INT or BIGINT as your default option.
By default, the primary key on a SQL Server table is also used as the clustering key - but it doesn't need to be that way! I've personally seen massive performance gains when breaking up a previous GUID-based primary / clustered key into two separate keys - the primary (logical) key on the GUID, and the clustering (ordering) key on a separate INT IDENTITY(1,1) column.
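A minimal sketch of that split in T-SQL. The table, column, and constraint names (dbo.Orders, OrderGuid, OrderSeq and so on) are hypothetical and not taken from the question; only the pattern matters.

```sql
-- Hypothetical table; every name here is illustrative.
CREATE TABLE dbo.Orders
(
    -- Logical key: the GUID that foreign keys and synced databases refer to.
    OrderGuid  UNIQUEIDENTIFIER NOT NULL
        CONSTRAINT DF_Orders_OrderGuid DEFAULT NEWID(),
    -- Physical ordering key: small, stable, ever-increasing.
    OrderSeq   INT IDENTITY(1,1) NOT NULL,
    OrderDate  DATETIME NOT NULL,
    -- NONCLUSTERED overrides the default of clustering on the primary key.
    CONSTRAINT PK_Orders PRIMARY KEY NONCLUSTERED (OrderGuid)
);

-- The clustered index goes on the INT IDENTITY column instead.
CREATE UNIQUE CLUSTERED INDEX CIX_Orders_OrderSeq
    ON dbo.Orders (OrderSeq);
```

Foreign keys in other tables would still reference OrderGuid, since a foreign key can point at any primary key or unique constraint regardless of which index is clustered.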
As Kimberly Tripp - the Queen of Indexing - and others have stated a great many times - a GUID as the clustering key isn't optimal, since due to its randomness, it will lead to massive page and index fragmentation and to generally bad performance.
Yes, I know - there's newsequentialid() in SQL Server 2005 and up - but even that is not truly and fully sequential and thus also suffers from the same problems as the GUID - just a bit less prominently so.
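Rather than guessing how badly a given table is affected, fragmentation can be measured directly. The query below is a sketch that assumes a hypothetical table named dbo.Orders (substitute your own); it uses the sys.dm_db_index_physical_stats function, available in SQL Server 2005 and later.

```sql
-- Fragmentation and page count for every index on one table.
-- dbo.Orders is a placeholder name; replace it with a real table.
SELECT  i.name                          AS index_name,
        ps.index_type_desc,
        ps.avg_fragmentation_in_percent,
        ps.page_count
FROM    sys.dm_db_index_physical_stats(
            DB_ID(),                    -- current database
            OBJECT_ID(N'dbo.Orders'),   -- table to inspect
            NULL,                       -- all indexes
            NULL,                       -- all partitions
            'LIMITED') AS ps            -- cheapest scan mode
JOIN    sys.indexes AS i
        ON  i.object_id = ps.object_id
        AND i.index_id  = ps.index_id
ORDER BY ps.avg_fragmentation_in_percent DESC;
```

Running it before and after a batch of inserts, once with a NEWID() default and once with NEWSEQUENTIALID(), gives a like-for-like comparison on your own data.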
Then there's another issue to consider: the clustering key on a table will be added to each and every entry on each and every non-clustered index on your table as well - thus you really want to make sure it's as small as possible. Typically, an INT with its 2+ billion possible values should be sufficient for the vast majority of tables - and compared to a GUID as the clustering key, you can save yourself hundreds of megabytes of storage on disk and in server memory.
Quick calculation - using INT vs. GUID as Primary and Clustering Key:
- Base Table with 1'000'000 rows (3.8 MB vs. 15.26 MB)
- 6 nonclustered indexes (22.89 MB vs. 91.55 MB)
TOTAL: about 27 MB vs. 107 MB - and that's just on a single table!
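The per-line figures above can be reproduced with simple arithmetic. The sketch below counts only the bytes taken by the key itself (4 for INT, 16 for UNIQUEIDENTIFIER) and ignores row and page overhead; the variable names are illustrative.

```sql
-- Key bytes only; row headers, page overhead and fill factor are ignored.
-- DECLARE with an initializer needs SQL Server 2008 or later.
DECLARE @rows        BIGINT = 1000000;  -- rows in the table
DECLARE @nc_indexes  INT    = 6;        -- nonclustered indexes on the table
DECLARE @int_bytes   INT    = 4;        -- storage size of INT
DECLARE @guid_bytes  INT    = 16;       -- storage size of UNIQUEIDENTIFIER

SELECT
    @rows * @int_bytes  / 1048576.0                AS base_int_mb,
    @rows * @guid_bytes / 1048576.0                AS base_guid_mb,
    -- The clustering key is repeated in every nonclustered index entry.
    @rows * @int_bytes  * @nc_indexes / 1048576.0  AS nc_int_mb,
    @rows * @guid_bytes * @nc_indexes / 1048576.0  AS nc_guid_mb;
```

The four columns come out at roughly 3.81, 15.26, 22.89 and 91.55 MB, matching the list above. The gap grows linearly with both the row count and the number of nonclustered indexes.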
Some more food for thought - excellent stuff by Kimberly Tripp - read it, read it again, digest it! It's the SQL Server indexing gospel, really.
- GUIDs as PRIMARY KEY and/or clustered key
- The clustered index debate continues
- Ever-increasing clustering key - the Clustered Index Debate..........again!
So if you really must change your primary keys to GUIDs - try to make sure the primary key isn't the clustering key, and that you still have an INT IDENTITY column on the table that is used as the clustering key. Otherwise, your performance is sure to take a severe hit.