Problem description
How are varchar columns handled internally by a database engine? For a column defined as char(100), the DBMS allocates 100 contiguous bytes on the disk. However for a column defined as varchar(100), that presumably isn't the case, since the whole point of varchar is to not allocate any more space than required to store the actual data value stored in the column. So, when a user updates a database row containing an empty varchar(100) column to a value consisting of 80 characters for instance, where does the space for that 80 characters get allocated from? It seems that varchar columns must result in a fair amount of fragmentation of the actual database rows, at least in scenarios where column values are initially inserted as blank or NULL, and then updated later with actual values. Does this fragmentation result in degraded performance on database queries, as opposed to using char type values, where the space for the columns stored in the rows is allocated contiguously? Obviously using varchar results in less disk space than using char, but is there a performance hit when optimizing for query performance, especially for columns whose values are frequently updated after the initial insert?
The data structures used inside a database engine are far more complex than you are giving them credit for! Yes, there are fragmentation issues, and updating a varchar with a large value can cause a performance hit; however, it's difficult to explain/understand the implications of those issues without a fuller understanding of the data structures involved.
For MS SQL Server you might want to start by understanding pages - the fundamental unit of storage (see http://msdn.microsoft.com/en-us/library/ms190969.aspx)
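To see why page layout matters for the fixed-vs-variable question, here is a rough back-of-the-envelope sketch (not SQL Server's exact on-disk format): pages are 8 KB with a 96-byte header, each row also costs a 2-byte entry in the page's slot array, and the per-row overhead used below is an assumed round number for illustration only.

```python
# Simplified model of how many rows fit on one 8 KB page.
# The 7-byte fixed per-row overhead is an assumption for illustration;
# real record headers are more involved.
PAGE_SIZE = 8192        # SQL Server page size in bytes
PAGE_HEADER = 96        # bytes reserved for the page header
SLOT_ENTRY = 2          # each row adds a 2-byte slot-array entry

def rows_per_page(row_bytes: int) -> int:
    """How many rows of a given size fit on a single page (simplified)."""
    usable = PAGE_SIZE - PAGE_HEADER
    return usable // (row_bytes + SLOT_ENTRY)

ROW_OVERHEAD = 7  # assumed fixed cost per row, for illustration

# A char(100) column always occupies 100 bytes; a varchar(100) column
# holding ~20 characters occupies ~20 bytes plus a 2-byte offset entry.
fixed_rows = rows_per_page(100 + ROW_OVERHEAD)
variable_rows = rows_per_page(20 + 2 + ROW_OVERHEAD)

print(fixed_rows, variable_rows)
```

Under these assumptions the varchar table packs several times more rows onto each page, which is exactly the "fewer reads" effect described in the first point below.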
In terms of the performance implications of fixed vs. variable storage types, there are a number of points to consider:
- Using variable length columns can improve performance as it allows more rows to fit on a single page, meaning fewer reads
- Using variable length columns requires special offset values, and the maintenance of these values requires a slight overhead; however, this extra overhead is generally negligible.
- Another potential cost is incurred when increasing the size of a column value while the page containing that row is nearly full
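The offset mechanism in the second point can be sketched as follows. This is a toy row format inspired by, but not identical to, real record layouts: variable-length values are packed back to back, located via a table of 2-byte end offsets stored ahead of the data. All names here are illustrative.

```python
import struct

def encode_row(values: list) -> bytes:
    """Pack variable-length column values behind an end-offset array."""
    offsets, data, pos = [], b"", 0
    for v in values:
        pos += len(v)
        offsets.append(pos)   # each entry marks where column i ends
        data += v
    header = struct.pack("<H", len(values))
    header += b"".join(struct.pack("<H", o) for o in offsets)
    return header + data

def read_column(row: bytes, i: int) -> bytes:
    """Fetch column i directly via the offset array - no scanning needed."""
    (ncols,) = struct.unpack_from("<H", row, 0)
    offsets = struct.unpack_from("<%dH" % ncols, row, 2)
    data_start = 2 + 2 * ncols
    start = offsets[i - 1] if i > 0 else 0
    return row[data_start + start : data_start + offsets[i]]

row = encode_row([b"alice", b"", b"wonderland"])
print(read_column(row, 0))  # b'alice'
print(read_column(row, 2))  # b'wonderland'

# Updating column 1 from b"" to an 80-byte value means re-encoding the
# whole row; if the page holding the row has no free space for the larger
# row, the engine must relocate data, which is the update cost noted above.
```

Note that reading any column is O(1) thanks to the offsets; the cost shows up only on updates that grow the row, matching the trade-off in the points above.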
As you can see, the situation is rather complex. Generally speaking, however, you can trust the database engine to be pretty good at dealing with variable data types, and they should be the data type of choice when the length of the data held in a column may vary significantly.
At this point I'm also going to recommend the excellent book "Microsoft SQL Server 2008 Internals" for some more insight into how complex things like this really get!