问题描述
我有一个500万行的数据库表.聚簇索引是自动增量标识"列. PK是代码生成的256字节VARCHAR
,它是URL的SHA256哈希,这是表上的非聚集索引.
I have a database table with 5 million rows. The clustered index is auto-increment identity column. There PK is a code generated 256 byte VARCHAR
which is a SHA256 hash of a URL, this is a non-clustered index on the table.
表如下:
CREATE TABLE [dbo].[store_image](
[imageSHAID] [nvarchar](256) NOT NULL,
[imageGUID] [uniqueidentifier] NOT NULL,
[imageURL] [nvarchar](2000) NOT NULL,
[showCount] [bigint] NOT NULL,
[imageURLIndex] AS (CONVERT([nvarchar](450),[imageURL],(0))),
[autoIncID] [bigint] IDENTITY(1,1) NOT NULL,
CONSTRAINT [PK_imageSHAID] PRIMARY KEY NONCLUSTERED
(
[imageSHAID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE CLUSTERED INDEX [autoIncPK] ON [dbo].[store_image]
(
[autoIncID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
-
imageSHAID
是图像URL的SHA256哈希,例如" http://blah.com/image1.jpg ",将其哈希为256个长度的varchar .imageSHAID
is a SHA256 hash of an image URL e.g. "http://blah.com/image1.jpg", it is hashed into a varchar of 256 length.imageGUID
是代码生成的guid,其中我标识了图像(以后将用作索引,但现在我已将该列省略为索引)imageGUID
is a code generated guid in which I identify the image (it will be used as an index later, but for now I have omitted this column as an index)imageURL
是图像的完整URL(最多2000个字符)imageURL
is the full URL of the image (up to 2000 characters)showCount
是显示图像的次数,每次显示该特定图像时,该次数都会增加.showCount
is the number of times the image is shown, this is incremented each time this particular image is shown.imageURLIndex
是由450个字符限制的计算列,这使我可以在应选择的imageURL上进行文本搜索,它是可索引的(为简洁起见,再次省略了索引)imageURLIndex
is a computed column limited by 450 characters, this allows me to do text searches on the imageURL should I choose to, it is indexable (again index is omitted for brevity)autoIncID
是聚簇索引,应允许更快地插入数据.autoIncID
is the clustered index, should allow faster inserting of data.我会定期从临时表合并到
store_image
表中.临时表的结构如下(非常类似于store_image表):Periodically I merge from a temp table into the
store_image
table. The temp table structure is as follows (very similar to the store_image table):CREATE TABLE [dbo].[store_image_temp]( [imageSHAID] [nvarchar](256) NULL, [imageURL] [nvarchar](2000) NULL, [showCount] [bigint] NULL, ) ON [PRIMARY] GO
运行合并过程时,我使用以下代码将
DataTable
写入临时表:When the merge process is run, I write a
DataTable
to the temp table using the following code:using (SqlBulkCopy bulk = new SqlBulkCopy(storeConn, SqlBulkCopyOptions.KeepIdentity | SqlBulkCopyOptions.KeepNulls, null)) { bulk.DestinationTableName = "[dbo].[store_image_temp]"; bulk.WriteToServer(imageTableUpsetDataTable); }
然后我运行合并命令,通过基于
imageSHAID
从临时表进行合并来更新store_image
表中的showCount
.如果store_image
表中当前不存在该图像,则创建它:I then run the merge command to update the
showCount
in thestore_image
table by merging from the temp table based on theimageSHAID
. If the image doesn't currently exist in thestore_image
table, I create it:merge into store_image as Target using [dbo].[store_image_temp] as Source on Target.imageSHAID=Source.imageSHAID when matched then update set Target.showCount=Target.showCount+Source.showCount when not matched then insert values (Source.imageSHAID,NEWID(), Source.imageURL, Source.showCount);
我通常在任何一个合并过程中尝试将temp表中的2k-5k行合并到
store_image
表中.I'm typically trying to merge 2k-5k rows from the temp table to the
store_image
table at any one merge process.我曾经在SSD(仅连接SATA 1)上运行此数据库,并且运行速度非常快(不到200毫秒).我的SSD空间不足,因此将数据库交换到1TB 7200高速缓存旋转磁盘上,因为此后完成时间超过6-100秒(6000-100000MS).运行批量插入时,我可以看到磁盘活动约为1MB-2MB/秒,CPU使用率较低.
I used to run this DB on a SSD (only SATA 1 connected) and it was very fast (under 200 ms). I ran out of room on the SSD so I swapped the DB to a 1TB 7200 cache spinning disk, since then completion times are over 6-100 seconds (6000 - 100000MS). When the bulk insert is running I can see disk activity of around 1MB-2MB/sec, low CPU usage.
这是这种数据量的典型写入时间吗?对我来说似乎有点慢,是什么原因导致性能下降?当然,在为
imageSHAID
编制索引之后,我们应该期望比这更快的查找时间吗?Is this a typical write time for this amount of data? It seems a little slow to me, what is causing the slow performance? Surely with the
imageSHAID
being indexed we should expect quicker seek times than this?任何帮助将不胜感激.
感谢您的时间.
推荐答案
MERGE
中的UPDATE
子句将更新showCount
.这需要在聚集索引上进行键查找.Your
UPDATE
clause in theMERGE
updatesshowCount
. This requires a key lookup on the clustered index.但是,聚簇索引也被声明为非唯一.即使基础列是唯一的,这也将信息提供给优化器.
However, the clustered index is also declared non-unique. This gives information to the optimiser even though the underlying column is unique.
所以,我将进行这些更改
So, I'd make these changes
- 集群主键为
autoIncID
-
imageSHAID
上的当前PK是独立的唯一索引(不是约束),并为showCount
添加INCLUDE.唯一约束不能包含INCLUDEs
- the clustered primary key to be
autoIncID
- the current PK on
imageSHAID
to be a standalone unique index (not constraint) and add an INCLUDE forshowCount
. Unique constraints can't have INCLUDEs
更多观察结果:
- 您不需要
nvarchar
作为哈希或URL列.这些不是unicode. - 哈希值也是固定长度,因此可以为
char(64)
(对于SHA2-512). - 列的长度定义了要分配给查询的内存量.详情请参见:与varchar相比,varchar(500)是否有优势(8000)?
- you don't need
nvarchar
for the hash or URL columns. These are not unicode. - A hash is also fixed length so can be
char(64)
(for SHA2-512). - The length of a column defines how much memory to assign to the query. See this for more: is there an advantage to varchar(500) over varchar(8000)?
这篇关于SQL Server:合并性能的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!
- 集群主键为