Problem description
With reference to "SQL Query how to summarize students record by date?", I was able to get the report I wanted.
I was told that in the real world the students table will have 30 million records. I do have an index on (StudentID, Date). Any suggestions to improve the performance, or is there a better way to build the report?
Right now I have the following query:
;with cte as
(
    select id,
           studentid,
           date,
           '#'+subject+';'+grade+';'+convert(varchar(10), date, 101) report
    from student
)
-- insert into studentreport
select distinct
       studentid,
       STUFF(
           (SELECT cast(t2.report as varchar(50))
            FROM cte t2
            where c.StudentId = t2.StudentId
            order by t2.date desc
            FOR XML PATH (''))
       , 1, 0, '') AS report
from cte c;
Recommended answer
Without seeing the execution plan, it's not really possible to write an optimized SQL statement, so I'll make suggestions instead.
Don't use a CTE, as they often don't handle queries with large memory requirements well (at least, in my experience). Instead, stage the CTE data in a real table, either with a materialized/indexed view or with a working table (maybe a large temp table). Then execute the second select (the one after the CTE) against that table to combine your data into an ordered list.
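The staging idea above could be sketched roughly as follows. This is only an illustration under assumptions: the names #student_stage and ix_student_stage are hypothetical, the varchar(100) width is a guess at a size large enough for the concatenated string, and whether this actually beats the CTE depends on the execution plan for the real data.

```sql
-- Sketch only: #student_stage and ix_student_stage are hypothetical names.
-- Stage the formatted rows once in a working table instead of in a CTE.
select studentid,
       date,
       cast('#' + subject + ';' + grade + ';'
            + convert(varchar(10), date, 101) as varchar(100)) as report
into #student_stage
from student;

-- Cluster on the correlation column and the sort column so the
-- per-student lookup below can read rows already in date order.
create clustered index ix_student_stage
    on #student_stage (studentid, date desc);

-- Second select: one row per student, reports concatenated newest first.
select s.studentid,
       (select t2.report as [text()]
        from #student_stage t2
        where t2.studentid = s.studentid
        order by t2.date desc
        for xml path('')) as report
from (select distinct studentid from #student_stage) s;
```

Selecting from a derived table of distinct student IDs means the concatenation subquery runs once per student, whereas the original query evaluates it against every row and then removes duplicates with DISTINCT.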
The number of comments on your question indicates that you have a large problem (or problems). You're converting tall and skinny data (think integers, datetime2 types) into ordered lists within strings. Try to think instead in terms of storing data in the smallest formats available and not manipulating it into strings until afterward (or never). Alternatively, give serious thought to creating an XML data field to replace the 'report' field.
If you can make it work, this is what I would do (including a test case without indexes). Your mileage may vary, but give it a try:
-- Test table without indexes
create table #student (id int not null, studentid int not null, date datetime not null, subject varchar(40), grade varchar(40))

-- Three records for each of three students
insert into #student (id,studentid,date,subject,grade)
select 1, 1, getdate(), 'history', 'A-' union all
select 2, 1, dateadd(d,1,getdate()), 'computer science', 'b' union all
select 3, 1, dateadd(d,2,getdate()), 'art', 'q' union all
--
select 1, 2, getdate() , 'something', 'F' union all
select 2, 2, dateadd(d,1,getdate()), 'genetics', 'e' union all
select 3, 2, dateadd(d,2,getdate()), 'art', 'D+' union all
--
select 1, 3, getdate() , 'memory loss', 'A-' union all
select 2, 3, dateadd(d,1,getdate()), 'creative writing', 'A-' union all
select 3, 3, dateadd(d,2,getdate()), 'history of asia 101', 'A-'
go

-- One row per student; each record becomes a <report> element with attributes
select studentid as studentid
      ,(select s2.date as '@date', s2.subject as '@subject', s2.grade as '@grade'
        from #student s2
        where s1.studentid = s2.studentid
        for xml path('report'), type) as 'reports'
from (select distinct studentid from #student) s1;
I don't know how to make the output legible on here, but the result set has 2 fields. Field 1 is an integer; field 2 is XML with one node per report. This still isn't as ideal as just sending the result set, but it is at least one result per studentid.
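If the XML result needs to be persisted (the commented-out insert into studentreport in the question suggests it might), one possible shape is sketched below. The table #studentreport and its two-column layout are assumptions made for illustration, since the real studentreport table is not shown; the sketch reuses the #student test table from above and adds an order by to match the newest-first ordering of the question's query.

```sql
-- Hypothetical target table; the real studentreport layout is not shown.
create table #studentreport
(
    studentid int not null primary key,
    reports   xml null
);

-- Store one XML value per student, newest report first.
insert into #studentreport (studentid, reports)
select s1.studentid,
       (select s2.date as '@date', s2.subject as '@subject', s2.grade as '@grade'
        from #student s2
        where s2.studentid = s1.studentid
        order by s2.date desc
        for xml path('report'), type)
from (select distinct studentid from #student) s1;

-- Shred the XML back into rows only when a consumer needs them.
select r.studentid,
       x.n.value('@subject', 'varchar(40)') as subject,
       x.n.value('@grade', 'varchar(40)') as grade
from #studentreport r
cross apply r.reports.nodes('/report') as x(n);
```

Keeping the reports as typed XML rather than a delimited string leaves the values queryable with nodes() and value() without any string parsing.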