1. ORACLE 19C LISTAGG函数功能增强
1.1 LISTAGG函数简介
LISTAGG函数是11g R2引入用于合并字符串(列转行),可替换wmsys.wm_concat功能,而且效率更好。wm_concat函数是自定义聚集函数,而且是undocument的,从12C开始已经去掉,从11g R2开始,有字符串合并的需求,最好使用LISTAGG替代,以获得更好的性能。
但是,LISTAGG函数在11g R2中有个缺点,就是不能直接DISTINCT,在ORACLE 19C中,ORACLE给它增加了DISTINCT功能,这样可以剔除重复的字符串合并。19C的完整LISTAGG语法如下:
从语法图上可以看出,LISTAGG函数可以是普通的组函数,也可以用于分析函数,并且12C开始增加了OVERFLOW语法,用于字符串过长的处理。
1.2 使用LISTAGG函数实现字符串合并
如下例所示:
需求:对emp表,按照部门分组,按逗号合并部门员工名。
使用LISTAGG实现如下:
select deptno, listagg(ename,',') within group(order by deptno)as enames
from emp
group by deptno
order by deptno;
DEPTNO ENAMES
------------------------------------------------------------------------------
10 CLARK,KING,MILLER
20ADAMS,FORD,JONES,SCOTT,SMITH
30ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD
下面给DEPTNO=10的部门插入2个重复名字的员工,如下所示:
INSERT INTO empVALUES(8000,'KING','ANALYST',7782,to_date('1983-1-1','yyyy-mm-dd'),2000,NULL,10);
INSERT INTO empVALUES (9000,'KING','MANADER',7782,to_date('1984-5-1','yyyy-mm-dd'),2500,NULL,10);
COMMIT;
再次查询,发现有重复的数据:
SQL> selectdeptno, listagg(ename,',') within group(order by deptno) as enames
2 from emp
3 group by deptno
4 order by deptno;
DEPTNO ENAMES
--------------------------------------------------------------------------------------
10 CLARK,KING,KING,KING,MILLER
20 ADAMS,FORD,JONES,SCOTT,SMITH
30 ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD
一般遇到这种有重复数据的情况,需要剔除重复数据,在19C之前,需要先剔除重复数据:
--红色部分使用分析函数剔重,当然这里直接distinct也可以,结果与最前面的一致
select deptno, listagg(ename, ',')within group(order by deptno) as enames
from (select deptno,
ename,
row_number() over(partition by deptno, enameorder by empno) rn
from emp)
where rn = 1
group by deptno
order by deptno;
或者直接DISTINCT :
select deptno, listagg(ename, ',') within group(order by deptno)as enames
from (select distinctdeptno,
ename
from emp)
group by deptno
order by deptno;
DEPTNO ENAMES
-------------------------------------------------------------------------------
10 CLARK,KING,MILLER
20 ADAMS,FORD,JONES,SCOTT,SMITH
30 ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD
到了19C,LISTAGG提供了直接DISTINCT的功能,可以很简单地实现剔除重复数据,然后合并,如下所示:
selectdeptno, listagg(distinct ename,',') within group(order by deptno) as enames
from emp
group by deptno
order by deptno;
DEPTNO ENAMES
-------------------------------------------------------------------------------
10 CLARK,KING,MILLER
20 ADAMS,FORD,JONES,SCOTT,SMITH
30 ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD
默认情况下包含所有数据(ALL不用写默认),也就是LISTAGG(ALLcolumn),如下所示:
select deptno, listagg(all ename,',') within group(order by deptno) asenames
from emp
group by deptno
order by deptno;
1.3 WM_CONCAT函数与LISTAGG函数比较
如果使用wm_concat,则是:
select deptno, wm_concat(distinct ename)as enames
from emp
group by deptno
order by deptno;
DEPTNO ENAMES
--------------------------------------------------------------------------------------
10 CLARK,KING,MILLER
20 ADAMS,FORD,JONES,SCOTT,SMITH
30ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD
wm_concat这个undocument函数可以直接distinct,但是它与LISTAGG比缺点很明显,一是性能差,而是undocument函数,比较如下:
1)wm_concat可以distinct去除重复的(用于分析函数,distinct不能带orderby),19C之前的listagg不可以,可以先剔除重复,然后做listagg,19C之后的listagg可以distinct。
2)wm_concat不能保证排序,listagg可以保证排序.
3.WMSYS.WM_CONCAT是undocument的函数,最好不要使用,要使用也用自定义分析函数
1.4 使用ON OVERFLOW处理字符串长度溢出问题
LISTAGG函数在12.2开始,合并长度最多32767字节,依赖于MAX_STRING_SIZE参数,如下所示:
1) 如果MAX_STRING_SIZE=EXTEND,则对于VARCHAR2和RAW类型,最多返回32767字节
2) 如果MAX_STRING_SIZE=STANDARD,则对于VARCHAR2最多4000字节,对于RAW类型最多2000字节
那么在合并的字符串超过限制,溢出时,默认报错,但是12.2引入了ON OVERFLOW可以截断处理,如下:
ON OVERFLOW TRUNCATE默认对溢出数据后面用…(count),如下所示:
SELECT deptno,LISTAGG(ename, ',' ON OVERFLOW TRUNCATE) WITHIN GROUP(ORDER BY deptno) AS enames
FROM (select deptno,ename from emp,(select levelfrom dual connect by level
GROUP BY deptno
ORDER BY deptno;
DEPTNO ENAMES
----------------------------------------------------------------------------------------------------------
10CLARK,CLARK,CLARK,CLARK,CLARK,CLARK,CLARK,CLARK,CLARK,CLARK,CLARK,CLARK,CLARK,CLARK,CLARK,CLARK,CLAR
省略
ARK,CLARK,CLARK,CLARK,CLARK,CLARK,CLARK,CLARK,CLARK,CLARK,CLARK,CLARK,CLARK,CLARK,CLARK,CLARK,CLARK,
CLARK,CLARK,CLARK,CLARK,CLARK,CLARK,CLARK,CLARK,CLARK,CLARK,CLARK,...(4334)
20ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAM
S,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,AD
AMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,
省略
ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,...(4334)
30ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLE
N,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,AL
LEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,
省略
LEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,
ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,...(5333)
其他用法如下:
--1.改变省略的格式,换成~~~
SELECT deptno, LISTAGG(ename, ',' ON OVERFLOWTRUNCATE '~~~') WITHIN GROUP (ORDER BY deptno) AS enames
FROM (selectdeptno,ename from emp,(select level from dual connect by level<1000))
GROUP BY deptno
ORDER BY deptno;
--2.通过WINTHOUTCOUNT省略掉计数
SELECT deptno, LISTAGG(ename, ',' ON OVERFLOWTRUNCATE WITHOUT COUNT) WITHIN GROUP (ORDER BY deptno) AS enames
FROM (selectdeptno,ename from emp,(select level from dual connect by level<1000))
GROUP BY deptno
ORDER BY deptno;
--3.默认格式ONOVERFLOW ERROW,溢出则报错ORA-01489: result of string concatenation istoo long
SELECT deptno, LISTAGG(ename, ',' ON OVERFLOW ERROR)WITHIN GROUP (ORDER BY deptno) AS enames
FROM (selectdeptno,ename from emp,(select level from dual connect by level<1000))
GROUP BY deptno
ORDER BY deptno;
1.5 使用XMLAGG返回CLOB处理字符串合并超长需求
针对超过字符串长度溢出的问题,如果要完整显示,可以使用如下方式:
1) 自定义wm_concat返回值类型为CLOB
2) 使用XMLAGG函数获取CLOB值
下面使用XMLAGG函数处理超长合并字符串问题,如下所示:
SELECT deptno, RTRIM(xmlagg(xmlelement(c, ename || ',')
ORDER BY deptno).extract('//text()').getclobval(),
',') ASenames
FROM (select deptno,ename from emp,(select level from dual connect bylevel<1000))
GROUP BY deptno
ORDER BY deptno;
总结:ORACLE19C针对LISTAGG增加了很多改进,在以后有字符串合并需求(列转行)的时候,要优先使用LISTAGG,而不是WM_CONCAT,对于自定义聚集函数,性能较差,而且功能也没有LISTAGG强大。