【调优实践】COUNT()OVER()改写优化
1. 问题描述
XX局点大量使用API获取报表数据进行分页展示,分页展示时需要同时获取到总条数,因此会在SQL中大量使用COUNT(1)OVER()窗口函数。
写法如下:SELECT *,COUNT(1)OVER() FROM 表/子查询 LIMIT XX OFFSET XX; 目前DWS执行逻辑中,会将所有语句GATHER到CN,并在CN上开窗计算,导致性能瓶颈。
如下SQL:
EXPLAIN PERFORMANCE
select cust_key,cust_acct,cust_id,count(1)over() from DMISC.DM_DIM_CBG_CUSTOMER_INFO_D limit 100 offset 0;
执行计划如下:
id | operation | A-time | A-rows | E-rows | E-distinct | Peak Memory | E-memory | A-width | E-width | E-costs
----+-----------------------------------------------------------------+------------------+----------+----------+------------+-------------+----------+---------+---------+-----------
1 | -> Row Adapter | 1967.272 | 100 | 100 | | 39KB | | | 26 | 5.14
2 | -> Vector Limit | 1967.263 | 100 | 100 | | 8KB | | | 26 | 5.14
3 | -> Vector WindowAgg | 1967.259 | 1000 | 13372944 | | 1753687KB | | | 26 | 152384.64
4 | -> Vector Streaming (type: GATHER) | 424.184 | 13372949 | 13372949 | | 200KB | | | 26 | 148902.10
5 | -> CStore Scan on dmisc.dm_dim_cbg_customer_info_d | [14.669, 39.213] | 13372949 | 13372949 | | [6MB, 6MB] | 1MB | | 26 | 148583.60
2. SQL优化改写
如上所属计划,CN会GATHER所有数据到CN开窗计算,SQL执行将近2s。改写思路可以尝试减少CN GATHER数据量,将SQL改写成明细和汇总数据的关联,同时按照LIMIT OFFSET减少明细数据,改写如下:
EXPLAIN PERFORMANCE
SELECT t1.*,t2.cnt FROM
(select cust_key,cust_acct,cust_id from DMISC.DM_DIM_CBG_CUSTOMER_INFO_D limit 100 offset 0) t1
LEFT JOIN (SELECT COUNT(1) AS cnt FROM DMISC.DM_DIM_CBG_CUSTOMER_INFO_D) t2 ON 1=1;
改写后查询计划如下:
id | operation | A-time | A-rows | E-rows | E-distinct | Peak Memory | E-memory | A-width | E-width | E-costs
----+--------------------------------------------------------------------------+------------------+----------+----------+------------+----------------+----------+---------+---------+-----------
1 | -> Row Adapter | 16.352 | 100 | 100 | | 39KB | | | 34 | 149289.98
2 | -> Vector Streaming (type: GATHER) | 16.335 | 100 | 100 | | 136KB | | | 34 | 149289.98
3 | -> Vector Nest Loop Left Join (4, 8) | [11.786, 11.786] | 100 | 100 | | [248KB, 248KB] | 1MB | | 34 | 149283.73
4 | -> Vector Limit | [0.093, 0.093] | 100 | 100 | | [8KB, 8KB] | 1MB | | 26 | 2.38
5 | -> Vector Streaming(type: BROADCAST) | [0.091, 0.091] | 100 | 4800 | | [536KB, 536KB] | 2MB | | 26 | 114.27
6 | -> Vector Limit | [0.156, 0.237] | 4800 | 4800 | | [16KB, 16KB] | 1MB | | 26 | 53.33
7 | -> CStore Scan on dmisc.dm_dim_cbg_customer_info_d | [0.107, 0.167] | 48000 | 13372949 | | [6MB, 6MB] | 1MB | | 26 | 148583.60
8 | -> Vector Materialize | [11.553, 11.553] | 101 | 48 | | [272KB, 272KB] | 16MB | [9,9] | 8 | 149280.32
9 | -> Vector Aggregate | [11.422, 11.422] | 1 | 48 | | [272KB, 272KB] | 1MB | | 8 | 149280.31
10 | -> Vector Streaming(type: BROADCAST) | [11.342, 11.342] | 48 | 48 | | [520KB, 520KB] | 2MB | | 8 | 149280.31
11 | -> Vector Aggregate | [4.441, 9.265] | 48 | 48 | | [272KB, 272KB] | 1MB | | 8 | 149280.12
12 | -> CStore Scan on dmisc.dm_dim_cbg_customer_info_d | [3.653, 8.151] | 13372949 | 13372949 | | [6MB, 6MB] | 1MB | | 0 | 148583.60
查询执行时间由2000ms优化至17ms,优化优化110多倍,性能优化效果明显。
3. 非相关子查询
上述优化改写方案大部分场景下性能提升效果明显,但是在个别场景下,因为nest loop存在,性能可能还不达标。那有什么办法可以消除NESTLOOP吗?答案是有的。
我们知道COUNT其实是个常量,那能不能先把COUNT求出来,然后让它作为查询的一个字段呢?
求出COUNT:SELECT COUNT(1) AS cnt FROM DMISC.DM_DIM_CBG_CUSTOMER_INFO_D
COUNT作为查询一个字段,改写如下:
EXPLAIN PERFORMANCE
select cust_key,cust_acct,cust_id,(SELECT COUNT(1) AS cnt FROM DMISC.DM_DIM_CBG_CUSTOMER_INFO_D)
from DMISC.DM_DIM_CBG_CUSTOMER_INFO_D limit 100 offset 0
改写后计划如下:
id | operation | A-time | A-rows | E-rows | E-distinct | Peak Memory | E-memory | A-width | E-width | E-costs
----+-----------------------------------------------------------------------+----------------+----------+----------+------------+----------------+----------+---------+---------+-----------
1 | -> Row Adapter | 12.777 | 100 | 100 | | 39KB | | | 26 | 149288.53
2 | -> Vector Limit | 12.758 | 100 | 100 | | 8KB | | | 26 | 149288.53
3 | -> Vector Streaming (type: GATHER) | 12.755 | 100 | 4800 | | 136KB | | | 26 | 206.46
4 | -> Vector Limit | [6.456, 7.579] | 4800 | 4800 | | [16KB, 16KB] | 1MB | | 26 | 53.33
5 | -> CStore Scan on dmisc.dm_dim_cbg_customer_info_d | [6.433, 7.563] | 48000 | 13372949 | | [6MB, 6MB] | 1MB | | 26 | 148583.60
6 | -> Row Adapter [5, InitPlan 1 (returns $0)] | [6.242, 7.424] | 48 | 48 | | [24KB, 24KB] | 1MB | | 8 | 149280.31
7 | -> Vector Aggregate | [6.237, 7.419] | 48 | 48 | | [272KB, 272KB] | 1MB | | 8 | 149280.31
8 | -> Vector Streaming(type: BROADCAST) | [6.086, 7.310] | 2304 | 2304 | | [520KB, 520KB] | 2MB | | 8 | 149280.31
9 | -> Vector Aggregate | [4.458, 5.056] | 48 | 48 | | [272KB, 272KB] | 1MB | | 8 | 149280.12
10 | -> CStore Scan on dmisc.dm_dim_cbg_customer_info_d | [3.647, 4.153] | 13372949 | 13372949 | | [6MB, 6MB] | 1MB | | 0 | 148583.60
由上述执行计划可以看出,NESTLOOP已经消除。性能与第二种优化方法性能接近,在某些场景下该方法更具优势。
细心的读者可能已经发现,这种改法不就是非相关子查询嘛!是的,对于类似SQL都可以使用非相关子查询的方法进行改写优化。
比如以下SQL:
WITH TMP AS(
SELECT
case
when length('[“202309“]') = 6 then '[“202309“]' || '01'
WHEN length('[“202309“]') <> 8 THEN TO_CHAR(CURRENT_DATE, 'YYYYMMDD')
END AS V_DATE
from
DUAL
)
SELECT
BG_CODE,
BG_CN_NAME,
V_DATE
FROM
FROM
DMSALESW.DM_SALE_BUSI_DSCT_SUM_F C
LEFT JOIN TMP T ON 1 = 1;
TMP输出为一个常数,这种写法会导致NESTLOOP,按照非相关子查询改法如下:
SELECT
BG_CODE,
BG_CN_NAME,
(
SELECT
case
when length('[“202309“]') = 6 then '[“202309“]' || '01'
WHEN length('[“202309“]') <> 8 THEN TO_CHAR(CURRENT_DATE, 'YYYYMMDD')
END AS V_DATE
from
DUAL
) AS V_DATE
FROM
FROM
DMSALESW.DM_SALE_BUSI_DSCT_SUM_F C
LEFT JOIN TMP T ON 1 = 1;
- 点赞
- 收藏
- 关注作者
评论(0)