GaussDB(DWS)调优分享:merge into update场景优化
【摘要】 merge into update场景优化
1. 问题描述
某用户通过merge into语法对目标表进行更新,语句内容如下,发现该sql执行效率很差,需要优化:
MERGE INTO PDM.PDM_DPS_ACCT_INFO T1
USING (SELECT CUST_NUM,
DPS_CLS_BUZ_SIZE,
ROW_NUMBER() OVER(PARTITION BY CUST_NUM) RN
FROM (SELECT CUST_NUM AS CUST_NUM,
(CASE WHEN INDV_BIZES_IND = 'Y' THEN '1' ELSE '0' END ) AS DPS_CLS_BUZ_SIZE
FROM FDM.FDM_CUST_CORP_INFO
WHERE DATA_DT = '20231029'
AND (RTL_CRDT_CUST_NUM IS NOT NULL OR NEW_CRDT_CUST_NUM IS NOT NULL) )) T2
ON ( T2.RN = 1 AND T1.CUST_NUM = T2.CUST_NUM)
WHEN MATCHED THEN
UPDATE
SET T1.DPS_CLS_BUZ_SIZE = T2.DPS_CLS_BUZ_SIZE
WHERE T1.DATA_DT = 20230829
AND T1.CORP_INDV_IND = 'C';
2. 原因分析
该语句执行计划如下:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
id | operation | E-rows | E-distinct | E-memory | E-width | E-costs
----+-------------------------------------------------------------------+------------+------------+----------+---------+---------
1 | -> Streaming (type: GATHER) | 1 | | | 210 | 21.49
2 | -> Merge on pdm.pdm_dps_acct_info t1 | 72985 | | | 210 | 21.13
3 | -> Hash Join (4,5) | 957453645 | | 1MB | 210 | 21.13
4 | -> Seq Scan on pdm.pdm_dps_acct_info t1 | 1281253653 | 10 | 1MB | 138 | 10.10
5 | -> Hash | 600 | 1 | 16MB | 72 | 10.98
6 | -> Streaming(type: BROADCAST) | 600 | | 2MB | 72 | 10.98
7 | -> Subquery Scan on t2 | 1 | | 1MB | 72 | 10.30
8 | -> WindowAgg | 1 | | 16MB | 64 | 10.28
9 | -> Sort | 1217 | | 16MB | 64 | 10.27
10 | -> Streaming(type: REDISTRIBUTE) | 1217 | | 2MB | 64 | 10.25
11 | -> Seq Scan on fdm.fdm_cust_corp_info | 1217 | | 1MB | 64 | 10.12
Predicate Information (identified by plan id)
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2 --Merge on pdm.pdm_dps_acct_info t1
Update Cond: (((t1.data_dt)::bigint = 20230829) AND (t1.corp_indv_ind = 'C'::text))
3 --Hash Join (4,5)
Hash Cond: (t1.cust_num = t2.cust_num)
7 --Subquery Scan on t2
Filter: (t2.rn = 1)
11 --Seq Scan on fdm.fdm_cust_corp_info
Filter: (((fdm_cust_corp_info.rtl_crdt_cust_num IS NOT NULL) OR (fdm_cust_corp_info.new_crdt_cust_num IS NOT NULL)) AND (fdm_cust_corp_info.data_dt = '20231029'::text))
从上述计划可以看到,t1表数据量较大,且Join完成的结果集也较大,而最终merge into的结果集数量较少,这是因为在update阶段,通过“Update Cond”过滤掉了很多数据,所以考虑这部分数据能否提前过滤掉。
根据语义分析,由于该语句仅做update,因此只会修改T1上匹配到的数据,无法匹配到的数据不做处理。
因此,可以等价改写,将Update Cond中的条件放到On条件里。(注意,如果merge into有insert该方式不等价)
3. SQL改写
等价改写方式如下:
MERGE INTO PDM.PDM_DPS_ACCT_INFO T1
USING (SELECT CUST_NUM,
DPS_CLS_BUZ_SIZE,
ROW_NUMBER() OVER(PARTITION BY CUST_NUM) RN
FROM (SELECT CUST_NUM AS CUST_NUM,
(CASE WHEN INDV_BIZES_IND = 'Y' THEN '1' ELSE '0' END ) AS DPS_CLS_BUZ_SIZE
FROM FDM.FDM_CUST_CORP_INFO
WHERE DATA_DT = '20231029'
AND (RTL_CRDT_CUST_NUM IS NOT NULL OR NEW_CRDT_CUST_NUM IS NOT NULL) )) T2
ON ( T2.RN = 1 AND T1.CUST_NUM = T2.CUST_NUM AND T1.DATA_DT = 20230829 AND T1.CORP_INDV_IND = 'C')
WHEN MATCHED THEN
UPDATE
SET T1.DPS_CLS_BUZ_SIZE = T2.DPS_CLS_BUZ_SIZE;
改写完成后,该sql性能提升10倍以上。
【版权声明】本文为华为云社区用户原创内容,转载时必须标注文章的来源(华为云社区)、文章链接、文章作者等基本信息, 否则作者和本社区有权追究责任。如果您发现本社区中有涉嫌抄袭的内容,欢迎发送邮件进行举报,并提供相关证据,一经查实,本社区将立刻删除涉嫌侵权内容,举报邮箱:
cloudbbs@huaweicloud.com
- 点赞
- 收藏
- 关注作者
评论(0)