GaussDB(DWS)调优分享:merge into update场景优化

举报
Arrow0lf 发表于 2023/12/27 20:44:11 2023/12/27
【摘要】 merge into update场景优化

1. 问题描述

某用户通过merge into语法对目标表进行更新,语句内容如下,发现该sql执行效率很差,需要优化:

MERGE INTO PDM.PDM_DPS_ACCT_INFO T1
USING (SELECT CUST_NUM,
              DPS_CLS_BUZ_SIZE,
              ROW_NUMBER()     OVER(PARTITION BY CUST_NUM) RN
         FROM (SELECT CUST_NUM AS CUST_NUM,
                      (CASE WHEN INDV_BIZES_IND = 'Y' THEN  '1' ELSE '0' END ) AS DPS_CLS_BUZ_SIZE
                 FROM FDM.FDM_CUST_CORP_INFO
                WHERE DATA_DT = '20231029'
                  AND (RTL_CRDT_CUST_NUM IS NOT NULL OR NEW_CRDT_CUST_NUM IS NOT NULL)  )) T2
ON ( T2.RN = 1 AND T1.CUST_NUM = T2.CUST_NUM)
 WHEN MATCHED THEN
  UPDATE
     SET T1.DPS_CLS_BUZ_SIZE = T2.DPS_CLS_BUZ_SIZE
   WHERE T1.DATA_DT = 20230829
     AND T1.CORP_INDV_IND = 'C';

2. 原因分析

该语句执行计划如下:

                                                                                               QUERY PLAN                                                                                               
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  id |                             operation                             |   E-rows   | E-distinct | E-memory | E-width | E-costs 
 ----+-------------------------------------------------------------------+------------+------------+----------+---------+---------
   1 | ->  Streaming (type: GATHER)                                      |          1 |            |          |     210 | 21.49   
   2 |    ->  Merge on pdm.pdm_dps_acct_info t1                          |      72985 |            |          |     210 | 21.13   
   3 |       ->  Hash Join (4,5)                                         |  957453645 |            | 1MB      |     210 | 21.13   
   4 |          ->  Seq Scan on pdm.pdm_dps_acct_info t1                 | 1281253653 | 10         | 1MB      |     138 | 10.10   
   5 |          ->  Hash                                                 |        600 | 1          | 16MB     |      72 | 10.98   
   6 |             ->  Streaming(type: BROADCAST)                        |        600 |            | 2MB      |      72 | 10.98   
   7 |                ->  Subquery Scan on t2                            |          1 |            | 1MB      |      72 | 10.30   
   8 |                   ->  WindowAgg                                   |          1 |            | 16MB     |      64 | 10.28   
   9 |                      ->  Sort                                     |       1217 |            | 16MB     |      64 | 10.27   
  10 |                         ->  Streaming(type: REDISTRIBUTE)         |       1217 |            | 2MB      |      64 | 10.25   
  11 |                            ->  Seq Scan on fdm.fdm_cust_corp_info |       1217 |            | 1MB      |      64 | 10.12   
 
                                                                  Predicate Information (identified by plan id)                                                                  
 --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   2 --Merge on pdm.pdm_dps_acct_info t1
         Update Cond: (((t1.data_dt)::bigint = 20230829) AND (t1.corp_indv_ind = 'C'::text))
   3 --Hash Join (4,5)
         Hash Cond: (t1.cust_num = t2.cust_num)
   7 --Subquery Scan on t2
         Filter: (t2.rn = 1)
  11 --Seq Scan on fdm.fdm_cust_corp_info
         Filter: (((fdm_cust_corp_info.rtl_crdt_cust_num IS NOT NULL) OR (fdm_cust_corp_info.new_crdt_cust_num IS NOT NULL)) AND (fdm_cust_corp_info.data_dt = '20231029'::text))

从上述计划可以看到,t1表数据量较大,且Join完成的结果集也较大,而最终merge into的结果集数量较少,这是因为在update阶段,通过“Update Cond”过滤掉了很多数据,所以考虑这部分数据能否提前过滤掉。

根据语义分析,由于该语句仅做update,因此只会修改T1上匹配到的数据,无法匹配到的数据不做处理。

因此,可以等价改写,将Update Cond中的条件放到On条件里。(注意,如果merge into有insert该方式不等价)

3. SQL改写

等价改写方式如下:

MERGE INTO PDM.PDM_DPS_ACCT_INFO T1
USING (SELECT CUST_NUM,
              DPS_CLS_BUZ_SIZE,
              ROW_NUMBER()     OVER(PARTITION BY CUST_NUM) RN
         FROM (SELECT CUST_NUM AS CUST_NUM,
                      (CASE WHEN INDV_BIZES_IND = 'Y' THEN  '1' ELSE '0' END ) AS DPS_CLS_BUZ_SIZE
                 FROM FDM.FDM_CUST_CORP_INFO
                WHERE DATA_DT = '20231029'
                  AND (RTL_CRDT_CUST_NUM IS NOT NULL OR NEW_CRDT_CUST_NUM IS NOT NULL)  )) T2
ON ( T2.RN = 1 AND T1.CUST_NUM = T2.CUST_NUM AND T1.DATA_DT = 20230829 AND T1.CORP_INDV_IND = 'C')
 WHEN MATCHED THEN
  UPDATE
     SET T1.DPS_CLS_BUZ_SIZE = T2.DPS_CLS_BUZ_SIZE;

改写完成后,该sql性能提升10倍以上。

【版权声明】本文为华为云社区用户原创内容,转载时必须标注文章的来源(华为云社区)、文章链接、文章作者等基本信息, 否则作者和本社区有权追究责任。如果您发现本社区中有涉嫌抄袭的内容,欢迎发送邮件进行举报,并提供相关证据,一经查实,本社区将立刻删除涉嫌侵权内容,举报邮箱: cloudbbs@huaweicloud.com
  • 点赞
  • 收藏
  • 关注作者

评论(0

0/1000
抱歉,系统识别当前为高风险访问,暂不支持该操作

全部回复

上滑加载中

设置昵称

在此一键设置昵称,即可参与社区互动!

*长度不超过10个汉字或20个英文字符,设置后3个月内不可修改。

*长度不超过10个汉字或20个英文字符,设置后3个月内不可修改。