GaussDB(DWS)性能调优:row_number()场景性能优化

举报
譡里个檔 发表于 2022/12/20 15:06:49 2022/12/20
【摘要】 特殊场景的row_number()性能优化

很多业务场景里面有类似如下SQL实现获取每组的最大或者最小值的目的

SELECT 
    s.reconcile_method, s.trans_entity_bank_acc_id 
FROM (
    SELECT tms.trans_entity_bank_acc_id, tms.source AS reconcile_method        
    FROM (SELECT 
            tms.trans_entity_bank_acc_id, bank_st.source, 
            row_number() OVER(PARTITION BY tms.trans_entity_bank_acc_id ORDER BY bank_st.statement_date DESC) cnt 
        FROM dwrtretl.dwr_tr_tms_settle_f_tmp6 tms, 
        (SELECT bank_st.source, bank_st.bank_account_id, bank_st.statement_date   
        FROM dwnodi.dwr_fin_tr_bank_statement_fi bank_st
        WHERE 1 = 1 AND bank_st.del_flag = 'N') bank_st   
        WHERE tms.trans_entity_bank_acc_id = bank_st.bank_account_id   
        AND tms.settlement_date <= bank_st.statement_date
    ) tms       WHERE tms.cnt = 1
)s

如上的SQL语句在执行的时候会导致操作的数据集按照PARTITION BY和ORDER BY指定的组合字段做排序,具体计划如下

排序是一个高内存、低性能的操作动作,当数据量很大时,排序动作(Sort算子)会导致极大耗时。上述SQL的执行信息如下,其中Sort算子的耗时占比为(286227.739-83911.249)/287404.461 = 70.39%


因此我们一般应当避免大数据量(超千万级)的排序。当出现因为row_number导致执行执行慢的场景时,一般的优化思路有两个

  1. 减小row_number操作处理的数据量。适用场景一般是row_number操作的数据源重复度比较高的场景
  2. 通过等价改写的方式避免排序操作。适用场景一般是输出列和row_number的PARTITION BY和ORDER BY列的子集的场景

根据实际业务SQL的满足情况,优先检视建议方案1


场景一:通过提前去重操作,降低row_number操作的数据量

原始SQL

SELECT 
    s.reconcile_method, s.trans_entity_bank_acc_id 
FROM (
    SELECT tms.trans_entity_bank_acc_id, tms.source AS reconcile_method        
    FROM (SELECT 
            tms.trans_entity_bank_acc_id, bank_st.source, 
            row_number() OVER(PARTITION BY tms.trans_entity_bank_acc_id ORDER BY bank_st.statement_date DESC) cnt 
        FROM dwrtretl.dwr_tr_tms_settle_f_tmp6 tms, 
        (SELECT bank_st.source, bank_st.bank_account_id, bank_st.statement_date   
        FROM dwnodi.dwr_fin_tr_bank_statement_fi bank_st
        WHERE 1 = 1 AND bank_st.del_flag = 'N') bank_st   
        WHERE tms.trans_entity_bank_acc_id = bank_st.bank_account_id   
        AND tms.settlement_date <= bank_st.statement_date
    ) tms       WHERE tms.cnt = 1
)s



优化后SQL

SELECT 
    s.reconcile_method, s.trans_entity_bank_acc_id 
FROM (
    SELECT 
        trans_entity_bank_acc_id, source AS reconcile_method, 
        row_number() OVER(PARTITION BY trans_entity_bank_acc_id ORDER BY statement_date DESC) AS rn
    FROM (
        SELECT 
            tms.trans_entity_bank_acc_id, bank_st.source, bank_st.statement_date   
        FROM dwnodi.dwr_fin_tr_bank_statement_fi bank_st, dwrtretl.dwr_tr_tms_settle_f_tmp6 tms
        WHERE tms.trans_entity_bank_acc_id = bank_st.bank_account_id   
        AND tms.settlement_date <= bank_st.statement_date AND bank_st.del_flag = 'N'
        GROUP BY tms.trans_entity_bank_acc_id, bank_st.source, bank_st.statement_date
    )t
)s WHERE rn = 1

优化后的执行情况



场景二:反向关联,降低row_number操作的数据量

优化后SQL

WITH t AS( /* row_number要处理的结果集 */
    SELECT 
        tms.trans_entity_bank_acc_id, bank_st.source, bank_st.statement_date   
    FROM dwnodi.dwr_fin_tr_bank_statement_fi bank_st, dwrtretl.dwr_tr_tms_settle_f_tmp6 tms
    WHERE tms.trans_entity_bank_acc_id = bank_st.bank_account_id   
    AND tms.settlement_date <= bank_st.statement_date AND bank_st.del_flag = 'N'
    GROUP BY tms.trans_entity_bank_acc_id, bank_st.source, bank_st.statement_date
),

m AS( /* 求满足rn = 1的PARTITION BY列和ORDER BY列 */
    SELECT 
         trans_entity_bank_acc_id, /* PARTITION BY列 */
         max(statement_date) AS statement_date /* ORDER BY列 */
    FROM t
    GROUP BY trans_entity_bank_acc_id
)

SELECT 
    s.reconcile_method, s.trans_entity_bank_acc_id 
FROM (
    SELECT 
        trans_entity_bank_acc_id, source AS reconcile_method, 
        row_number() OVER(PARTITION BY trans_entity_bank_acc_id ORDER BY statement_date DESC) AS rn
    FROM ( /* 跟原始结果集反关联,获取row_number()要处理的最小结果集 */
        SELECT t.*
        FROM t
        INNER JOIN m ON m.trans_entity_bank_acc_id = t.trans_entity_bank_acc_id 
        AND m.statement_date = t.statement_date
    )f
)s WHERE rn = 1

优化后的执行情况


【版权声明】本文为华为云社区用户原创内容,转载时必须标注文章的来源(华为云社区)、文章链接、文章作者等基本信息, 否则作者和本社区有权追究责任。如果您发现本社区中有涉嫌抄袭的内容,欢迎发送邮件进行举报,并提供相关证据,一经查实,本社区将立刻删除涉嫌侵权内容,举报邮箱: cloudbbs@huaweicloud.com
  • 点赞
  • 收藏
  • 关注作者

评论(0

0/1000
抱歉,系统识别当前为高风险访问,暂不支持该操作

全部回复

上滑加载中

设置昵称

在此一键设置昵称,即可参与社区互动!

*长度不超过10个汉字或20个英文字符,设置后3个月内不可修改。

*长度不超过10个汉字或20个英文字符,设置后3个月内不可修改。