GaussDB(DWS)性能调优:实时区数十亿级数据量关联下盘过大优化案例

举报
O泡果奶~ 发表于 2024/04/10 20:12:21 2024/04/10
【摘要】 本案例主要针对下盘大的SQL采用提前过滤加以优化(该方案适合于关联后数据量骤降的SQL,此时业务可以通过识别大表字段进行提前过滤,降低下盘)

1、【问题描述】

财经实时区环境月结期间集群压力过大,大量业务出现数十亿级表关联的情况,下盘较为严重,影响业务调度性能。
后续解决方案是:采用调整下盘限制参数的方法,设置下盘量超过30G就杀掉会话。
引起的后果:下盘大的SQL会一直失败。报错内容:canceling statement due to workload manager exception.
DETAIL: except rule [s_878ec2a0_spillsize] is meet condition:rule [spillsize] is over limit, current value is:30086.

2、【原始SQL】

SELECT 
       aa.tpye
      ,aa.s3
      ,aa.sa
      --,aa.sa2              废弃
      ,aa.cont
      ,substr(aa.cont,5,14) contract
      ,aa.trx_number
	  ,aa.trx_date
      ,aa.cmmt
      --,aa.dis               废弃
	  ,aa.inv_type 
      ,aa.sour
      ,aa.invoice_currency_code
      ,sum(aa.amount)       amount    
      ,sum(aa.fun_amount)   fun_amount
      ,aa.customer_name 
      ,aa.comp
      ,aa.reg
	  ,aa.acct
      ,aa.prd
      ,aa.ic   
      --,aa.org_con           废弃
      ,aa.gl_date
      --,aa.ship_date_actual  废弃
      --,aa.orde              废弃
      ,aa.cust
      ,aa.creation_date  
  FROM 
  (
       SELECT 
			/*+ rows (rctl # 500000000) */
           'MAR'                                  as tpye
			,''                                     as s3
             ,rct.salesperson_name                   as sa
             ,rct.contract_number                    as cont
             ,rct.invoice_number                     as trx_number
             ,rct.invoice_date                       as trx_date
             ,rct.comments                           as cmmt
             ,tt.invoice_type_name                   as inv_type
             ,rct.source_type                        as sour
             ,rct.currency_code                      as invoice_currency_code
             ,rctlgd.distribution_amount             as amount
             ,rctlgd.functional_distribution_amount  as fun_amount
             ,hp.bp_name                             as customer_name
             ,rct.unit_code                          as comp
             ,mah.coa_geo_pc_code                    as reg
             ,mah.coa_account_code                   as acct
			 ,mah.coa_product_code                   as prd
             ,mah.coa_ic_code                        as ic
             ,to_char(rct.gl_date,'DD-MON-YY')    as gl_date
             ,hca.account_number                     as cust
             ,to_char(rct.creation_date,'DD-MON-YY') as creation_date
         FROM SDIFIN.OGG_HAR_INVOICE_HEADER_8660_VI  rct
			 ,SDIFIN.OGG_HAR_INVOICE_LINE_8660_VI    rctl
			 ,SDIFIN.OGG_HAR_INVOICE_DIS_8660_VI     rctlgd
			 ,SDIFIN.OGG_HAH_AE_LINE_LINK_8663_VI lk -- SDI.ogg_hah_ae_line_link_t_8663     lk
			 ,SDIFIN.OGG_HAH_AE_LINE_8663_VI mah -- SDI.ogg_hah_ae_line_t_8663          l
             ,SDIIT.OGG_HAC_MD_ENTITY_MANAGET_7050  hca
             ,SDICOMMON.OGG_HAC_MD_LEGAL_ENTITY_T_7050  hp
             ,SDIFIN.OGG_HAR_INVOICE_TYPE_T_8660     tt
        WHERE 1 = 1
  --AND rct.delete_flag<>'Y'
  and NVL(rct.DELETE_FLAG,'N') = 'N'
  --and ltrim(rtrim(rct.delete_flag)) <>'Y'
  --AND rctl.delete_flag<>'Y'
  and NVL(rctl.DELETE_FLAG,'N') = 'N'
  --and ltrim(rtrim(rctl.delete_flag)) <>'Y'
          AND rct.invoice_header_id = rctl.invoice_header_id  
          AND rctl.invoice_line_id = rctlgd.source_reference_id
          AND rct.account_number = hca.account_number
          AND hca.bp_id = hp.bp_id
    AND rct.invoice_type_id = tt.invoice_type_id
          AND rct.unit_code = tt.unit_code    
          AND tt.invoice_type_name NOT LIKE '%ACC' 
          AND tt.invoice_type_name NOT LIKE '%ACC(ACI)'
          AND to_char(rctlgd.INVOICE_DISTRIBUTION_ID) = lk.source_distribution_id_char_1
    AND mah.ae_line_num = lk.ae_line_num
    AND mah.ae_header_id = lk.ae_header_id
        AND   ( rct.UNIT_CODE  IN  ('0021') )    --rct.UNIT_CODE
          AND  rct.gl_date between '2024-03-01 00:00:00' and '2024-03-31 23:59:59'   --rct.gl_date
          AND  1=1      --rct.contract_number
          AND   ( mah.coa_account_code  IN  ('2610000','2620000','2680100','4010000','4040000','4070000','9990455') )       --mah.coa_account_code 
   UNION ALL
   SELECT 
           'HRC'                                as tpye
    ,h.attribute4                         as s3
             ,h.salesperson_name                   as sa
             ,h.contract_number                    as cont
             ,''                                   as trx_number
             ,l.transaction_date                   as trx_date
             ,''                                   as cmmt
             ,l.account_business_type_code         as inv_type
             ,''                                   as sour
             ,dis.transaction_currency_code        as invoice_currency_code
             ,dis.transaction_amount               as amount
             ,dis.functional_amount                as fun_amount
             ,hp.bp_name                           as customer_name
             ,h.company_code                       as comp
             ,mah.coa_geo_pc_code                  as reg
             ,mah.coa_account_code                 as acct
    ,mah.coa_product_code                 as prd
             ,mah.coa_ic_code                      as ic
             ,to_char(dis.gl_date,'DD-MON-YY')     as gl_date
             ,hca.account_number                   as cust
             ,to_char(h.creation_date,'DD-MON-YY') as creation_date
   FROM SDIFIN.OGG_RCR_REV_COST_ACC_H_8659_VI  h/*ar.ra_customer_trx_all*/
       ,SDIFIN.OGG_RCR_REV_COST_ACC_L_8659_VI  l/*ar.ra_customer_trx_lines_all*/
       ,SDIFIN.OGG_RCR_REV_COST_DIS_8659_VI    dis/*ar.ra_cust_trx_line_gl_dist_all*/
             -- ,SDI.OGG_RCR_REV_COST_EVENT_8659_VI  event/*增加*/
    ,SDIIT.OGG_HAC_MD_ENTITY_MANAGET_7050  hca/*ar.hz_cust_accounts*/   
             ,SDICOMMON.OGG_HAC_MD_LEGAL_ENTITY_T_7050  hp/*ar.hz_parties*/
    ,SDIFIN.OGG_HAH_AE_LINE_LINK_8663_VI lk -- SDI.ogg_hah_ae_line_link_t_8663     lk
    ,SDIFIN.OGG_HAH_AE_LINE_8663_VI mah -- SDI.ogg_hah_ae_line_t_8663          l 
  WHERE 1 = 1
          and h.rev_cost_header_id = l.rev_cost_header_id
          and l.rev_cost_line_id = dis.rev_cost_line_id 
    -- and dis.rev_cost_event_id = event.rev_cost_event_id
    -- and event.accounting_status = '6'
    -- and event.event_status_code = '5'
          AND h.account_number = hca.account_number
          AND hca.bp_id = hp.bp_id
    AND lk.source_distribution_id_char_1 = to_char(dis.rev_cost_distribution_id)
    AND mah.ae_line_num = lk.ae_line_num
    AND mah.ae_header_id = lk.ae_header_id
    AND   ( h.COMPANY_CODE  IN  ('0021') )    --h.COMPANY_CODE
    AND  dis.gl_date between '2024-03-01 00:00:00' and '2024-03-31 23:59:59'   --dis.gl_date
    AND  1=1      --h.contract_number
    AND   ( mah.coa_account_code  IN  ('2610000','2620000','2680100','4010000','4040000','4070000','9990455') )       --mah.coa_account_code
) aa
 GROUP BY aa.tpye
      ,aa.s3
      ,aa.sa
      ,aa.cont
      ,aa.trx_number
	  ,aa.trx_date
      ,aa.cmmt
      ,aa.inv_type
      ,aa.sour
      ,aa.invoice_currency_code
	  ,aa.customer_name
      ,aa.comp
      ,aa.reg
      ,aa.acct
      ,aa.prd
      ,aa.ic
      ,aa.gl_date
      ,aa.cust
   ,aa.creation_date
 HAVING (sum(aa.amount) = 0 and sum(aa.fun_amount) <> 0)
        or sum(aa.amount) <> 0

3、【性能分析】

image.png
上图中可以看出,SQL执行时间达到了1200s+,lk子查询数据量达到了118亿,表扫描时间达到了300s+,同时存在非常严重的下盘。但是关联后数据量只有300万,与业务证实后确认的确不需要扫描所有数据(即可以利用某个字段进行提前过滤,降低关联数据量,减少下盘)。
例如本案例中,与业务求证后利用了lk中的application_code字段提前过滤,优化后SQL如下所示(取重点部分):

with tmp_mah as (
 	select mah.coa_geo_pc_code                    
	       ,mah.coa_account_code                   
				 ,mah.coa_product_code                  
	       ,mah.coa_ic_code                        
         ,mah.ae_line_num
         ,mah.ae_header_id
         ,lk.source_distribution_id_char_1
         ,LK.application_code
    from SDIFIN.OGG_HAH_AE_LINE_8663_VI mah,
         SDIFIN.OGG_HAH_AE_LINE_LINK_8663_VI lk
   where ( mah.coa_account_code  IN  ('2610000','2620000','2680100','4010000','4040000','4070000','9990455') ) 
     AND mah.ae_line_num = lk.ae_line_num
		 AND mah.ae_header_id = lk.ae_header_id 
		 AND mah.coa_company_code  = '0021'
		 AND LK.application_code IN ('AR','RMC')
)
	SELECT ... 
	FROM SDIFIN.OGG_HAR_INVOICE_HEADER_8660_VI  rct
						 ,SDIFIN.OGG_HAR_INVOICE_LINE_8660_VI    rctl
						 ,SDIFIN.OGG_HAR_INVOICE_DIS_8660_VI     rctlgd
						 ,tmp_mah mah
             ,SDIIT.OGG_HAC_MD_ENTITY_MANAGET_7050  hca
             ,SDICOMMON.OGG_HAC_MD_LEGAL_ENTITY_T_7050  hp
             ,SDIFIN.OGG_HAR_INVOICE_TYPE_T_8660     tt
        WHERE 1 = 1
        ...
        AND mah.application_code='AR'
        UNION ALL
   SELECT ...
   		  WHERE 1 = 1
        ...
        AND mah.application_code='RMC'

优化后,SQL执行时间降低为375s+,相比优化前性能提升了接近三倍,从执行计划中可以看出,利用lk字段提前过滤,减少了表扫描以及关联数据量,提高了SQL的执行效率。
image.png

4、【总结】

当前实时区存在较多大数据量的表或者视图,业务在使用时如果不进行提前过滤,就会导致无效数据过多,影响调度效率;而且,在月结或者年结这种高负载时间段,大量此类SQL并发执行,也会严重影响集群性能。因此,需要业务针对这种表或者视图加以提前过滤进行优化。

【版权声明】本文为华为云社区用户原创内容,转载时必须标注文章的来源(华为云社区)、文章链接、文章作者等基本信息, 否则作者和本社区有权追究责任。如果您发现本社区中有涉嫌抄袭的内容,欢迎发送邮件进行举报,并提供相关证据,一经查实,本社区将立刻删除涉嫌侵权内容,举报邮箱: cloudbbs@huaweicloud.com
  • 点赞
  • 收藏
  • 关注作者

评论(0

0/1000
抱歉,系统识别当前为高风险访问,暂不支持该操作

全部回复

上滑加载中

设置昵称

在此一键设置昵称,即可参与社区互动!

*长度不超过10个汉字或20个英文字符,设置后3个月内不可修改。

*长度不超过10个汉字或20个英文字符,设置后3个月内不可修改。