GaussDB(DWS)性能调优:子查询性能优化
相关子查询分为相关子查询(Correlated-SubQuery)和非相关子查询(None-Correlated SubQuery)。非相关子查询的执行不依赖于外层父查询的任何属性值,可独自求解。相关子查询的执行依赖于外层父查询的一些属性值,子查询与外部父查询经常存在关联条件,在执行上父查询的每一条输出都需要把相关的参数迭代到标量子查询中计算一次,这种反复的迭代类似于nestloop的执行方式,性能较差。
使用另一个SELECT语句(假设为A语句)查询的结果作为当前SELECT语句的一个输出列时,我们称A语句为标量子查询。
此处我们讨论聚焦于相关标量子查询,针对这类子查询,常用的优化方案是把子查询提升成为跟父表的join操作,进而提升计算性能;在子查询为单表查询并且表为复制表的场景,我们也可以通过索引的方式提升性能。
场景1) 标量子查询为复制表查询,通过索引的方式进行优化
通过使用标量子查询,把业务信息中的一些ID信息转换为可读信息,这是一样比较常见的SQL写法(这些写法蕴含了这样一种信息:子查询中的表在关联字段+where条件的记录满足唯一约束。比如下面查询中表fasp_t_pubexpfunc的字段(year,province,code)实质上满足唯一约束,否则标量子查询会输出两条记录,导致执行报错)。
SELECT
b.billno,
(SELECT name FROM fasp_t_pubexpfunc e WHERE e.province =b.province AND e.code=substr(b.expfunccode,1,3)),
(SELECT name FROM fasp_t_pubexpfunc e WHERE e.province =b.province AND e.code=substr(b.expfunccode,1,5))
FROM pay_t_voucher_bill b
LEFT JOIN fasp_t_pubfundtype a ON substr(b.fundtypecode,1,6)=a.code
WHERE b.guid in ('20200115235821153101854042','20200115235821135101854041')
;
-- ## fasp_t_pubexpfunc为复制表
-- 优化手段
CREATE INDEX idx_fasp_t_pubexpfunc_pcy ON fasp_t_pubexpfunc(code, province);
场景2)标量子查询为复杂查询或者hash表查询,通过子查询提升的方式进行优化
如果标量子查询中存在关联查询、聚合函数等操作或者标量子查询中表的是Hash表的时候,一般通过子查询提升的方式进行性能优化。
需要注意的是标量子查询有可能存在关联条件无法命中的场景,此场景下标量子查询输出为NULL,因此在提升后的子查询与父查询需要使用LEFT JOIN
SELECT
f.clearbankguid,
(SELECT COALESCE(sum(vou1.amt), 0) AS "nvl" FROM budget_t_voucher vou1 WHERE vou1.toctrlid = gl.sumguid AND vou1.isend = 1) AS takebackamt,
(SELECT count(1) AS count FROM boss_t_sys_relation_detail b, boss_t_sys_relation a WHERE a.guid = b.relationguid
AND a.prielementcode = 'GOVEXPECO' AND b.prielementvalue = f.govexpecoguid AND a.year = gl.year AND a.province = gl.province) AS isgovexpecorelation
FROM boss_t_glbalance gl, fasp_t_glctrl122299 f
WHERE f.guid = gl.sumguid AND f.year = gl.year
AND f.province = gl.province
;
-- 当 gl 和 f的关联结果集比较大的时候
-- 优化方式是等价改写为如下SQL
WITH vou1 AS (
SELECT
COALESCE(sum(vou1.amt), 0) AS takebackamt , toctrlid
FROM budget_t_voucher vou1
WHERE vou1.isend = 1
GROUP BY toctrlid
),
boss_t_sys AS(
SELECT
count(1) AS isgovexpecorelation, a.year, a.province,b.prielementvalue
FROM boss_t_sys_relation_detail b
INNER JOIN boss_t_sys_relation a ON a.guid = b.relationguid
WHERE a.prielementcode = 'GOVEXPECO'
GROUP BY a.year, a.province,b.prielementvalue
)
SELECT
f.clearbankguid,
vou1.takebackamt,
a.isgovexpecorelation
FROM boss_t_glbalance gl
INNER JOIN fasp_t_glctrl122299 f ON (f.guid = gl.sumguid AND f.year = gl.year AND f.province = gl.province)
LEFT JOIN vou1 ON (vou1.toctrlid = gl.sumguid)
LEFT JOIN boss_t_sys a ON (a.year = gl.year AND a.province = gl.province AND a.prielementvalue = f.govexpecoguid)
;
-- 当 gl 和 f的关联结果集比较小的时候
-- 优化方式是等价改写为如下SQL
WITH vou1 AS(
SELECT
COALESCE(sum(vou1.amt), 0) AS "nvl",
FROM budget_t_voucher vou1, boss_t_glbalance gl, fasp_t_glctrl122299 f
WHERE vou1.toctrlid = gl.sumguid AND vou1.isend = 1
AND f.guid = gl.sumguid AND f.year = gl.year AND f.province = gl.province
GROUP BY vou1.toctrlid
),
boss_t_sys AS(
SELECT
count(1) AS count, relationguid
FROM boss_t_sys_relation_detail b, boss_t_sys_relation a, boss_t_glbalance gl, fasp_t_glctrl122299 f
WHERE a.guid = b.relationguid AND AND a.prielementcode = 'GOVEXPECO' AND b.prielementvalue = f.govexpecoguid AND a.year = gl.year AND a.province = gl.province
AND f.guid = gl.sumguid AND f.year = gl.year AND f.province = gl.province
GROUP BY f.govexpecoguid, gl.year, gl.province
)
SELECT
f.clearbankguid,
(SELECT COALESCE(sum(vou1.amt), 0) AS "nvl" FROM budget_t_voucher vou1 WHERE vou1.toctrlid = gl.sumguid AND vou1.isend = 1) AS takebackamt,
(SELECT count(1) AS count FROM boss_t_sys_relation_detail b, boss_t_sys_relation a WHERE a.guid = b.relationguid
AND a.prielementcode = 'GOVEXPECO' AND b.prielementvalue = f.govexpecoguid AND a.year = gl.year AND a.province = gl.province) AS isgovexpecorelation
FROM boss_t_glbalance gl
INNER JOIN fasp_t_glctrl122299 f ON f.guid = gl.sumguid AND f.year = gl.year AND f.province = gl.province
LEFT JOIN vou1 ON vou1.toctrlid = gl.sumguid
LEFT JOIN boss_t_sys a ON a.prielementvalue = f.govexpecoguid AND a.year = gl.year AND a.province = gl.province
;
场景3)标量子查询为LIMIT 1查询,通过子查询汇聚后提升的方式进行优化
如果子查询中含有LIMIT 1,在改写的时候为了避免多条匹配导致的结果集膨胀,我们需要提前对子查询在相关字段上做聚合操作,把多条结果聚合为一条,然后再按照正常的子查询提升流程进行SQL改写。
注意:此种场景需要业务层保证原先子查询返回值的稳定性,即如果子查询不加LIMIT 1,对于任何关联参数,子查询返回的多条结果的值是一样,这样不管子查询输出如何排序,LIMIT 1的值都是一样的
SELECT
a.guid,
(SELECT tt.name FROM boss_t_fb_refmodel_detail tt WHERE a.agencytype = tt.code AND tt.billguid = '9eb19ada9f994cf5ab74a5cf9289a752' LIMIT 1) AS agencytype,
(SELECT tt.name FROM boss_t_fb_refmodel_detail tt WHERE a.agencykind = tt.code AND tt.billguid = '3208d355701841d385e1e1f111f33bb2' LIMIT 1) AS agencykind,
(SELECT tt.name FROM boss_t_fb_refmodel_detail tt WHERE a.agencylvl = tt.code AND tt.billguid = '29b7f90307c146869dc18d43e97fe9b5' LIMIT 1) AS agencylvl
FROM boss_t_fb_agency a
;
-- 优化方式是等价改写为如下SQL
SELECT
a.guid,
tt1.name AS agencytype,
tt2.name AS agencykind,
tt3.name AS agencylvl
FROM boss_t_fb_agency a
LEFT JOIN(
SELECT tt.name, tt.code
FROM boss_t_fb_refmodel_detail tt
WHERE tt.billguid = '9eb19ada9f994cf5ab74a5cf9289a752'
GROUP BY tt.name, tt.code
)tt1 ON a.agencytype = tt1.code
LEFT JOIN(
SELECT tt.name, tt.code
FROM boss_t_fb_refmodel_detail tt
WHERE tt.billguid = '3208d355701841d385e1e1f111f33bb2'
GROUP BY tt.name, tt.code
)tt2 ON a.agencykind = tt2.code
LEFT JOIN(
SELECT tt.name, tt.code
FROM boss_t_fb_refmodel_detail tt
WHERE tt.billguid = '29b7f90307c146869dc18d43e97fe9b5'
GROUP BY tt.name, tt.code
)tt3 ON a.agencylvl = tt3.code
;
场景4)NOT IN子查询
对于这类子查询,GaussDB支持子查询的自动提升,但是因为NULL值的处理,通常场景下提升后只能使用NestLoop的JOIN方式,此执行方式相当低效
SELECT m.prjcode, m.prjname, m.province, m.year
FROM boss_t_lb_specific_main m
WHERE prjcode NOT IN (SELECT prjcode FROM boss_t_lb_ach_send_log);
如上SQL语句的执行计划如下
我们 可以看到提升之后,两表走了nestloop的执行计划,并且JOIN条件为一个OR语句。针对这种场景,需要详细分析从业务和数据上看表boss_t_lb_ach_send_log的字段prjcode是否可能存在NULL值,如果不存在则给字段prjcode增加NOT NULL约束,增加NOT NULL约束之后执行计划走了更优的HashJoin
场景5)OR 子查询
对于这类子查询(fasp_t_userorg为复制表),因为OR语句的限制,导致只能走低效的Semi Join
select * from vw_boss_t_pay_voucher_report_shibo a
where 1=1
and province =(610000)
and year = (2020)
and createtime >= '2020-01-01'
and createtime <= '2020-09-30'
and paytime >= '2020-01-01'
and paytime <= '2020-09-30'
and cleartime >= '2020-01-01'
and cleartime <= '2020-09-30'
and exists(select 1 from fasp_t_userorg org where (org.orgguid = a.agencyguid or org.orgguid = a.departmentguid) and org.userguid = '78C4B0693CE64AA381C58EB171FDCCF5' and org.province = '610000' and org.year = '2020')
order by billno,agencycode,fundtypecode,createtime;
如上SQL语句的执行计划如下
优化策略为
1) 如果表t输出的列数较少,把子查询中的OR语句拆成两条语句的UNION ALL,这样让主查询的Param可以下推到子查询中,子查询使用IndexScan加速查询
(注:理论上即便不修改,子查询应当也可以生成BitmapOr的路径,但是实际操作中如果没有显式guc参数控制,很难走到期望的BitmapOr路径)
explain verbose
select * from vw_boss_t_pay_voucher_report_shibo a
where 1=1
and province =(610000)
and year = (2020)
and createtime >= '2020-01-01'
and createtime <= '2020-09-30'
and paytime >= '2020-01-01'
and paytime <= '2020-09-30'
and cleartime >= '2020-01-01'
and cleartime <= '2020-09-30'
and exists(select 1 from fasp_t_userorg org where org.orgguid = a.departmentguid and org.userguid = '78C4B0693CE64AA381C58EB171FDCCF5' and org.province = '610000' and org.year = '2020'
UNION ALL select 1 from fasp_t_userorg org where org.orgguid = a.agencyguid and org.userguid = '78C4B0693CE64AA381C58EB171FDCCF5' and org.province = '610000' and org.year = '2020')
order by billno,agencycode,fundtypecode,createtime;
create index fasp_t_userorg_1 on fasp_t_userorg(orgguid, userguid, province, year);
2) 如果表t输出的列数较多,修改为如下语句
explain verbose
select * from vw_boss_t_pay_voucher_report_shibo a
where 1=1
and province =(610000)
and year = (2020)
and createtime >= '2020-01-01'
and createtime <= '2020-09-30'
and paytime >= '2020-01-01'
and paytime <= '2020-09-30'
and cleartime >= '2020-01-01'
and cleartime <= '2020-09-30'
and (exists(select 1 from fasp_t_userorg org where org.orgguid = a.departmentguid and org.userguid = '78C4B0693CE64AA381C58EB171FDCCF5' and org.province = '610000' and org.year = '2020')
or exists(select 1 from fasp_t_userorg org where org.orgguid = a.agencyguid and org.userguid = '78C4B0693CE64AA381C58EB171FDCCF5' and org.province = '610000' and org.year = '2020'))
order by billno,agencycode,fundtypecode,createtime;
- 点赞
- 收藏
- 关注作者
评论(0)