GaussDB(DWS) Performance Tuning: A Case Study of a Performance Bottleneck Caused by a Complex Filter Condition in a Multi-Table Join
【Abstract】 Using a simple case, this article shows how performance differs depending on whether filtering happens before or after a join, analyzes the root cause of the difference, and distills the essence of this class of performance tuning.
1. 【Problem Description】
Several row-store tables are joined; the result set is small relative to the largest table, yet the query is slow. The execution plan points to a slow table scan. The plan is as follows:
id | operation | A-time | A-rows | E-rows | E-distinct | Peak Memory | A-width | E-width | E-costs
----+----------------------------------------------------------------------------------------------+----------+--------+--------+------------+-------------+---------+---------+-----------
1 | -> HashAggregate | 4079.479 | 14 | 47 | | 40KB | | 64 | 173342.44
2 | -> CTE Scan on p | 4079.433 | 107 | 47 | | 64KB | | 64 | 1.41
3 | -> Nested Loop (4,12) [2, CTE p] | 4079.332 | 107 | 47 | | 163KB | | 132 | 173336.96
4 | -> Nested Loop (5,11) | 4077.058 | 14 | 5 | | 32KB | | 132 | 172877.77
5 | -> Hash Join (6,7) | 4077.004 | 14 | 198 | | 32KB | | 104 | 171358.52
6 | -> Seq Scan on pg_catalog.pg_class c | 4074.122 | 2692 | 523580 | | 24667KB | | 72 | 169389.86
7 | -> Hash | 0.050 | 14 | 100 | | 312KB | | 64 | 2.00
8 | -> CTE Scan on t1 | 0.040 | 14 | 100 | | 56KB | | 64 | 2.00
9 | -> CTE Scan on t [8, CTE t1] | 0.034 | 14 | 100 | | 64KB | | 32 | 2.50
10 | -> Result [9, CTE t] | 0.011 | 14 | 100 | | 33KB | | 0 | 0.51
11 | -> Index Scan using pg_namespace_oid_index on pg_catalog.pg_namespace n | 0.033 | 14 | 1 | | 40KB | | 68 | 7.66
12 | -> Index Only Scan using pg_attribute_relid_attnum_index on pg_catalog.pg_attribute a | 1.167 | 107 | 100 | | 163KB | | 4 | 88.39
2. 【Original SQL】
explain performance
with
t as(
select unnest(array['bi_dashboard.dm_mss_cn_app_rep_ir_st_dw_v1_f',
'pg_toast.pg_toast_2964_index',
'pg_toast.pg_toast_2964',
'pg_toast.pg_toast_2396',
'pg_catalog.pgxc_node_oid_index',
'pg_catalog.pg_enum_typid_label_index',
'pg_catalog.pg_largeobject_metadata_oid_index',
'pg_catalog.pg_collation_name_enc_nsp_index',
'cbg_retail.rpt_tml_retail_complc_week_f',
'pg_catalog.pg_enum_oid_index',
'pg_catalog.pg_enum_typid_sortorder_index',
'pg_catalog.pg_collation_oid_index',
'dwrisc.dwd_plan_item_grossdemd_src_f_ext',
'pg_toast.pg_toast_2396_index']) as complemented -- these are the view names obtained by a script
),
t1 as (
select
split_part(complemented, '.', 1) as schemaname,
split_part(complemented, '.', 2) as viewname
from t
),
p as(
select obj_description(a.attrelid) as comment, (n.nspname||'.'||c.relname) as fullname
from
t1
inner join pg_class c
on t1.viewname = c.relname
inner join pg_namespace n
on t1.schemaname = n.nspname
and c.relnamespace = n.oid
inner join pg_attribute a
on a.attrelid = c.oid
where a.attnum > 0 and comment is not null
)
SELECT DISTINCT ('COMMENT ON VIEW '||fullname||' IS '''||comment||''';') FROM p;
3. 【Performance Analysis】
From the execution plan, the bottleneck is the Seq Scan at operator 6, and that operator is slow because a complex filter is evaluated against the entire table: the `comment is not null` condition in CTE p refers to obj_description(a.attrelid), and since the join condition gives a.attrelid = c.oid, the planner pushes the equivalent predicate obj_description(c.oid) IS NOT NULL down into the scan of pg_class.
The plan thus shows that the database filters first and joins afterwards. The obj_description(c.oid) IS NOT NULL filter is expensive, while the result set after the joins is very small. If the joins were used to shrink the data first and the filter were applied afterwards, could the query be faster?
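To see why the scan is so slow, operator 6 effectively behaves like the stand-alone query below (an illustrative sketch, not output from the original plan). obj_description() looks up the object's comment in pg_description, so the filter amounts to one extra catalog lookup for each of the roughly 520,000 rows of pg_class:
-- Sketch of what operator 6 effectively executes: the function call in the
-- filter runs once per pg_class row, dominating the ~4-second scan time.
select c.oid, c.relname
from pg_catalog.pg_class c
where obj_description(c.oid) is not null;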
4. 【Rewritten SQL】
explain performance
with
t as(
select unnest(array['bi_dashboard.dm_mss_cn_app_rep_ir_st_dw_v1_f',
'pg_toast.pg_toast_2964_index',
'pg_toast.pg_toast_2964',
'pg_toast.pg_toast_2396',
'pg_catalog.pgxc_node_oid_index',
'pg_catalog.pg_enum_typid_label_index',
'pg_catalog.pg_largeobject_metadata_oid_index',
'pg_catalog.pg_collation_name_enc_nsp_index',
'cbg_retail.rpt_tml_retail_complc_week_f',
'pg_catalog.pg_enum_oid_index',
'pg_catalog.pg_enum_typid_sortorder_index',
'pg_catalog.pg_collation_oid_index',
'dwrisc.dwd_plan_item_grossdemd_src_f_ext',
'pg_toast.pg_toast_2396_index']) as complemented
),
t1 as (
select
split_part(complemented, '.', 1) as schemaname,
split_part(complemented, '.', 2) as viewname
from t
),
p as(
select obj_description(a.attrelid) as comment, (n.nspname||'.'||c.relname) as fullname
from
t1
inner join pg_class c
on t1.viewname = c.relname
inner join pg_namespace n
on t1.schemaname = n.nspname
and c.relnamespace = n.oid
inner join pg_attribute a
on a.attrelid = c.oid
where a.attnum > 0
)
SELECT DISTINCT ('COMMENT ON VIEW '||fullname||' IS '''||comment||''';') FROM p where comment is not null;
The rewritten SQL moves the `comment is not null` filter to the outer query. The corresponding execution plan is as follows:
id | operation | A-time | A-rows | E-rows | E-distinct | Peak Memory | A-width | E-width | E-costs
----+----------------------------------------------------------------------------------------------+---------+--------+--------+------------+-------------+---------+---------+----------
1 | -> HashAggregate | 267.136 | 14 | 102 | | 40KB | | 64 | 40796.81
2 | -> CTE Scan on p | 267.054 | 107 | 102 | | 64KB | | 64 | 3.08
3 | -> Nested Loop (4,13) [2, CTE p] | 266.911 | 107 | 103 | | 195KB | | 132 | 40788.43
4 | -> Hash Join (5,11) | 265.063 | 14 | 11 | | 32KB | | 132 | 39979.20
5 | -> Hash Join (6,7) | 263.493 | 14 | 198 | | 32KB | | 104 | 39820.62
6 | -> Seq Scan on pg_catalog.pg_class c | 149.743 | 525067 | 525483 | | 32KB | | 72 | 37844.83
7 | -> Hash | 0.056 | 14 | 100 | | 312KB | | 64 | 2.00
8 | -> CTE Scan on t1 | 0.046 | 14 | 100 | | 56KB | | 64 | 2.00
9 | -> CTE Scan on t [8, CTE t1] | 0.038 | 14 | 100 | | 64KB | | 32 | 2.50
10 | -> Result [9, CTE t] | 0.014 | 14 | 100 | | 33KB | | 0 | 0.51
11 | -> Hash | 1.376 | 35 | 33 | | 320KB | | 68 | 156.33
12 | -> Seq Scan on pg_catalog.pg_namespace n | 1.351 | 35 | 33 | | 32KB | | 68 | 156.33
13 | -> Index Only Scan using pg_attribute_relid_attnum_index on pg_catalog.pg_attribute a | 0.132 | 107 | 101 | | 40KB | | 4 | 70.12
The plan shows that the Seq Scan operator's time dropped sharply (from about 4074 ms to about 150 ms), while none of the join operators became noticeably slower. The `comment is not null` filter now executes in operator 2 (the CTE Scan on p); since only a little over a hundred rows reach it, its cost is negligible.
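To double-check that the original cost came from the per-row obj_description() call rather than from scanning pg_class itself, one can time a bare scan against a scan that invokes the function on every row (a diagnostic sketch; absolute timings depend on the environment):
-- Baseline: plain scan of pg_class, no per-row function call.
explain performance
select count(*) from pg_catalog.pg_class;
-- Same scan, but with the expensive per-row filter applied.
explain performance
select count(*) from pg_catalog.pg_class c
where obj_description(c.oid) is not null;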
5. 【Tuning Summary】
In this case, the original SQL placed a complex filter condition in the WHERE clause, so the expensive filter was evaluated during the table scan, which became the bottleneck. Once that filter was moved to the outer query, it only had to touch a small number of rows, and performance improved dramatically.
What about the opposite situation: a non-equijoin condition that makes a hash join impossible, combined with a cheap filter? Then the filter should be moved into the subquery so that it is applied early, as sketched below. Whether to apply a filter earlier or later comes down to whichever choice reduces the number of tuples flowing through the operators fastest; this is the same idea as performing aggregation early.
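A sketch of that opposite case (big_t and dim_t below are hypothetical tables, not part of this case): the range join condition rules out a hash join, so the cheap filter belongs inside the subquery, where it cuts the number of tuples entering the nested loop:
select *
from (
    select * from big_t where flag = 1   -- cheap filter applied early, inside the subquery
) b
inner join dim_t d
    on b.val between d.lo and d.hi;      -- non-equijoin: hash join is not possible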