【调优实践】全表扫描性能瓶颈

举报
门前一棵葡萄树 发表于 2023/12/26 11:17:01 2023/12/26
【摘要】 一、概述GaussDB(DWS)作为非内存数据库,只有部分表数据缓存至内存中,读取这部分表的数据时不涉及磁盘IO;但是更多的表数据没有缓存至内存中,读取这表表数据时就会涉及磁盘IO。当表数据量较大(亿级)时,磁盘IO所消耗的时间可能成为查询的瓶颈,如果减少磁盘IO成为提升查询SCAN性能的关键。GaussDB(DWS)做了丰富的功能用于减少磁盘IO,包含:分区剪枝、PCK、Rough Che...

一、概述

GaussDB(DWS)作为非内存数据库,只有部分表数据缓存至内存中,读取这部分表的数据时不涉及磁盘IO;但是更多的表数据没有缓存至内存中,读取这表表数据时就会涉及磁盘IO。当表数据量较大(亿级)时,磁盘IO所消耗的时间可能成为查询的瓶颈,如果减少磁盘IO成为提升查询SCAN性能的关键。GaussDB(DWS)做了丰富的功能用于减少磁盘IO,包含:分区剪枝、PCK、Rough Check、BLOOM FILTER和索引等。其中索引在行存表中常用,列存B-Tree索引性能较差且占用大量空间,因此不推荐使用。下面针对列存表常用的几种查询SCAN性能优化方法进行简要探讨。

二、分区剪枝

2.1 分区表

使用分区剪枝,首先要使用分区表。

何为分区表?

分区表是将逻辑上的一张表根据某种策略分成几张物理块进行存储,这张逻辑上的表称之为分区表,物理块称之为分区。将一个大表按照分区策略切分为若干分区,在查询时只需要读取指定分区的数据,而不是全表扫描,以此提高查询性能。

常见的分区策略包括:范围分区(Range Partitioning)、哈希分区(Hash Partitioning)、列表分区(List Partitioning)和数值分区(Value Partitioning)。

目前GaussDB(DWS)行存表、列存表仅支持范围分区和列表分区。

2.1.1 范围分区

范围分区示例:

CREATE TABLE customer_address
(
    ca_address_sk       INTEGER                  NOT NULL   ,
    ca_address_id       CHARACTER(16)            NOT NULL   ,
    ca_street_number    CHARACTER(10)                       ,
    ca_street_name      CHARACTER varying(60)               ,
    ca_street_type      CHARACTER(15)                       ,
    ca_suite_number     CHARACTER(10)                    
)
DISTRIBUTE BY HASH (ca_address_sk)
PARTITION BY RANGE(ca_address_sk)
(
        PARTITION P1 VALUES LESS THAN(2450815),
        PARTITION P2 VALUES LESS THAN(2451179),
        PARTITION P3 VALUES LESS THAN(2451544),
        PARTITION P4 VALUES LESS THAN(MAXVALUE)
);

2.1.2 列表分区

列表分区示例:

CREATE TABLE data_list
(
    id int,
    time int, 
    sarlay decimal(12,2)
)
PARTITION BY LIST (time)
(
        PARTITION P1 VALUES (202209),
        PARTITION P2 VALUES (202210,202208),
        PARTITION P3 VALUES (202211),
        PARTITION P4 VALUES (202212),
        PARTITION P5 VALUES (202301)
);

2.2 分区剪枝调优实践

2.2.1 问题描述

HANA迁移GaussDB(DWS)某项目,经常遇到创建了分区表,但是使用分区列做过滤时未走分区剪枝的问题。常见问题如下所示:

 id |                                                                                    operation                                                                                    |        A-time        | A-rows  |  E-rows  | E-distinct |  Peak Memory   |   E-memory   |   A-width   | E-width |  E-costs  
----+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------+---------+----------+------------+----------------+--------------+-------------+---------+-----------
  1 | ->  Row Adapter                                                                                                                                                                 | 2663.144             |   10000 |     7156 |            | 765KB          |              |             |    1927 | 906162.01 
  2 |    ->  Vector Limit                                                                                                                                                             | 2655.856             |   10000 |     7156 |            | 18KB           |              |             |    1927 | 906162.01 
  3 |       ->  Vector Streaming (type: GATHER)                                                                                                                                       | 2655.831             |   10000 |     7156 |            | 48227KB        |              |             |    1927 | 906162.01 
  4 |          ->  Vector Limit                                                                                                                                                       | [2481.665, 2481.665] |   10000 |     7156 |            | [26KB, 26KB]   | 1MB          |             |    1927 | 899428.80 
  5 |             ->  Vector Sort                                                                                                                                                     | [2481.620, 2481.620] |   10000 |     7156 |            | [83MB, 83MB]   | 65MB(6587MB) | [2005,2005] |    1927 | 899428.80 
  6 |                ->  Vector Subquery Scan on tt                                                                                                                                   | [2240.548, 2240.548] |   23820 |   286240 |            | [1MB, 1MB]     | 1MB          |             |    1927 | 898952.75 
  7 |                   ->  Vector WindowAgg                                                                                                                                          | [2239.214, 2239.214] |   23820 |   286240 |            | [31MB, 31MB]   | 16MB         |             |    1919 | 896054.57 
  8 |                      ->  Vector Streaming(type: BROADCAST)                                                                                                                      | [1851.486, 1851.486] |   23820 |   286240 |            | [14MB, 14MB]   | 2MB          |             |    1919 | 895965.12 
  9 |                         ->  Vector Subquery Scan on t                                                                                                                           | [1820.194, 1823.894] |   23820 |   286240 |            | [1MB, 1MB]     | 1MB          |             |    1919 | 627754.89 
 10 |                            ->  Vector Hash Aggregate                                                                                                                            | [1820.178, 1823.881] |   23820 |   286240 |            | [6MB, 6MB]     | 16MB         | [2583,2606] |    1536 | 627683.33 
 11 |                               ->  Vector Streaming(type: REDISTRIBUTE)                                                                                                          | [1814.722, 1816.813] |   23820 |   286240 |            | [2MB, 2MB]     | 2MB          |             |    1536 | 624892.49 
 12 |                                  ->  Vector WindowAgg                                                                                                                           | [369.498, 369.498]   |   23820 |   286240 |            | [11MB, 11MB]   | 16MB         |             |    1536 | 404487.69 
 13 |                                     ->  Vector Streaming(type: BROADCAST)                                                                                                       | [238.723, 238.723]   |   23820 |   286240 |            | [183MB, 183MB] | 2MB          |             |    1536 | 402562.55 
 14 |                                        ->  Vector Sort                                                                                                                          | [701.363, 822.033]   |   23820 |   286240 |            | [4MB, 6MB]     | 68MB(6590MB) | [2015,2041] |    1536 | 187882.55 
 15 |                                           ->  Vector Hash Left Join (16, 52)                                                                                                    | [694.329, 814.301]   |   23820 |   286234 |            | [5MB, 5MB]     | 16MB         |             |    1536 | 187406.50 
 16 |                                              ->  Vector Hash Left Join (17, 51)                                                                                                 | [690.055, 809.445]   |   23820 |   144812 | 3          | [5MB, 5MB]     | 16MB         |             |    1516 | 185193.94 
 17 |                                                 ->  Vector Hash Left Join (18, 50)                                                                                              | [686.082, 805.595]   |   23820 |    78451 | 3          | [5MB, 5MB]     | 16MB         |             |    1494 | 183011.09 
 18 |                                                    ->  Vector Hash Left Join (19, 49)                                                                                           | [682.088, 801.702]   |   23820 |    72066 | 3          | [4MB, 4MB]     | 16MB         |             |    1472 | 180839.08 
 19 |                                                       ->  Vector Hash Left Join (20, 48)                                                                                        | [680.021, 800.112]   |   23820 |    72066 | 2          | [4MB, 4MB]     | 16MB         |             |    1465 | 179955.54 
 20 |                                                          ->  Vector Sonic Hash Join (21,47)                                                                                     | [675.815, 796.889]   |   23820 |    72066 | 1          | [2MB, 2MB]     | 16MB         |             |    1438 | 178589.82 
 21 |                                                             ->  Vector Hash Right Join (22, 23)                                                                                 | [674.888, 795.947]   |   23820 |    72066 | 1          | [5MB, 5MB]     | 16MB         |             |    1404 | 178521.47 
 22 |                                                                ->  CStore Scan on qualityanalysis.rpt_qm_dim_lookup_d lookup                                                    | [4.977, 6.485]       | 4654360 | 14529480 | 363237     | [1MB, 1MB]     | 1MB          |             |      20 | 10647.98  
 23 |                                                                ->  Vector Hash Right Join (24, 25)                                                                              | [616.514, 737.297]   |   23820 |    72066 | 17         | [5MB, 5MB]     | 16MB         | [2371,2397] |    1400 | 166874.06 
 24 |                                                                   ->  CStore Scan on qualityanalysis.rpt_tml_qm_sc_d sc                                                         | [4.285, 5.746]       | 2591960 |  2591960 | 64799      | [1MB, 1MB]     | 1MB          |             |      56 | 6262.79   
 25 |                                                                   ->  Vector Sonic Hash Join (26,46)                                                                            | [606.500, 726.023]   |   23820 |    72066 | 21         | [2MB, 2MB]     | 16MB         | [2294,2320] |    1360 | 160395.56 
 26 |                                                                      ->  Vector Hash Left Join (27, 45)                                                                         | [604.389, 723.756]   |   24794 |    74993 | 530        | [5MB, 5MB]     | 16MB         |             |    1310 | 159018.67 
 27 |                                                                         ->  Vector Sonic Hash Join (28,44)                                                                      | [600.642, 720.354]   |   24794 |    74993 | 21         | [4MB, 4MB]     | 16MB         |             |    1263 | 157506.18 
 28 |                                                                            ->  Vector Sonic Hash Join (29,43)                                                                   | [597.908, 718.143]   |   24794 |   182226 | 1          | [3MB, 3MB]     | 16MB         |             |    1251 | 157013.83 
 29 |                                                                               ->  Vector Sonic Hash Join (30,42)                                                                | [596.279, 716.483]   |   24794 |   182226 | 1          | [3MB, 3MB]     | 16MB         |             |    1185 | 154616.28 
 30 |                                                                                  ->  Vector Sonic Hash Join (31,41)                                                             | [594.766, 714.937]   |   24794 |   182226 | 1          | [3MB, 3MB]     | 16MB         |             |    1119 | 152218.72 
 31 |                                                                                     ->  Vector Sonic Hash Join (32,40)                                                          | [593.260, 713.375]   |   24794 |   182226 | 2          | [2MB, 2MB]     | 16MB         |             |    1053 | 149821.17 
 32 |                                                                                        ->  Vector Sonic Hash Join (33,39)                                                       | [558.139, 670.004]   |   24794 |   182226 | 1          | [3MB, 3MB]     | 16MB         |             |    1052 | 138640.82 
 33 |                                                                                           ->  Vector Sonic Hash Join (34,38)                                                    | [556.399, 668.308]   |   24794 |   182226 | 1          | [6MB, 6MB]     | 16MB         |             |     905 | 138303.88 
 34 |                                                                                              ->  Vector Sonic Hash Join (35,37)                                                 | [542.065, 651.249]   |   24794 |   182226 | 11         | [1MB, 1MB]     | 16MB         |             |     777 | 132276.68 
 35 |                                                                                                 ->  Vector Partition Iterator                                                   | [520.204, 625.238]   |   24794 |   102190 | 11         | [41KB, 41KB]   | 1MB          |             |     777 | 120873.93 
 36 |                                                                                                    ->  Partitioned CStore Scan on qualityanalysis.rpt_tml_qm_phone_vendor_f ffr | [500.450, 604.030]   |   24794 |   102190 |            | [10MB, 10MB]   | 1MB          |             |     777 | 120873.93 
 37 |                                                                                                 ->  CStore Scan on qualityanalysis.rpt_tml_qm_prod_info_d prod_info             | [20.547, 25.433]     |  690440 |   692200 | 1079       | [1MB, 1MB]     | 1MB          |             |      16 | 9185.83   
 38 |                                                                                              ->  CStore Scan on qualityanalysis.rpt_tml_qm_productteam_dim prod_team            | [8.789, 11.134]      |  482800 |   477600 | 11940      | [4MB, 4MB]     | 1MB          |             |     144 | 5806.08   
 39 |                                                                                           ->  CStore Scan on qualityanalysis.rpt_qm_sv_country_geo_d region                     | [0.297, 0.363]       |   10720 |    10720 | 268        | [2MB, 2MB]     | 1MB          |             |     163 | 270.94    
 40 |                                                                                        ->  CStore Scan on qualityanalysis.rpt_qm_dim_lookup_d look                              | [34.358, 42.754]     |     480 |      160 | 4          | [958KB, 958KB] | 1MB          |             |      17 | 11118.33  
 41 |                                                                                     ->  CStore Scan on qualityanalysis.rpt_tml_qm_fault_type_d fault1                           | [0.294, 0.361]       |   89160 |    89160 | 2229       | [1MB, 1MB]     | 1MB          |             |      74 | 2307.05   
 42 |                                                                                  ->  CStore Scan on qualityanalysis.rpt_tml_qm_fault_type_d fault2                              | [0.306, 0.359]       |   89160 |    89160 | 2229       | [1MB, 1MB]     | 1MB          |             |      74 | 2307.05   
 43 |                                                                               ->  CStore Scan on qualityanalysis.rpt_tml_qm_fault_type_d fault3                                 | [0.343, 0.442]       |   89160 |    89160 | 2229       | [1MB, 1MB]     | 1MB          |             |      74 | 2307.05   
 44 |                                                                            ->  CStore Scan on qualityanalysis.rpt_tml_qm_mat_ven_rel_n_d rel                                    | [0.082, 0.113]       |   34240 |    34240 | 547        | [592KB, 592KB] | 1MB          |             |      12 | 395.86    
 45 |                                                                         ->  CStore Scan on qualityanalysis.rpt_qm_sv_partner_d part                                             | [0.562, 0.763]       |  396640 |   396640 | 9916       | [1MB, 1MB]     | 1MB          | [110,110]   |      55 | 1352.71   
 46 |                                                                      ->  CStore Scan on qualityanalysis.rpt_tml_qm_vendor_d vendor                                              | [1.034, 1.234]       |   29320 |    18120 | 377        | [1MB, 1MB]     | 1MB          |             |      66 | 1342.76   
 47 |                                                             ->  CStore Scan on qualityanalysis.rpt_tml_qm_material_type_d type                                                  | [0.083, 0.131]       |    1720 |     1720 | 43         | [744KB, 744KB] | 1MB          |             |      42 | 43.04     
 48 |                                                          ->  CStore Scan on qualityanalysis.rpt_tml_qm_vendor_d ven_2nd                                                         | [0.569, 0.698]       |   46360 |    46360 | 964        | [1MB, 1MB]     | 1MB          | [104,104]   |      43 | 1298.65   
 49 |                                                       ->  CStore Scan on qualityanalysis.rpt_qm_sv_operator_d op                                                                | [0.110, 0.147]       |   49680 |    49680 | 1242       | [674KB, 674KB] | 1MB          | [50,50]     |      23 | 843.24    
 50 |                                                    ->  CStore Scan on qualityanalysis.rpt_qm_fault_type_rel_d fault_rel1                                                        | [0.829, 1.085]       |  271640 |   271640 | 2600       | [1MB, 1MB]     | 1MB          | [99,99]     |      32 | 1981.35   
 51 |                                                 ->  CStore Scan on qualityanalysis.rpt_qm_fault_type_rel_d fault_rel2                                                           | [0.830, 1.135]       |  271640 |   271640 | 2600       | [1MB, 1MB]     | 1MB          | [99,99]     |      32 | 1981.35   
 52 |                                              ->  CStore Scan on qualityanalysis.rpt_qm_fault_type_rel_d fault_rel3                                                              | [0.927, 1.266]       |  271640 |   271640 | 2600       | [1MB, 1MB]     | 1MB          | [99,99]     |      32 | 1981.35   

                                                             SQL Diagnostic Information                                                              
-----------------------------------------------------------------------------------------------------------------------------------------------------
Partitioned table unprunable Qual
"	table qualityanalysis.rpt_tml_qm_phone_vendor_f ffr:"
"	left side of expression "(to_char(to_date((take_date)::text), 'YYYY-MM-DD'::text) >= '2023-05-30'::text)" is an expression or occurs type conversion"
"	left side of expression "(to_char(to_date((take_date)::text), 'YYYY-MM-DD'::text) <= '2023-06-11'::text)" is an expression or occurs type conversion"

通过自诊断信息可知,ffr表take_date字段类型转换导致未走分区剪枝。take_date字段类型为varchar(n),to_date将take_date转为timestamp类型,转为timestamp类型后to_char后又将take_date按照‘YYYY-MM-DD’格式转为text类型。由此可见to_date和to_char两个函数的类型转换是多余的。因此直接将这两个函数移除,改成date_date >= '2023-05-30' AND take_date <= '2023-06-11'即可。

三、PCK

局部聚簇(Partial Cluster Key)是列存下的一种技术。这种技术可以通过min/max稀疏索引较快的实现基表扫描的filter过滤。如果分区剪枝不足以解决全表扫描问题,可以考虑使用PCK,提高rough check和bloom filter的CU过滤效率,以此减少磁盘IO。

1. 增加PCK

ALTER TABLE tablename ADD PARTIAL CLUSTER KEY (SPDT_PROD_KEY);

2. 删除PCK

-- 查找PCK名称
SELECT r.conname, pg_catalog.pg_get_constraintdef(r.oid, true) FROM pg_class c JOIN pg_catalog.pg_constraint r ON c.oid = r.conrelid 
WHERE r.contype = 's' AND c.relname='XXX';
-- 删除PCK
ALTER TABLE tablename DROP CONSTRAINT pckname;

四、BLOOM FILTER

使用内表对外表数据进行筛选,减少外表SCAN数据量。提升SCAN性能的同时,还能减少JOIN计算数据量,从而提升JOIN性能。在外表数据量较大,BLOOM FILTER可以过滤掉大量CU情况下,对查询性能提升效果明显。

示例:A表和B表关联:A.x=B.y,A表仅有1000条数据,B表有十亿数据,两表关联后仅有2000条数据。此时就可以考虑使用BLOOM FILTER,使用A.x的范围对B表进行过滤,BLOOM FILTER过滤效果类似于给B表增加以下过滤条件:

B.y < (select max(x) from A)  AND B.y > (select min(x) from A)。

常见不能触发BLOOM FILTER的场景有:

1. 字段类型为字符串,字段排序规则未显示指定使用“C”排序;

优化策略:字段显示指定“C”排序,示例:

ALTER TABLE  DMISC.DM_INV_TRANSACTION_SUM_F ALTER COLUMN subinventory_code TYPE char varying COLLATE "C"; 

2. JOIN算子与外表跨线程;

优化策略:

a) 修改分布列为JOIN列;

b) hint指定内表广播,示例:/*+ broadcast(a) */

3. 外表过滤率低

内表对外表的过滤,是使用Rough Check进行的范围筛选;如果外表JOIN列分布离散,很可能导致过滤效率差,甚至过滤不了CU。

优化策略:对外表JOIN列增加PCK/分区,提升其过滤效率。

【版权声明】本文为华为云社区用户原创内容,转载时必须标注文章的来源(华为云社区)、文章链接、文章作者等基本信息, 否则作者和本社区有权追究责任。如果您发现本社区中有涉嫌抄袭的内容,欢迎发送邮件进行举报,并提供相关证据,一经查实,本社区将立刻删除涉嫌侵权内容,举报邮箱: cloudbbs@huaweicloud.com
  • 点赞
  • 收藏
  • 关注作者

评论(0

0/1000
抱歉,系统识别当前为高风险访问,暂不支持该操作

全部回复

上滑加载中

设置昵称

在此一键设置昵称,即可参与社区互动!

*长度不超过10个汉字或20个英文字符,设置后3个月内不可修改。

*长度不超过10个汉字或20个英文字符,设置后3个月内不可修改。