GaussDB(DWS)物化视图查询重写解析

举报
萨菲罗斯男人中的男人 发表于 2024/12/21 18:39:52 2024/12/21
【摘要】 本文主要介绍了DWS中物化视图查询重写的两种方式,并详细介绍了结构匹配重写的四种方式,分别以例子的方式展示了其技术实现的外在表现。

前言

物化视图(materialized view)是一张表,存储了物化视图定义中查询语句的结果。

DWS中物化视图的使用方式分为两种:

  • 直接查询:将物化视图定义成自动刷新或者定时刷新,使物化视图的数据保持较新,直接查询物化视图可以提升查询性能。

  • 查询重写:执行sql查询时,自动将查询改写为对物化视图的查询,而不需要手动修改查询语句。

本文主要介绍DWS中物化视图查询重写这一部分。

基本原理

DWS中,物化视图查询重写支持基于文本的查询重写和基于结构的查询重写两种重写方式。

  • 基于文本的查询重写:支持复杂定义的物化视图。当查询和物化视图定义一致时(不考虑排序),可以重写成物化视图。需要注意,为了保持结果集的正确性,查询和物化视图定义中的表有别名时,别名需要一致。

  • 基于结构的查询重写:支持SPJG(Select-Projection-Join-Groupby)的物化视图查询重写。不要求数据一致,支持在物化视图的数据基础上再进行加工得到查询所需的数据,或者是查询中的子查询所需的数据。

查询重写的主要流程:

  1. 查询预处理:收集查询中涉及了哪些基表

  2. 搜集物化视图:根据系统表中的依赖关系,收集这些基表涉及哪些物化视图可以被用来重写

  3. 物化视图剪枝:去掉数据已经过时的物化视图,并根据物化视图定义进行排序,当候选的物化视图个数较多时,去掉优先级较低的物化视图。

  4. 文本匹配重写:将候选的物化视图与查询进行文本匹配,如果有物化视图匹配成功,则直接进行重写,不需要进行结构匹配。

  5. 结构匹配重写:将候选的物化视图与查询进行结构匹配,如果匹配成功,直接进行重写,后续优先级较低的物化视图不再进行结构匹配。

使用场景

文本匹配重写

适用于查询语句结构复杂,例如有cte、子查询、子链接、window函数等复杂子句的查询。

以tpcds q4为例,建立定义为q4的支持查询重写的物化视图。

create materialized view mv_tpcds04
ENABLE QUERY REWRITE
AS
with year_total as (
    select c_customer_id customer_id
         ,c_first_name customer_first_name
         ,c_last_name customer_last_name
         ,c_preferred_cust_flag customer_preferred_cust_flag
         ,c_birth_country customer_birth_country
         ,c_login customer_login
         ,c_email_address customer_email_address
         ,d_year dyear
         ,sum(((ss_ext_list_price-ss_ext_wholesale_cost-ss_ext_discount_amt)+ss_ext_sales_price)/2) year_total
         ,'s' sale_type
    from customer
       ,store_sales
       ,date_dim
    where c_customer_sk = ss_customer_sk
      and ss_sold_date_sk = d_date_sk
    group by c_customer_id
           ,c_first_name
           ,c_last_name
           ,c_preferred_cust_flag
           ,c_birth_country
           ,c_login
           ,c_email_address
           ,d_year
    union all
    select c_customer_id customer_id
         ,c_first_name customer_first_name
         ,c_last_name customer_last_name
         ,c_preferred_cust_flag customer_preferred_cust_flag
         ,c_birth_country customer_birth_country
         ,c_login customer_login
         ,c_email_address customer_email_address
         ,d_year dyear
         ,sum((((cs_ext_list_price-cs_ext_wholesale_cost-cs_ext_discount_amt)+cs_ext_sales_price)/2) ) year_total
         ,'c' sale_type
    from customer
       ,catalog_sales
       ,date_dim
    where c_customer_sk = cs_bill_customer_sk
      and cs_sold_date_sk = d_date_sk
    group by c_customer_id
           ,c_first_name
           ,c_last_name
           ,c_preferred_cust_flag
           ,c_birth_country
           ,c_login
           ,c_email_address
           ,d_year
    union all
    select c_customer_id customer_id
         ,c_first_name customer_first_name
         ,c_last_name customer_last_name
         ,c_preferred_cust_flag customer_preferred_cust_flag
         ,c_birth_country customer_birth_country
         ,c_login customer_login
         ,c_email_address customer_email_address
         ,d_year dyear
         ,sum((((ws_ext_list_price-ws_ext_wholesale_cost-ws_ext_discount_amt)+ws_ext_sales_price)/2) ) year_total
         ,'w' sale_type
    from customer
       ,web_sales
       ,date_dim
    where c_customer_sk = ws_bill_customer_sk
      and ws_sold_date_sk = d_date_sk
    group by c_customer_id
           ,c_first_name
           ,c_last_name
           ,c_preferred_cust_flag
           ,c_birth_country
           ,c_login
           ,c_email_address
           ,d_year
)
select
    t_s_secyear.customer_id
     ,t_s_secyear.customer_first_name
     ,t_s_secyear.customer_last_name
     ,t_s_secyear.customer_email_address
from year_total t_s_firstyear
   ,year_total t_s_secyear
   ,year_total t_c_firstyear
   ,year_total t_c_secyear
   ,year_total t_w_firstyear
   ,year_total t_w_secyear
where t_s_secyear.customer_id = t_s_firstyear.customer_id
  and t_s_firstyear.customer_id = t_c_secyear.customer_id
  and t_s_firstyear.customer_id = t_c_firstyear.customer_id
  and t_s_firstyear.customer_id = t_w_firstyear.customer_id
  and t_s_firstyear.customer_id = t_w_secyear.customer_id
  and t_s_firstyear.sale_type = 's'
  and t_c_firstyear.sale_type = 'c'
  and t_w_firstyear.sale_type = 'w'
  and t_s_secyear.sale_type = 's'
  and t_c_secyear.sale_type = 'c'
  and t_w_secyear.sale_type = 'w'
  and t_s_firstyear.dyear =  2001
  and t_s_secyear.dyear = 2001+1
  and t_c_firstyear.dyear =  2001
  and t_c_secyear.dyear =  2001+1
  and t_w_firstyear.dyear = 2001
  and t_w_secyear.dyear = 2001+1
  and t_s_firstyear.year_total > 0
  and t_c_firstyear.year_total > 0
  and t_w_firstyear.year_total > 0
  and case when t_c_firstyear.year_total > 0 then t_c_secyear.year_total / t_c_firstyear.year_total else null end
    > case when t_s_firstyear.year_total > 0 then t_s_secyear.year_total / t_s_firstyear.year_total else null end
  and case when t_c_firstyear.year_total > 0 then t_c_secyear.year_total / t_c_firstyear.year_total else null end
    > case when t_w_firstyear.year_total > 0 then t_w_secyear.year_total / t_w_firstyear.year_total else null end
order by t_s_secyear.customer_id
       ,t_s_secyear.customer_first_name
       ,t_s_secyear.customer_last_name
       ,t_s_secyear.customer_email_address
limit 100;

建立物化视图后,打印q4语句的原始查询计划。由于扫描物化视图后返回的数据并不是有序的,所以还需要按照查询中的排序键再排序一次。

EXPLAIN(COSTS OFF)
with year_total as (
    select c_customer_id customer_id
         ,c_first_name customer_first_name
         ,c_last_name customer_last_name
         ,c_preferred_cust_flag customer_preferred_cust_flag
         ,c_birth_country customer_birth_country
         ,c_login customer_login
         ,c_email_address customer_email_address
         ,d_year dyear
         ,sum(((ss_ext_list_price-ss_ext_wholesale_cost-ss_ext_discount_amt)+ss_ext_sales_price)/2) year_total
         ,'s' sale_type
    from customer
       ,store_sales
       ,date_dim
    where c_customer_sk = ss_customer_sk
      and ss_sold_date_sk = d_date_sk
    group by c_customer_id
           ,c_first_name
           ,c_last_name
           ,c_preferred_cust_flag
           ,c_birth_country
           ,c_login
           ,c_email_address
           ,d_year
    union all
    select c_customer_id customer_id
         ,c_first_name customer_first_name
         ,c_last_name customer_last_name
         ,c_preferred_cust_flag customer_preferred_cust_flag
         ,c_birth_country customer_birth_country
         ,c_login customer_login
         ,c_email_address customer_email_address
         ,d_year dyear
         ,sum((((cs_ext_list_price-cs_ext_wholesale_cost-cs_ext_discount_amt)+cs_ext_sales_price)/2) ) year_total
         ,'c' sale_type
    from customer
       ,catalog_sales
       ,date_dim
    where c_customer_sk = cs_bill_customer_sk
      and cs_sold_date_sk = d_date_sk
    group by c_customer_id
           ,c_first_name
           ,c_last_name
           ,c_preferred_cust_flag
           ,c_birth_country
           ,c_login
           ,c_email_address
           ,d_year
    union all
    select c_customer_id customer_id
         ,c_first_name customer_first_name
         ,c_last_name customer_last_name
         ,c_preferred_cust_flag customer_preferred_cust_flag
         ,c_birth_country customer_birth_country
         ,c_login customer_login
         ,c_email_address customer_email_address
         ,d_year dyear
         ,sum((((ws_ext_list_price-ws_ext_wholesale_cost-ws_ext_discount_amt)+ws_ext_sales_price)/2) ) year_total
         ,'w' sale_type
    from customer
       ,web_sales
       ,date_dim
    where c_customer_sk = ws_bill_customer_sk
      and ws_sold_date_sk = d_date_sk
    group by c_customer_id
           ,c_first_name
           ,c_last_name
           ,c_preferred_cust_flag
           ,c_birth_country
           ,c_login
           ,c_email_address
           ,d_year
)
select
    t_s_secyear.customer_id
     ,t_s_secyear.customer_first_name
     ,t_s_secyear.customer_last_name
     ,t_s_secyear.customer_email_address
from year_total t_s_firstyear
   ,year_total t_s_secyear
   ,year_total t_c_firstyear
   ,year_total t_c_secyear
   ,year_total t_w_firstyear
   ,year_total t_w_secyear
where t_s_secyear.customer_id = t_s_firstyear.customer_id
  and t_s_firstyear.customer_id = t_c_secyear.customer_id
  and t_s_firstyear.customer_id = t_c_firstyear.customer_id
  and t_s_firstyear.customer_id = t_w_firstyear.customer_id
  and t_s_firstyear.customer_id = t_w_secyear.customer_id
  and t_s_firstyear.sale_type = 's'
  and t_c_firstyear.sale_type = 'c'
  and t_w_firstyear.sale_type = 'w'
  and t_s_secyear.sale_type = 's'
  and t_c_secyear.sale_type = 'c'
  and t_w_secyear.sale_type = 'w'
  and t_s_firstyear.dyear =  2001
  and t_s_secyear.dyear = 2001+1
  and t_c_firstyear.dyear =  2001
  and t_c_secyear.dyear =  2001+1
  and t_w_firstyear.dyear = 2001
  and t_w_secyear.dyear = 2001+1
  and t_s_firstyear.year_total > 0
  and t_c_firstyear.year_total > 0
  and t_w_firstyear.year_total > 0
  and case when t_c_firstyear.year_total > 0 then t_c_secyear.year_total / t_c_firstyear.year_total else null end
    > case when t_s_firstyear.year_total > 0 then t_s_secyear.year_total / t_s_firstyear.year_total else null end
  and case when t_c_firstyear.year_total > 0 then t_c_secyear.year_total / t_c_firstyear.year_total else null end
    > case when t_w_firstyear.year_total > 0 then t_w_secyear.year_total / t_w_firstyear.year_total else null end
order by t_s_secyear.customer_id
       ,t_s_secyear.customer_first_name
       ,t_s_secyear.customer_last_name
       ,t_s_secyear.customer_email_address
limit 100;
                                           QUERY PLAN                                           
------------------------------------------------------------------------------------------------
 Streaming (type: GATHER)
   Merge Sort Key: customer_id, customer_first_name, customer_last_name, customer_email_address
   Node/s: All datanodes (group1, bucket:32)
   ->  Sort
         Sort Key: customer_id, customer_first_name, customer_last_name, customer_email_address
         ->  Seq Scan on mv_tpcds04
(6 rows)

结构匹配重写

条件重写

条件重写,指的是数据经过物化视图中的条件过滤后,能够包括查询的数据。这就要求物化视图定义中的过滤条件要比查询中的更宽松,范围更广,主要表现为两个场景:

建表语句:

CREATE TABLE lineitem (
    l_orderkey integer NOT NULL,
    l_partkey integer NOT NULL,
    l_suppkey integer NOT NULL,
    l_linenumber integer NOT NULL,
    l_quantity numeric(15,2) NOT NULL,
    l_extendedprice numeric(15,2) NOT NULL,
    l_discount numeric(15,2) NOT NULL,
    l_tax numeric(15,2) NOT NULL,
    l_returnflag character(1) NOT NULL,
    l_linestatus character(1) NOT NULL,
    l_shipdate date NOT NULL,
    l_commitdate date NOT NULL,
    l_receiptdate date NOT NULL,
    l_shipinstruct character(25) NOT NULL,
    l_shipmode character(10) NOT NULL,
    l_comment character varying(44) NOT NULL
    , primary key (l_orderkey, l_linenumber)
)
WITH (orientation = column)
DISTRIBUTE BY HASH (l_orderkey);
  1. 查询中有过滤条件,物化视图中没有对应的过滤条件:

    建立物化视图:

    CREATE MATERIALIZED VIEW mv_lineitem_1 ENABLE QUERY REWRITE AS
    SELECT
    	*
    FROM
    	lineitem
    WHERE
    	l_shipdate >= DATE '1995-01-01'
        AND l_quantity < 24;

    进行查询:

    EXPLAIN(costs off, nodes off)
    SELECT
    	*
    FROM
    	lineitem
    WHERE
    	l_shipdate >= DATE '1995-01-01'
        AND l_quantity < 24
        AND l_discount <= 0.06 + 0.01;
                 QUERY PLAN              
    -------------------------------------
     Streaming (type: GATHER)
       ->  Seq Scan on mv_lineitem_1
             Filter: (l_discount <= .07)
    (3 rows)

    查看查询计划发现,l_discount <= 0.06 + 0.01这个物化视图中没有的条件,作为补偿条件,变成了扫描物化视图的条件。

  2. 查询和物化视图中都有对某一列的过滤条件,但是物化视图的过滤条件数学上表示的范围包括了查询:

    建立物化视图:

    CREATE MATERIALIZED VIEW mv_lineitem_2 ENABLE QUERY REWRITE AS
    SELECT
    	*
    FROM
    	lineitem
    WHERE
    	(l_shipdate >= DATE '1995-01-01' AND l_shipdate <= DATE '1998-01-01')
        OR
        (l_shipdate >= DATE '1997-01-01' AND l_shipdate <= DATE '2005-01-01');
    

      查询语句与计划:

EXPLAIN(costs off, nodes off)
SELECT
	*
FROM
	lineitem
WHERE
    l_shipdate >= DATE '1996-01-01' 
    AND l_shipdate <= DATE '2000-01-01';
                                                                            QUERY PLAN                                                                             
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Streaming (type: GATHER)
   ->  Seq Scan on mv_lineitem_2
         Filter: ((l_shipdate >= '1996-01-01 00:00:00'::timestamp(0) without time zone) AND (l_shipdate <= '2000-01-01 00:00:00'::timestamp(0) without time zone))
(3 rows)

由于物化视图的过滤条件实际表示范围为[1995-01-01, 2005-01-01],查询的过滤条件表示范围为[1996-01-01, 2000-01-01],物化视图的范围包括了查询的范围,因此可以重写。

Join重写

DWS的物化视图Join重写,通过等价类匹配等值连接约束,通过精确匹配非等值连接约束,匹配多表关联的连接关系。支持的join类型报错inner join、left join、right join、full join、semi join、anti join。

  • 当查询和物化视图中均为inner join时,实际上所有的关联条件均可以视为过滤条件,join重写变成了过滤条件的匹配。

create table lineitem (
    l_orderkey integer NOT NULL,
    l_partkey integer NOT NULL,
    l_suppkey integer NOT NULL,
    l_linenumber integer NOT NULL,
    l_quantity numeric(15,2) NOT NULL,
    l_extendedprice numeric(15,2) NOT NULL,
    l_discount numeric(15,2) NOT NULL,
    l_tax numeric(15,2) NOT NULL,
    l_returnflag character(1) NOT NULL,
    l_linestatus character(1) NOT NULL,
    l_shipdate date NOT NULL,
    l_commitdate date NOT NULL,
    l_receiptdate date NOT NULL,
    l_shipinstruct character(25) NOT NULL,
    l_shipmode character(10) NOT NULL,
    l_comment character varying(44) NOT NULL
    , primary key (l_orderkey, l_linenumber)
)
WITH (orientation = column)
DISTRIBUTE BY HASH (l_orderkey);

create table orders (
    o_orderkey integer NOT NULL,
    o_custkey integer NOT NULL,
    o_orderstatus character(1) NOT NULL,
    o_totalprice numeric(15,2) NOT NULL,
    o_orderdate date NOT NULL,
    o_orderpriority character(15) NOT NULL,
    o_clerk character(15) NOT NULL,
    o_shippriority integer NOT NULL,
    o_comment character varying(79) NOT NULL
    , primary key (O_ORDERKEY)
)
WITH (orientation = column)
DISTRIBUTE BY HASH (o_orderkey);

create table customer (
    c_custkey integer NOT NULL,
    c_name character varying(25) NOT NULL,
    c_address character varying(40) NOT NULL,
    c_nationkey integer NOT NULL,
    c_phone character(15) NOT NULL,
    c_acctbal numeric(15,2) NOT NULL,
    c_mktsegment character(10) NOT NULL,
    c_comment character varying(117) NOT NULL
    , primary key (C_CUSTKEY)
)
WITH (orientation = column)
DISTRIBUTE BY HASH (c_custkey);

CREATE MATERIALIZED VIEW mv_tpch_q3 ENABLE QUERY REWRITE AS
SELECT
	l_orderkey,
	sum(l_extendedprice * (1 - l_discount)) AS revenue,
	o_orderdate,
	o_shippriority
FROM
	customer,
	orders,
	lineitem
WHERE
	c_mktsegment = 'BUILDING'
	and c_custkey = o_custkey
	and l_orderkey = o_orderkey
	and l_shipdate > '1995-03-15'::date;

查询tpch q3:

EXPLAIN(costs OFF, nodes off)
SELECT
l_orderkey,
sum(l_extendedprice * (1 - l_discount)) AS revenue,
o_orderdate,
o_shippriority
FROM
customer,
orders,
lineitem
WHERE
c_mktsegment = 'BUILDING'
and c_custkey = o_custkey
and l_orderkey = o_orderkey
and o_orderdate < '1995-03-15'::date
and l_shipdate > '1995-03-15'::date
group by l_orderkey, o_orderdate, o_shippriority
order by revenue desc, o_orderdate
limit 10;
                                              QUERY PLAN                                               
-------------------------------------------------------------------------------------------------------
 Limit
   ->  Streaming (type: GATHER)
         Merge Sort Key: revenue DESC, o_orderdate
         ->  Limit
               ->  Sort
                     Sort Key: revenue DESC, o_orderdate
                     ->  Seq Scan on mv_tpch_q3
                           Filter: (o_orderdate <= '1995-03-14 23:59:59'::timestamp without time zone)
(8 rows)
  • 当查询和物化视图中对应的关联是外连接时,需要连接类型相同,并且连接顺序一致,inner一侧需要一致。

    建物化视图:

    CREATE MATERIALIZED VIEW mv_join_outer ENABLE QUERY REWRITE AS
    SELECT
    	o_orderdate,
    	o_shippriority
    FROM
    	orders LEFT JOIN customer
    ON
        c_custkey = o_custkey;

    查询语句与计划:

    EXPLAIN(costs off, nodes off)
    SELECT
    	o_orderdate,
    	o_shippriority
    FROM
    	orders LEFT JOIN customer
    ON
        c_custkey = o_custkey
    WHERE
        o_orderdate < '1995-03-15'::date;
                                         QUERY PLAN                                      
    -------------------------------------------------------------------------------------
     Streaming (type: GATHER)
       ->  Seq Scan on mv_join_outer
             Filter: (o_orderdate <= '1995-03-14 23:59:59'::timestamp without time zone)
    (3 rows)
  • DWS还支持关联类型不同的派生join,当查询和物化视图对应的关联类型满足派生重写的join类型矩阵,并且关联条件相同时,可以进行重写。

    关联矩阵为:

    查询Join类型

    物化视图 Join类型

    补偿谓词

    inner join

    left join

    右表列 is not null

    anti join

    left join

    右表列 is null

    inner join

    right join

    左表列 is not null

    right anti join

    right join

    左表列 is null

    semi join

    inner join

    主键/unique 直接转换

    right semi join

    inner join

    主键/unique 直接转换

    left join

    full join

    左表列 is not null

    right join

    full join

    右表列 is not null

    inner join

    full join

    两侧表is not null

    建立物化视图:

    CREATE MATERIALIZED VIEW mv_join_derive ENABLE QUERY REWRITE AS
    SELECT
    	o_orderdate,
    	o_shippriority,
        c_custkey
    FROM
    	orders LEFT JOIN customer
    ON
        c_custkey = o_custkey;

    查询语句和计划:

    EXPLAIN(costs off, nodes off)
    SELECT
    	o_orderdate,
    	o_shippriority,
        c_custkey
    FROM
    	orders JOIN customer
    ON
        c_custkey = o_custkey;
                   QUERY PLAN                
    -----------------------------------------
     Streaming (type: GATHER)
       ->  Seq Scan on mv_join_derive
             Filter: (c_custkey IS NOT NULL)
    (3 rows)

    物化视图中left join的右表customer的关联键添加了is not null条件。

  • DWS支持物化视图的表个数多余查询的view delta join,可以通过主外键的消除连接的方式来进行匹配。

    view delta join需要通过主外键关系去消除,因此基表上需要建外键。建表语句:

    CREATE TABLE customer (
        c_custkey integer NOT NULL,
        c_name character varying(25) NOT NULL,
        c_address character varying(40) NOT NULL,
        c_nationkey integer NOT NULL,
        c_phone character(15) NOT NULL,
        c_acctbal numeric(15,2) NOT NULL,
        c_mktsegment character(10) NOT NULL,
        c_comment character varying(117) NOT NULL
        , primary key (C_CUSTKEY)
    )
    WITH (orientation = column)
    DISTRIBUTE BY HASH (c_custkey);
    
    CREATE TABLE lineitem (
        l_orderkey integer NOT NULL,
        l_partkey integer NOT NULL,
        l_suppkey integer NOT NULL,
        l_linenumber integer NOT NULL,
        l_quantity numeric(15,2) NOT NULL,
        l_extendedprice numeric(15,2) NOT NULL,
        l_discount numeric(15,2) NOT NULL,
        l_tax numeric(15,2) NOT NULL,
        l_returnflag character(1) NOT NULL,
        l_linestatus character(1) NOT NULL,
        l_shipdate date NOT NULL,
        l_commitdate date NOT NULL,
        l_receiptdate date NOT NULL,
        l_shipinstruct character(25) NOT NULL,
        l_shipmode character(10) NOT NULL,
        l_comment character varying(44) NOT NULL
        , primary key (l_orderkey, l_linenumber)
    )
    WITH (orientation = column)
    DISTRIBUTE BY HASH (l_orderkey);
    
    CREATE TABLE orders (
        o_orderkey integer NOT NULL,
        o_custkey integer NOT NULL,
        o_orderstatus character(1) NOT NULL,
        o_totalprice numeric(15,2) NOT NULL,
        o_orderdate date NOT NULL,
        o_orderpriority character(15) NOT NULL,
        o_clerk character(15) NOT NULL,
        o_shippriority integer NOT NULL,
        o_comment character varying(79) NOT NULL
        , primary key (O_ORDERKEY)
    )
    WITH (orientation = column)
    DISTRIBUTE BY HASH (o_orderkey);
    
    CREATE TABLE nation (
        n_nationkey integer NOT NULL,
        n_name character(25) NOT NULL,
        n_regionkey integer NOT NULL,
        n_comment character varying(152)
        , primary key (n_nationkey)
    )
    WITH (orientation = column)
    DISTRIBUTE BY HASH (n_nationkey);
    
    CREATE TABLE part (
        p_partkey integer NOT NULL,
        p_name character varying(55) NOT NULL,
        p_mfgr character(25) NOT NULL,
        p_brand character(10) NOT NULL,
        p_type character varying(25) NOT NULL,
        p_size integer NOT NULL,
        p_container character(10) NOT NULL,
        p_retailprice numeric(15,2) NOT NULL,
        p_comment character varying(23) NOT NULL
        , primary key (p_partkey)
    )
    WITH (orientation = column)
    DISTRIBUTE BY HASH (p_partkey);
    
    CREATE TABLE partsupp (
        ps_partkey integer NOT NULL,
        ps_suppkey integer NOT NULL,
        ps_availqty integer NOT NULL,
        ps_supplycost numeric(15,2) NOT NULL,
        ps_comment character varying(199) NOT NULL
        , primary key (ps_partkey, ps_suppkey)
    )
    WITH (orientation = column)
    DISTRIBUTE BY HASH (ps_partkey);
    
    CREATE TABLE region (
        r_regionkey integer NOT NULL,
        r_name character(25) NOT NULL,
        r_comment character varying(152)
        , primary key (r_regionkey)
    )
    WITH (orientation = column)
    DISTRIBUTE BY HASH (r_regionkey);
    
    CREATE TABLE supplier (
        s_suppkey integer NOT NULL,
        s_name character(25) NOT NULL,
        s_address character varying(40) NOT NULL,
        s_nationkey integer NOT NULL,
        s_phone character(15) NOT NULL,
        s_acctbal numeric(15,2) NOT NULL,
        s_comment character varying(101) NOT NULL
        , primary key (s_suppkey)
    )
    WITH (orientation = column)
    DISTRIBUTE BY HASH (s_suppkey);
    
    ALTER TABLE NATION   ADD FOREIGN KEY (N_REGIONKEY) references REGION(R_REGIONKEY);
    ALTER TABLE SUPPLIER ADD FOREIGN KEY (S_NATIONKEY) references NATION(N_NATIONKEY);
    ALTER TABLE CUSTOMER ADD FOREIGN KEY (C_NATIONKEY) references NATION(N_NATIONKEY);
    ALTER TABLE PARTSUPP ADD FOREIGN KEY (PS_SUPPKEY)  references SUPPLIER(S_SUPPKEY);
    ALTER TABLE PARTSUPP ADD FOREIGN KEY (PS_PARTKEY)  references PART(P_PARTKEY);
    ALTER TABLE ORDERS   ADD FOREIGN KEY (O_CUSTKEY)   references CUSTOMER(C_CUSTKEY);
    ALTER TABLE LINEITEM ADD FOREIGN KEY (L_ORDERKEY)  references ORDERS(O_ORDERKEY);
    ALTER TABLE LINEITEM ADD FOREIGN KEY (L_PARTKEY,L_SUPPKEY) references PARTSUPP(PS_PARTKEY, PS_SUPPKEY);

    建立物化视图:

    create materialized view mv_tpch_flat enable query rewrite
    with(orientation = column) distribute by hash(l_shipdate, l_orderkey, l_linenumber)
    as select
        c_address, c_acctbal,c_comment,c_mktsegment,c_name,c_nationkey,c_phone,
        l_commitdate,l_linenumber,l_extendedprice,l_orderkey,l_partkey,l_quantity,l_receiptdate,l_returnflag,l_shipdate,l_shipinstruct,l_shipmode,l_suppkey,
        o_custkey,o_orderdate,o_orderpriority,o_orderstatus,o_shippriority,o_totalprice,
        p_brand,p_container,p_name,p_size,p_type,
        s_name,s_nationkey,
        extract(year from l_shipdate) as l_shipyear,
        l_extendedprice * (1 - l_discount) as l_saleprice,
        l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as l_amount,
        ps_supplycost * l_quantity as l_supplycost,
        extract(year from o_orderdate) as o_orderyear,
        s_nation.n_name as n_name1,
        s_nation.n_regionkey as n_regionkey1,
        c_nation.n_name as n_name2,
        c_nation.n_regionkey as n_regionkey2,
        s_region.r_name as r_name1,
        c_region.r_name as r_name2
       from
         lineitem
         , partsupp
         , orders
         , supplier
         , part
         , customer
         , nation as s_nation
         , nation as c_nation
         , region as s_region
         , region as c_region
      where
         lineitem.l_partkey=partsupp.ps_partkey
         and lineitem.l_suppkey=partsupp.ps_suppkey
         and lineitem.l_orderkey=orders.o_orderkey
         and partsupp.ps_partkey=part.p_partkey
         and partsupp.ps_suppkey=supplier.s_suppkey
         and customer.c_custkey=orders.o_custkey
         and supplier.s_nationkey=s_nation.n_nationkey
         and customer.c_nationkey=c_nation.n_nationkey
         and s_region.r_regionkey=s_nation.n_regionkey
         and c_region.r_regionkey=c_nation.n_regionkey
    ;

查询与计划:

EXPLAIN(costs OFF, nodes off)
SELECT
	l_orderkey,
	sum(l_extendedprice * (1 - l_discount)) AS revenue,
	o_orderdate,
	o_shippriority
FROM
	customer,
	orders,
	lineitem
WHERE
	c_mktsegment = 'BUILDING'
	and c_custkey = o_custkey
	and l_orderkey = o_orderkey
	and o_orderdate < '1995-03-15'::date
	and l_shipdate > '1995-03-15'::date
group by l_orderkey, o_orderdate, o_shippriority
order by revenue desc, o_orderdate
limit 10;
                                                                                                                              QUERY PLAN                                                                                                                               
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Row Adapter
   ->  Vector Limit
         ->  Vector Streaming (type: GATHER)
               Merge Sort Key: (sum(l_saleprice)) DESC, o_orderdate
               ->  Vector Limit
                     ->  Vector Sort
                           Sort Key: (sum(l_saleprice)) DESC, o_orderdate
                           ->  Vector Sonic Hash Aggregate
                                 Group By Key: l_orderkey, o_orderdate, o_shippriority
                                 ->  Vector Streaming(type: REDISTRIBUTE)
                                       ->  CStore Scan on mv_lineitem
                                             Filter: ((o_orderdate <= 'Tue Mar 14 23:59:59 1995'::timestamp without time zone) AND (l_shipdate >= 'Wed Mar 15 00:00:01 1995'::timestamp without time zone) AND (c_mktsegment = 'BUILDING'::bpchar))
                                             Pushdown Predicate Filter: ((o_orderdate <= 'Tue Mar 14 23:59:59 1995'::timestamp without time zone) AND (l_shipdate >= 'Wed Mar 15 00:00:01 1995'::timestamp without time zone) AND (c_mktsegment = 'BUILDING'::bpchar))
(13 rows)

投影重写

投影重写支持在物化视图输出列的基础上,进行复杂的表达式投影,包括查询的输出列,以及补偿添加的过滤条件和关联条件。

建立物化视图:

CREATE MATERIALIZED VIEW mv_projection ENABLE QUERY REWRITE AS
SELECT
	o_orderpriority
FROM
	orders,
	lineitem
WHERE
	o_orderkey = l_orderkey
	and l_shipmode IN('MAIL ', 'SHIP ')
	and l_commitdate < l_receiptdate
	and l_shipdate < l_commitdate
	and l_receiptdate >= DATE '1994-01-01'
	and l_receiptdate < DATE '1994-01-01' + INTERVAL '1 year';

查询和计划:

explain (costs off, nodes off, verbose on)
SELECT
	sum(CASE
		when o_orderpriority = '1-URGENT'
			or o_orderpriority = '2-HIGH'
			then 1
		else 0
	end) AS high_line_count,
	sum(CASE
		when o_orderpriority <> '1-URGENT'
			and o_orderpriority <> '2-HIGH'
			then 1
		else 0
	end) AS low_line_count
FROM
	orders,
	lineitem
WHERE
	o_orderkey = l_orderkey
	and l_shipmode IN('MAIL ', 'SHIP ')
	and l_commitdate < l_receiptdate
	and l_shipdate < l_commitdate
	and l_receiptdate >= DATE '1994-01-01'
	and l_receiptdate < DATE '1994-01-01' + INTERVAL '1 year';
                                                                                                                                      QUERY PLAN                                                                                                                                       
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate
   Output: pg_catalog.sum((sum(CASE WHEN ((o_orderpriority = '1-URGENT'::bpchar) OR (o_orderpriority = '2-HIGH'::bpchar)) THEN 1 ELSE 0 END))), pg_catalog.sum((sum(CASE WHEN ((o_orderpriority <> '1-URGENT'::bpchar) AND (o_orderpriority <> '2-HIGH'::bpchar)) THEN 1 ELSE 0 END)))
   ->  Streaming (type: GATHER)
         Output: (sum(CASE WHEN ((o_orderpriority = '1-URGENT'::bpchar) OR (o_orderpriority = '2-HIGH'::bpchar)) THEN 1 ELSE 0 END)), (sum(CASE WHEN ((o_orderpriority <> '1-URGENT'::bpchar) AND (o_orderpriority <> '2-HIGH'::bpchar)) THEN 1 ELSE 0 END))
         ->  Aggregate
               Output: sum(CASE WHEN ((o_orderpriority = '1-URGENT'::bpchar) OR (o_orderpriority = '2-HIGH'::bpchar)) THEN 1 ELSE 0 END), sum(CASE WHEN ((o_orderpriority <> '1-URGENT'::bpchar) AND (o_orderpriority <> '2-HIGH'::bpchar)) THEN 1 ELSE 0 END)
               ->  Seq Scan on public.mv_projection
                     Output: o_orderpriority
(8 rows)

Agg重写

DWS的agg函数重写,支持包括min/max/sum/count/avg多种agg函数。

建立物化视图:

CREATE MATERIALIZED VIEW mv_lineitem_3 ENABLE QUERY REWRITE AS
SELECT
    *
FROM
	lineitem
WHERE
	l_shipdate <= DATE '1998-12-01' - INTERVAL '3 day';

查询与计划:

EXPLAIN(costs OFF, nodes off)
SELECT
	l_returnflag,
	l_linestatus,
	sum(l_quantity) AS sum_qty,
	sum(l_extendedprice) AS sum_base_price,
	sum(l_extendedprice * (1 - l_discount)) AS sum_disc_price,
	sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,
	avg(l_quantity) AS avg_qty,
	avg(l_extendedprice) AS avg_price,
	avg(l_discount) AS avg_disc,
	count(*) AS count_order
FROM
	lineitem
WHERE
	l_shipdate <= DATE '1998-12-01' - INTERVAL '3 day'
GROUP BY
	l_returnflag,
	l_linestatus
ORDER BY
	sum_qty;
                       QUERY PLAN                       
--------------------------------------------------------
 Streaming (type: GATHER)
   Merge Sort Key: (sum(l_quantity))
   ->  Sort
         Sort Key: (sum(l_quantity))
         ->  HashAggregate
               Group By Key: l_returnflag, l_linestatus
               ->  Streaming(type: REDISTRIBUTE)
                     ->  Seq Scan on mv_lineitem_3
(8 rows)

除了普通的agg函数重写,DWS还支持上卷重写,在已有的聚集列上再次进行聚集,在聚集函数的基础上进行聚集函数投影。

各个agg函数上卷重写结果为:

原聚集函数

上卷重写

count

sum(count)

sum

sum(sum)

min

min(min)

max

max(max)

avg

sum/count

rb_build_agg

rb_or_agg(rb_build_agg)

建立物化视图:

create materialized view mv_lineitem_agg_1 enable query rewrite
with(orientation = column) distribute by hash(l_orderkey, l_shipdate, l_returnflag, l_linestatus)
as select
  l_orderkey,
  l_shipdate,
  l_returnflag,
  l_linestatus,
  count(*) as total_cnt,
  sum(l_quantity) as sum_qty,
  sum(l_extendedprice) as sum_base_price,
  sum(l_discount) as sum_discount,
  sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
  sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge
from
   lineitem
   group by
       l_orderkey,
       l_shipdate,
       l_returnflag,
       l_linestatus
;

查询与计划:

EXPLAIN(costs off, nodes off, verbose on)
SELECT
	l_returnflag,
	l_linestatus,
	sum(l_quantity) AS sum_qty,
	sum(l_extendedprice) AS sum_base_price,
	sum(l_extendedprice * (1 - l_discount)) AS sum_disc_price,
	sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,
	avg(l_quantity) AS avg_qty,
	avg(l_extendedprice) AS avg_price,
	avg(l_discount) AS avg_disc,
	count(*) AS count_order
FROM
	lineitem
WHERE
	l_shipdate <= DATE '1998-12-01' - INTERVAL '3 day'
GROUP BY
	l_returnflag,
	l_linestatus
ORDER BY
	sum_qty;
                                                                                                                                                                                                                              QUERY PLAN                                                                                   
                                                                                                                                           
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------
 Row Adapter
   Output: l_returnflag, l_linestatus, (sum((sum(sum_qty)))), (sum((sum(sum_base_price)))), (sum((sum(sum_disc_price)))), (sum((sum(sum_charge)))), ((sum((sum(sum_qty))) / ((pg_catalog.sum((sum(total_cnt))))::bigint)::numeric)), ((sum((sum(sum_base_price))) / ((pg_catalog.sum((sum(total_cnt))))::bigint)::numeric))
, ((sum((sum(sum_discount))) / ((pg_catalog.sum((sum(total_cnt))))::bigint)::numeric)), ((pg_catalog.sum((sum(total_cnt))))::bigint)
   ->  Vector Sort
         Output: l_returnflag, l_linestatus, (sum((sum(sum_qty)))), (sum((sum(sum_base_price)))), (sum((sum(sum_disc_price)))), (sum((sum(sum_charge)))), ((sum((sum(sum_qty))) / ((pg_catalog.sum((sum(total_cnt))))::bigint)::numeric)), ((sum((sum(sum_base_price))) / ((pg_catalog.sum((sum(total_cnt))))::bigint)::num
eric)), ((sum((sum(sum_discount))) / ((pg_catalog.sum((sum(total_cnt))))::bigint)::numeric)), ((pg_catalog.sum((sum(total_cnt))))::bigint)
         Sort Key: (sum((sum(mv_lineitem_agg_1.sum_qty))))
         ->  Vector Sonic Hash Aggregate
               Output: l_returnflag, l_linestatus, sum((sum(sum_qty))), sum((sum(sum_base_price))), sum((sum(sum_disc_price))), sum((sum(sum_charge))), (sum((sum(sum_qty))) / ((pg_catalog.sum((sum(total_cnt))))::bigint)::numeric), (sum((sum(sum_base_price))) / ((pg_catalog.sum((sum(total_cnt))))::bigint)::numeric)
, (sum((sum(sum_discount))) / ((pg_catalog.sum((sum(total_cnt))))::bigint)::numeric), (pg_catalog.sum((sum(total_cnt))))::bigint
               Group By Key: mv_lineitem_agg_1.l_returnflag, mv_lineitem_agg_1.l_linestatus
               ->  Vector Streaming (type: GATHER)
                     Output: l_returnflag, l_linestatus, (sum(sum_qty)), (sum(sum_base_price)), (sum(sum_disc_price)), (sum(sum_charge)), (sum(total_cnt)), (sum(sum_discount))
                     ->  Vector Sonic Hash Aggregate
                           Output: l_returnflag, l_linestatus, sum(sum_qty), sum(sum_base_price), sum(sum_disc_price), sum(sum_charge), sum(total_cnt), sum(sum_discount)
                           Group By Key: mv_lineitem_agg_1.l_returnflag, mv_lineitem_agg_1.l_linestatus
                           ->  CStore Scan on public.mv_lineitem_agg_1
                                 Output: l_returnflag, l_linestatus, sum_qty, sum_base_price, sum_disc_price, sum_charge, total_cnt, sum_discount
                                 Distribute Key: l_orderkey, l_shipdate, l_returnflag, l_linestatus
                                 Filter: (mv_lineitem_agg_1.l_shipdate <= '1998-11-28 00:00:00'::timestamp without time zone)
                                 Pushdown Predicate Filter: (mv_lineitem_agg_1.l_shipdate <= '1998-11-28 00:00:00'::timestamp without time zone)

从查询计划里看,查询在物化视图的基础上,进行了上卷重写,sum重写成了sum(sum),avg重写成了sum/count,count重写成了sum(count)。

总结

本文主要介绍了DWS中物化视图查询重写的两种方式,并详细介绍了结构匹配重写的四种方式,分别以例子的方式展示了其技术实现的外在表现。

【版权声明】本文为华为云社区用户原创内容,转载时必须标注文章的来源(华为云社区)、文章链接、文章作者等基本信息, 否则作者和本社区有权追究责任。如果您发现本社区中有涉嫌抄袭的内容,欢迎发送邮件进行举报,并提供相关证据,一经查实,本社区将立刻删除涉嫌侵权内容,举报邮箱: cloudbbs@huaweicloud.com
  • 点赞
  • 收藏
  • 关注作者

评论(0

0/1000
抱歉,系统识别当前为高风险访问,暂不支持该操作

全部回复

上滑加载中

设置昵称

在此一键设置昵称,即可参与社区互动!

*长度不超过10个汉字或20个英文字符,设置后3个月内不可修改。

*长度不超过10个汉字或20个英文字符,设置后3个月内不可修改。