GaussDB(DWS)物化视图查询重写解析
前言
物化视图(materialized view)是一张表,存储了物化视图定义中查询语句的结果。
DWS中物化视图的使用方式分为两种:
-
直接查询:将物化视图定义成自动刷新或者定时刷新,使物化视图的数据保持较新,直接查询物化视图可以提升查询性能。
-
查询重写:执行sql查询时,自动将查询改写为对物化视图的查询,而不需要手动修改查询语句。
本文主要介绍DWS中物化视图查询重写这一部分。
基本原理
DWS中,物化视图查询重写支持基于文本的查询重写和基于结构的查询重写两种重写方式。
-
基于文本的查询重写:支持复杂定义的物化视图。当查询和物化视图定义一致时(不考虑排序),可以重写成物化视图。需要注意,为了保持结果集的正确性,查询和物化视图定义中的表有别名时,别名需要一致。
-
基于结构的查询重写:支持SPJG(Select-Projection-Join-Groupby)的物化视图查询重写。不要求数据一致,支持在物化视图的数据基础上再进行加工得到查询所需的数据,或者是查询中的子查询所需的数据。
查询重写的主要流程:
-
查询预处理:收集查询中涉及了哪些基表
-
搜集物化视图:根据系统表中的依赖关系,收集这些基表涉及哪些物化视图可以被用来重写
-
物化视图剪枝:去掉数据已经过时的物化视图,并根据物化视图定义进行排序,当候选的物化视图个数较多时,去掉优先级较低的物化视图。
-
文本匹配重写:将候选的物化视图与查询进行文本匹配,如果有物化视图匹配成功,则直接进行重写,不需要进行结构匹配。
-
结构匹配重写:将候选的物化视图与查询进行结构匹配,如果匹配成功,直接进行重写,后续优先级较低的物化视图不再进行结构匹配。
使用场景
文本匹配重写
适用于查询语句结构复杂,例如有cte、子查询、子链接、window函数等复杂子句的查询。
以tpcds q4为例,建立定义为q4的支持查询重写的物化视图。
create materialized view mv_tpcds04
ENABLE QUERY REWRITE
AS
with year_total as (
select c_customer_id customer_id
,c_first_name customer_first_name
,c_last_name customer_last_name
,c_preferred_cust_flag customer_preferred_cust_flag
,c_birth_country customer_birth_country
,c_login customer_login
,c_email_address customer_email_address
,d_year dyear
,sum(((ss_ext_list_price-ss_ext_wholesale_cost-ss_ext_discount_amt)+ss_ext_sales_price)/2) year_total
,'s' sale_type
from customer
,store_sales
,date_dim
where c_customer_sk = ss_customer_sk
and ss_sold_date_sk = d_date_sk
group by c_customer_id
,c_first_name
,c_last_name
,c_preferred_cust_flag
,c_birth_country
,c_login
,c_email_address
,d_year
union all
select c_customer_id customer_id
,c_first_name customer_first_name
,c_last_name customer_last_name
,c_preferred_cust_flag customer_preferred_cust_flag
,c_birth_country customer_birth_country
,c_login customer_login
,c_email_address customer_email_address
,d_year dyear
,sum((((cs_ext_list_price-cs_ext_wholesale_cost-cs_ext_discount_amt)+cs_ext_sales_price)/2) ) year_total
,'c' sale_type
from customer
,catalog_sales
,date_dim
where c_customer_sk = cs_bill_customer_sk
and cs_sold_date_sk = d_date_sk
group by c_customer_id
,c_first_name
,c_last_name
,c_preferred_cust_flag
,c_birth_country
,c_login
,c_email_address
,d_year
union all
select c_customer_id customer_id
,c_first_name customer_first_name
,c_last_name customer_last_name
,c_preferred_cust_flag customer_preferred_cust_flag
,c_birth_country customer_birth_country
,c_login customer_login
,c_email_address customer_email_address
,d_year dyear
,sum((((ws_ext_list_price-ws_ext_wholesale_cost-ws_ext_discount_amt)+ws_ext_sales_price)/2) ) year_total
,'w' sale_type
from customer
,web_sales
,date_dim
where c_customer_sk = ws_bill_customer_sk
and ws_sold_date_sk = d_date_sk
group by c_customer_id
,c_first_name
,c_last_name
,c_preferred_cust_flag
,c_birth_country
,c_login
,c_email_address
,d_year
)
select
t_s_secyear.customer_id
,t_s_secyear.customer_first_name
,t_s_secyear.customer_last_name
,t_s_secyear.customer_email_address
from year_total t_s_firstyear
,year_total t_s_secyear
,year_total t_c_firstyear
,year_total t_c_secyear
,year_total t_w_firstyear
,year_total t_w_secyear
where t_s_secyear.customer_id = t_s_firstyear.customer_id
and t_s_firstyear.customer_id = t_c_secyear.customer_id
and t_s_firstyear.customer_id = t_c_firstyear.customer_id
and t_s_firstyear.customer_id = t_w_firstyear.customer_id
and t_s_firstyear.customer_id = t_w_secyear.customer_id
and t_s_firstyear.sale_type = 's'
and t_c_firstyear.sale_type = 'c'
and t_w_firstyear.sale_type = 'w'
and t_s_secyear.sale_type = 's'
and t_c_secyear.sale_type = 'c'
and t_w_secyear.sale_type = 'w'
and t_s_firstyear.dyear = 2001
and t_s_secyear.dyear = 2001+1
and t_c_firstyear.dyear = 2001
and t_c_secyear.dyear = 2001+1
and t_w_firstyear.dyear = 2001
and t_w_secyear.dyear = 2001+1
and t_s_firstyear.year_total > 0
and t_c_firstyear.year_total > 0
and t_w_firstyear.year_total > 0
and case when t_c_firstyear.year_total > 0 then t_c_secyear.year_total / t_c_firstyear.year_total else null end
> case when t_s_firstyear.year_total > 0 then t_s_secyear.year_total / t_s_firstyear.year_total else null end
and case when t_c_firstyear.year_total > 0 then t_c_secyear.year_total / t_c_firstyear.year_total else null end
> case when t_w_firstyear.year_total > 0 then t_w_secyear.year_total / t_w_firstyear.year_total else null end
order by t_s_secyear.customer_id
,t_s_secyear.customer_first_name
,t_s_secyear.customer_last_name
,t_s_secyear.customer_email_address
limit 100;
建立物化视图后,打印q4语句的原始查询计划。由于扫描物化视图后返回的数据并不是有序的,所以还需要按照查询中的排序键再排序一次。
EXPLAIN(COSTS OFF)
with year_total as (
select c_customer_id customer_id
,c_first_name customer_first_name
,c_last_name customer_last_name
,c_preferred_cust_flag customer_preferred_cust_flag
,c_birth_country customer_birth_country
,c_login customer_login
,c_email_address customer_email_address
,d_year dyear
,sum(((ss_ext_list_price-ss_ext_wholesale_cost-ss_ext_discount_amt)+ss_ext_sales_price)/2) year_total
,'s' sale_type
from customer
,store_sales
,date_dim
where c_customer_sk = ss_customer_sk
and ss_sold_date_sk = d_date_sk
group by c_customer_id
,c_first_name
,c_last_name
,c_preferred_cust_flag
,c_birth_country
,c_login
,c_email_address
,d_year
union all
select c_customer_id customer_id
,c_first_name customer_first_name
,c_last_name customer_last_name
,c_preferred_cust_flag customer_preferred_cust_flag
,c_birth_country customer_birth_country
,c_login customer_login
,c_email_address customer_email_address
,d_year dyear
,sum((((cs_ext_list_price-cs_ext_wholesale_cost-cs_ext_discount_amt)+cs_ext_sales_price)/2) ) year_total
,'c' sale_type
from customer
,catalog_sales
,date_dim
where c_customer_sk = cs_bill_customer_sk
and cs_sold_date_sk = d_date_sk
group by c_customer_id
,c_first_name
,c_last_name
,c_preferred_cust_flag
,c_birth_country
,c_login
,c_email_address
,d_year
union all
select c_customer_id customer_id
,c_first_name customer_first_name
,c_last_name customer_last_name
,c_preferred_cust_flag customer_preferred_cust_flag
,c_birth_country customer_birth_country
,c_login customer_login
,c_email_address customer_email_address
,d_year dyear
,sum((((ws_ext_list_price-ws_ext_wholesale_cost-ws_ext_discount_amt)+ws_ext_sales_price)/2) ) year_total
,'w' sale_type
from customer
,web_sales
,date_dim
where c_customer_sk = ws_bill_customer_sk
and ws_sold_date_sk = d_date_sk
group by c_customer_id
,c_first_name
,c_last_name
,c_preferred_cust_flag
,c_birth_country
,c_login
,c_email_address
,d_year
)
select
t_s_secyear.customer_id
,t_s_secyear.customer_first_name
,t_s_secyear.customer_last_name
,t_s_secyear.customer_email_address
from year_total t_s_firstyear
,year_total t_s_secyear
,year_total t_c_firstyear
,year_total t_c_secyear
,year_total t_w_firstyear
,year_total t_w_secyear
where t_s_secyear.customer_id = t_s_firstyear.customer_id
and t_s_firstyear.customer_id = t_c_secyear.customer_id
and t_s_firstyear.customer_id = t_c_firstyear.customer_id
and t_s_firstyear.customer_id = t_w_firstyear.customer_id
and t_s_firstyear.customer_id = t_w_secyear.customer_id
and t_s_firstyear.sale_type = 's'
and t_c_firstyear.sale_type = 'c'
and t_w_firstyear.sale_type = 'w'
and t_s_secyear.sale_type = 's'
and t_c_secyear.sale_type = 'c'
and t_w_secyear.sale_type = 'w'
and t_s_firstyear.dyear = 2001
and t_s_secyear.dyear = 2001+1
and t_c_firstyear.dyear = 2001
and t_c_secyear.dyear = 2001+1
and t_w_firstyear.dyear = 2001
and t_w_secyear.dyear = 2001+1
and t_s_firstyear.year_total > 0
and t_c_firstyear.year_total > 0
and t_w_firstyear.year_total > 0
and case when t_c_firstyear.year_total > 0 then t_c_secyear.year_total / t_c_firstyear.year_total else null end
> case when t_s_firstyear.year_total > 0 then t_s_secyear.year_total / t_s_firstyear.year_total else null end
and case when t_c_firstyear.year_total > 0 then t_c_secyear.year_total / t_c_firstyear.year_total else null end
> case when t_w_firstyear.year_total > 0 then t_w_secyear.year_total / t_w_firstyear.year_total else null end
order by t_s_secyear.customer_id
,t_s_secyear.customer_first_name
,t_s_secyear.customer_last_name
,t_s_secyear.customer_email_address
limit 100;
QUERY PLAN
------------------------------------------------------------------------------------------------
Streaming (type: GATHER)
Merge Sort Key: customer_id, customer_first_name, customer_last_name, customer_email_address
Node/s: All datanodes (group1, bucket:32)
-> Sort
Sort Key: customer_id, customer_first_name, customer_last_name, customer_email_address
-> Seq Scan on mv_tpcds04
(6 rows)
结构匹配重写
条件重写
条件重写,指的是数据经过物化视图中的条件过滤后,能够包括查询的数据。这就要求物化视图定义中的过滤条件要比查询中的更宽松,范围更广,主要表现为两个场景:
建表语句:
CREATE TABLE lineitem (
l_orderkey integer NOT NULL,
l_partkey integer NOT NULL,
l_suppkey integer NOT NULL,
l_linenumber integer NOT NULL,
l_quantity numeric(15,2) NOT NULL,
l_extendedprice numeric(15,2) NOT NULL,
l_discount numeric(15,2) NOT NULL,
l_tax numeric(15,2) NOT NULL,
l_returnflag character(1) NOT NULL,
l_linestatus character(1) NOT NULL,
l_shipdate date NOT NULL,
l_commitdate date NOT NULL,
l_receiptdate date NOT NULL,
l_shipinstruct character(25) NOT NULL,
l_shipmode character(10) NOT NULL,
l_comment character varying(44) NOT NULL
, primary key (l_orderkey, l_linenumber)
)
WITH (orientation = column)
DISTRIBUTE BY HASH (l_orderkey);
-
查询中有过滤条件,物化视图中没有对应的过滤条件:
建立物化视图:
CREATE MATERIALIZED VIEW mv_lineitem_1 ENABLE QUERY REWRITE AS SELECT * FROM lineitem WHERE l_shipdate >= DATE '1995-01-01' AND l_quantity < 24;
进行查询:
EXPLAIN(costs off, nodes off) SELECT * FROM lineitem WHERE l_shipdate >= DATE '1995-01-01' AND l_quantity < 24 AND l_discount <= 0.06 + 0.01; QUERY PLAN ------------------------------------- Streaming (type: GATHER) -> Seq Scan on mv_lineitem_1 Filter: (l_discount <= .07) (3 rows)
查看查询计划发现,
l_discount <= 0.06 + 0.01
这个物化视图中没有的条件,作为补偿条件,变成了扫描物化视图的条件。 -
查询和物化视图中都有对某一列的过滤条件,但是物化视图的过滤条件数学上表示的范围包括了查询:
建立物化视图:
CREATE MATERIALIZED VIEW mv_lineitem_2 ENABLE QUERY REWRITE AS SELECT * FROM lineitem WHERE (l_shipdate >= DATE '1995-01-01' AND l_shipdate <= DATE '1998-01-01') OR (l_shipdate >= DATE '1997-01-01' AND l_shipdate <= DATE '2005-01-01');
查询语句与计划:
EXPLAIN(costs off, nodes off)
SELECT
*
FROM
lineitem
WHERE
l_shipdate >= DATE '1996-01-01'
AND l_shipdate <= DATE '2000-01-01';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
Streaming (type: GATHER)
-> Seq Scan on mv_lineitem_2
Filter: ((l_shipdate >= '1996-01-01 00:00:00'::timestamp(0) without time zone) AND (l_shipdate <= '2000-01-01 00:00:00'::timestamp(0) without time zone))
(3 rows)
由于物化视图的过滤条件实际表示范围为[1995-01-01, 2005-01-01]
,查询的过滤条件表示范围为[1996-01-01, 2000-01-01]
,物化视图的范围包括了查询的范围,因此可以重写。
Join重写
DWS的物化视图Join重写,通过等价类匹配等值连接约束,通过精确匹配非等值连接约束,匹配多表关联的连接关系。支持的join类型报错inner join、left join、right join、full join、semi join、anti join。
-
当查询和物化视图中均为inner join时,实际上所有的关联条件均可以视为过滤条件,join重写变成了过滤条件的匹配。
create table lineitem (
l_orderkey integer NOT NULL,
l_partkey integer NOT NULL,
l_suppkey integer NOT NULL,
l_linenumber integer NOT NULL,
l_quantity numeric(15,2) NOT NULL,
l_extendedprice numeric(15,2) NOT NULL,
l_discount numeric(15,2) NOT NULL,
l_tax numeric(15,2) NOT NULL,
l_returnflag character(1) NOT NULL,
l_linestatus character(1) NOT NULL,
l_shipdate date NOT NULL,
l_commitdate date NOT NULL,
l_receiptdate date NOT NULL,
l_shipinstruct character(25) NOT NULL,
l_shipmode character(10) NOT NULL,
l_comment character varying(44) NOT NULL
, primary key (l_orderkey, l_linenumber)
)
WITH (orientation = column)
DISTRIBUTE BY HASH (l_orderkey);
create table orders (
o_orderkey integer NOT NULL,
o_custkey integer NOT NULL,
o_orderstatus character(1) NOT NULL,
o_totalprice numeric(15,2) NOT NULL,
o_orderdate date NOT NULL,
o_orderpriority character(15) NOT NULL,
o_clerk character(15) NOT NULL,
o_shippriority integer NOT NULL,
o_comment character varying(79) NOT NULL
, primary key (O_ORDERKEY)
)
WITH (orientation = column)
DISTRIBUTE BY HASH (o_orderkey);
create table customer (
c_custkey integer NOT NULL,
c_name character varying(25) NOT NULL,
c_address character varying(40) NOT NULL,
c_nationkey integer NOT NULL,
c_phone character(15) NOT NULL,
c_acctbal numeric(15,2) NOT NULL,
c_mktsegment character(10) NOT NULL,
c_comment character varying(117) NOT NULL
, primary key (C_CUSTKEY)
)
WITH (orientation = column)
DISTRIBUTE BY HASH (c_custkey);
CREATE MATERIALIZED VIEW mv_tpch_q3 ENABLE QUERY REWRITE AS
SELECT
l_orderkey,
sum(l_extendedprice * (1 - l_discount)) AS revenue,
o_orderdate,
o_shippriority
FROM
customer,
orders,
lineitem
WHERE
c_mktsegment = 'BUILDING'
and c_custkey = o_custkey
and l_orderkey = o_orderkey
and l_shipdate > '1995-03-15'::date;
查询tpch q3:
EXPLAIN(costs OFF, nodes off)
SELECT
l_orderkey,
sum(l_extendedprice * (1 - l_discount)) AS revenue,
o_orderdate,
o_shippriority
FROM
customer,
orders,
lineitem
WHERE
c_mktsegment = 'BUILDING'
and c_custkey = o_custkey
and l_orderkey = o_orderkey
and o_orderdate < '1995-03-15'::date
and l_shipdate > '1995-03-15'::date
group by l_orderkey, o_orderdate, o_shippriority
order by revenue desc, o_orderdate
limit 10;
QUERY PLAN
-------------------------------------------------------------------------------------------------------
Limit
-> Streaming (type: GATHER)
Merge Sort Key: revenue DESC, o_orderdate
-> Limit
-> Sort
Sort Key: revenue DESC, o_orderdate
-> Seq Scan on mv_tpch_q3
Filter: (o_orderdate <= '1995-03-14 23:59:59'::timestamp without time zone)
(8 rows)
-
当查询和物化视图中对应的关联是外连接时,需要连接类型相同,并且连接顺序一致,inner一侧需要一致。
建物化视图:
CREATE MATERIALIZED VIEW mv_join_outer ENABLE QUERY REWRITE AS SELECT o_orderdate, o_shippriority FROM orders LEFT JOIN customer ON c_custkey = o_custkey;
查询语句与计划:
EXPLAIN(costs off, nodes off) SELECT o_orderdate, o_shippriority FROM orders LEFT JOIN customer ON c_custkey = o_custkey WHERE o_orderdate < '1995-03-15'::date; QUERY PLAN ------------------------------------------------------------------------------------- Streaming (type: GATHER) -> Seq Scan on mv_join_outer Filter: (o_orderdate <= '1995-03-14 23:59:59'::timestamp without time zone) (3 rows)
-
DWS还支持关联类型不同的派生join,当查询和物化视图对应的关联类型满足派生重写的join类型矩阵,并且关联条件相同时,可以进行重写。
关联矩阵为:
查询Join类型
物化视图 Join类型
补偿谓词
inner join
left join
右表列 is not null
anti join
left join
右表列 is null
inner join
right join
左表列 is not null
right anti join
right join
左表列 is null
semi join
inner join
主键/unique 直接转换
right semi join
inner join
主键/unique 直接转换
left join
full join
左表列 is not null
right join
full join
右表列 is not null
inner join
full join
两侧表is not null
建立物化视图:
CREATE MATERIALIZED VIEW mv_join_derive ENABLE QUERY REWRITE AS SELECT o_orderdate, o_shippriority, c_custkey FROM orders LEFT JOIN customer ON c_custkey = o_custkey;
查询语句和计划:
EXPLAIN(costs off, nodes off) SELECT o_orderdate, o_shippriority, c_custkey FROM orders JOIN customer ON c_custkey = o_custkey; QUERY PLAN ----------------------------------------- Streaming (type: GATHER) -> Seq Scan on mv_join_derive Filter: (c_custkey IS NOT NULL) (3 rows)
物化视图中left join的右表customer的关联键添加了is not null条件。
-
DWS支持物化视图的表个数多余查询的view delta join,可以通过主外键的消除连接的方式来进行匹配。
view delta join需要通过主外键关系去消除,因此基表上需要建外键。建表语句:
CREATE TABLE customer ( c_custkey integer NOT NULL, c_name character varying(25) NOT NULL, c_address character varying(40) NOT NULL, c_nationkey integer NOT NULL, c_phone character(15) NOT NULL, c_acctbal numeric(15,2) NOT NULL, c_mktsegment character(10) NOT NULL, c_comment character varying(117) NOT NULL , primary key (C_CUSTKEY) ) WITH (orientation = column) DISTRIBUTE BY HASH (c_custkey); CREATE TABLE lineitem ( l_orderkey integer NOT NULL, l_partkey integer NOT NULL, l_suppkey integer NOT NULL, l_linenumber integer NOT NULL, l_quantity numeric(15,2) NOT NULL, l_extendedprice numeric(15,2) NOT NULL, l_discount numeric(15,2) NOT NULL, l_tax numeric(15,2) NOT NULL, l_returnflag character(1) NOT NULL, l_linestatus character(1) NOT NULL, l_shipdate date NOT NULL, l_commitdate date NOT NULL, l_receiptdate date NOT NULL, l_shipinstruct character(25) NOT NULL, l_shipmode character(10) NOT NULL, l_comment character varying(44) NOT NULL , primary key (l_orderkey, l_linenumber) ) WITH (orientation = column) DISTRIBUTE BY HASH (l_orderkey); CREATE TABLE orders ( o_orderkey integer NOT NULL, o_custkey integer NOT NULL, o_orderstatus character(1) NOT NULL, o_totalprice numeric(15,2) NOT NULL, o_orderdate date NOT NULL, o_orderpriority character(15) NOT NULL, o_clerk character(15) NOT NULL, o_shippriority integer NOT NULL, o_comment character varying(79) NOT NULL , primary key (O_ORDERKEY) ) WITH (orientation = column) DISTRIBUTE BY HASH (o_orderkey); CREATE TABLE nation ( n_nationkey integer NOT NULL, n_name character(25) NOT NULL, n_regionkey integer NOT NULL, n_comment character varying(152) , primary key (n_nationkey) ) WITH (orientation = column) DISTRIBUTE BY HASH (n_nationkey); CREATE TABLE part ( p_partkey integer NOT NULL, p_name character varying(55) NOT NULL, p_mfgr character(25) NOT NULL, p_brand character(10) NOT NULL, p_type character varying(25) NOT NULL, p_size integer NOT NULL, p_container character(10) NOT NULL, p_retailprice numeric(15,2) NOT NULL, p_comment character varying(23) NOT NULL , primary key (p_partkey) ) WITH (orientation = column) DISTRIBUTE BY HASH (p_partkey); CREATE TABLE partsupp ( ps_partkey integer NOT NULL, ps_suppkey integer NOT NULL, ps_availqty integer NOT NULL, ps_supplycost numeric(15,2) NOT NULL, ps_comment character varying(199) NOT NULL , primary key (ps_partkey, ps_suppkey) ) WITH (orientation = column) DISTRIBUTE BY HASH (ps_partkey); CREATE TABLE region ( r_regionkey integer NOT NULL, r_name character(25) NOT NULL, r_comment character varying(152) , primary key (r_regionkey) ) WITH (orientation = column) DISTRIBUTE BY HASH (r_regionkey); CREATE TABLE supplier ( s_suppkey integer NOT NULL, s_name character(25) NOT NULL, s_address character varying(40) NOT NULL, s_nationkey integer NOT NULL, s_phone character(15) NOT NULL, s_acctbal numeric(15,2) NOT NULL, s_comment character varying(101) NOT NULL , primary key (s_suppkey) ) WITH (orientation = column) DISTRIBUTE BY HASH (s_suppkey); ALTER TABLE NATION ADD FOREIGN KEY (N_REGIONKEY) references REGION(R_REGIONKEY); ALTER TABLE SUPPLIER ADD FOREIGN KEY (S_NATIONKEY) references NATION(N_NATIONKEY); ALTER TABLE CUSTOMER ADD FOREIGN KEY (C_NATIONKEY) references NATION(N_NATIONKEY); ALTER TABLE PARTSUPP ADD FOREIGN KEY (PS_SUPPKEY) references SUPPLIER(S_SUPPKEY); ALTER TABLE PARTSUPP ADD FOREIGN KEY (PS_PARTKEY) references PART(P_PARTKEY); ALTER TABLE ORDERS ADD FOREIGN KEY (O_CUSTKEY) references CUSTOMER(C_CUSTKEY); ALTER TABLE LINEITEM ADD FOREIGN KEY (L_ORDERKEY) references ORDERS(O_ORDERKEY); ALTER TABLE LINEITEM ADD FOREIGN KEY (L_PARTKEY,L_SUPPKEY) references PARTSUPP(PS_PARTKEY, PS_SUPPKEY);
建立物化视图:
create materialized view mv_tpch_flat enable query rewrite with(orientation = column) distribute by hash(l_shipdate, l_orderkey, l_linenumber) as select c_address, c_acctbal,c_comment,c_mktsegment,c_name,c_nationkey,c_phone, l_commitdate,l_linenumber,l_extendedprice,l_orderkey,l_partkey,l_quantity,l_receiptdate,l_returnflag,l_shipdate,l_shipinstruct,l_shipmode,l_suppkey, o_custkey,o_orderdate,o_orderpriority,o_orderstatus,o_shippriority,o_totalprice, p_brand,p_container,p_name,p_size,p_type, s_name,s_nationkey, extract(year from l_shipdate) as l_shipyear, l_extendedprice * (1 - l_discount) as l_saleprice, l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as l_amount, ps_supplycost * l_quantity as l_supplycost, extract(year from o_orderdate) as o_orderyear, s_nation.n_name as n_name1, s_nation.n_regionkey as n_regionkey1, c_nation.n_name as n_name2, c_nation.n_regionkey as n_regionkey2, s_region.r_name as r_name1, c_region.r_name as r_name2 from lineitem , partsupp , orders , supplier , part , customer , nation as s_nation , nation as c_nation , region as s_region , region as c_region where lineitem.l_partkey=partsupp.ps_partkey and lineitem.l_suppkey=partsupp.ps_suppkey and lineitem.l_orderkey=orders.o_orderkey and partsupp.ps_partkey=part.p_partkey and partsupp.ps_suppkey=supplier.s_suppkey and customer.c_custkey=orders.o_custkey and supplier.s_nationkey=s_nation.n_nationkey and customer.c_nationkey=c_nation.n_nationkey and s_region.r_regionkey=s_nation.n_regionkey and c_region.r_regionkey=c_nation.n_regionkey ;
查询与计划:
EXPLAIN(costs OFF, nodes off)
SELECT
l_orderkey,
sum(l_extendedprice * (1 - l_discount)) AS revenue,
o_orderdate,
o_shippriority
FROM
customer,
orders,
lineitem
WHERE
c_mktsegment = 'BUILDING'
and c_custkey = o_custkey
and l_orderkey = o_orderkey
and o_orderdate < '1995-03-15'::date
and l_shipdate > '1995-03-15'::date
group by l_orderkey, o_orderdate, o_shippriority
order by revenue desc, o_orderdate
limit 10;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Row Adapter
-> Vector Limit
-> Vector Streaming (type: GATHER)
Merge Sort Key: (sum(l_saleprice)) DESC, o_orderdate
-> Vector Limit
-> Vector Sort
Sort Key: (sum(l_saleprice)) DESC, o_orderdate
-> Vector Sonic Hash Aggregate
Group By Key: l_orderkey, o_orderdate, o_shippriority
-> Vector Streaming(type: REDISTRIBUTE)
-> CStore Scan on mv_lineitem
Filter: ((o_orderdate <= 'Tue Mar 14 23:59:59 1995'::timestamp without time zone) AND (l_shipdate >= 'Wed Mar 15 00:00:01 1995'::timestamp without time zone) AND (c_mktsegment = 'BUILDING'::bpchar))
Pushdown Predicate Filter: ((o_orderdate <= 'Tue Mar 14 23:59:59 1995'::timestamp without time zone) AND (l_shipdate >= 'Wed Mar 15 00:00:01 1995'::timestamp without time zone) AND (c_mktsegment = 'BUILDING'::bpchar))
(13 rows)
投影重写
投影重写支持在物化视图输出列的基础上,进行复杂的表达式投影,包括查询的输出列,以及补偿添加的过滤条件和关联条件。
建立物化视图:
CREATE MATERIALIZED VIEW mv_projection ENABLE QUERY REWRITE AS
SELECT
o_orderpriority
FROM
orders,
lineitem
WHERE
o_orderkey = l_orderkey
and l_shipmode IN('MAIL ', 'SHIP ')
and l_commitdate < l_receiptdate
and l_shipdate < l_commitdate
and l_receiptdate >= DATE '1994-01-01'
and l_receiptdate < DATE '1994-01-01' + INTERVAL '1 year';
查询和计划:
explain (costs off, nodes off, verbose on)
SELECT
sum(CASE
when o_orderpriority = '1-URGENT'
or o_orderpriority = '2-HIGH'
then 1
else 0
end) AS high_line_count,
sum(CASE
when o_orderpriority <> '1-URGENT'
and o_orderpriority <> '2-HIGH'
then 1
else 0
end) AS low_line_count
FROM
orders,
lineitem
WHERE
o_orderkey = l_orderkey
and l_shipmode IN('MAIL ', 'SHIP ')
and l_commitdate < l_receiptdate
and l_shipdate < l_commitdate
and l_receiptdate >= DATE '1994-01-01'
and l_receiptdate < DATE '1994-01-01' + INTERVAL '1 year';
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate
Output: pg_catalog.sum((sum(CASE WHEN ((o_orderpriority = '1-URGENT'::bpchar) OR (o_orderpriority = '2-HIGH'::bpchar)) THEN 1 ELSE 0 END))), pg_catalog.sum((sum(CASE WHEN ((o_orderpriority <> '1-URGENT'::bpchar) AND (o_orderpriority <> '2-HIGH'::bpchar)) THEN 1 ELSE 0 END)))
-> Streaming (type: GATHER)
Output: (sum(CASE WHEN ((o_orderpriority = '1-URGENT'::bpchar) OR (o_orderpriority = '2-HIGH'::bpchar)) THEN 1 ELSE 0 END)), (sum(CASE WHEN ((o_orderpriority <> '1-URGENT'::bpchar) AND (o_orderpriority <> '2-HIGH'::bpchar)) THEN 1 ELSE 0 END))
-> Aggregate
Output: sum(CASE WHEN ((o_orderpriority = '1-URGENT'::bpchar) OR (o_orderpriority = '2-HIGH'::bpchar)) THEN 1 ELSE 0 END), sum(CASE WHEN ((o_orderpriority <> '1-URGENT'::bpchar) AND (o_orderpriority <> '2-HIGH'::bpchar)) THEN 1 ELSE 0 END)
-> Seq Scan on public.mv_projection
Output: o_orderpriority
(8 rows)
Agg重写
DWS的agg函数重写,支持包括min/max/sum/count/avg多种agg函数。
建立物化视图:
CREATE MATERIALIZED VIEW mv_lineitem_3 ENABLE QUERY REWRITE AS
SELECT
*
FROM
lineitem
WHERE
l_shipdate <= DATE '1998-12-01' - INTERVAL '3 day';
查询与计划:
EXPLAIN(costs OFF, nodes off)
SELECT
l_returnflag,
l_linestatus,
sum(l_quantity) AS sum_qty,
sum(l_extendedprice) AS sum_base_price,
sum(l_extendedprice * (1 - l_discount)) AS sum_disc_price,
sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,
avg(l_quantity) AS avg_qty,
avg(l_extendedprice) AS avg_price,
avg(l_discount) AS avg_disc,
count(*) AS count_order
FROM
lineitem
WHERE
l_shipdate <= DATE '1998-12-01' - INTERVAL '3 day'
GROUP BY
l_returnflag,
l_linestatus
ORDER BY
sum_qty;
QUERY PLAN
--------------------------------------------------------
Streaming (type: GATHER)
Merge Sort Key: (sum(l_quantity))
-> Sort
Sort Key: (sum(l_quantity))
-> HashAggregate
Group By Key: l_returnflag, l_linestatus
-> Streaming(type: REDISTRIBUTE)
-> Seq Scan on mv_lineitem_3
(8 rows)
除了普通的agg函数重写,DWS还支持上卷重写,在已有的聚集列上再次进行聚集,在聚集函数的基础上进行聚集函数投影。
各个agg函数上卷重写结果为:
原聚集函数 |
上卷重写 |
count |
sum(count) |
sum |
sum(sum) |
min |
min(min) |
max |
max(max) |
avg |
sum/count |
rb_build_agg |
rb_or_agg(rb_build_agg) |
建立物化视图:
create materialized view mv_lineitem_agg_1 enable query rewrite
with(orientation = column) distribute by hash(l_orderkey, l_shipdate, l_returnflag, l_linestatus)
as select
l_orderkey,
l_shipdate,
l_returnflag,
l_linestatus,
count(*) as total_cnt,
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_discount) as sum_discount,
sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge
from
lineitem
group by
l_orderkey,
l_shipdate,
l_returnflag,
l_linestatus
;
查询与计划:
EXPLAIN(costs off, nodes off, verbose on)
SELECT
l_returnflag,
l_linestatus,
sum(l_quantity) AS sum_qty,
sum(l_extendedprice) AS sum_base_price,
sum(l_extendedprice * (1 - l_discount)) AS sum_disc_price,
sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,
avg(l_quantity) AS avg_qty,
avg(l_extendedprice) AS avg_price,
avg(l_discount) AS avg_disc,
count(*) AS count_order
FROM
lineitem
WHERE
l_shipdate <= DATE '1998-12-01' - INTERVAL '3 day'
GROUP BY
l_returnflag,
l_linestatus
ORDER BY
sum_qty;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------
Row Adapter
Output: l_returnflag, l_linestatus, (sum((sum(sum_qty)))), (sum((sum(sum_base_price)))), (sum((sum(sum_disc_price)))), (sum((sum(sum_charge)))), ((sum((sum(sum_qty))) / ((pg_catalog.sum((sum(total_cnt))))::bigint)::numeric)), ((sum((sum(sum_base_price))) / ((pg_catalog.sum((sum(total_cnt))))::bigint)::numeric))
, ((sum((sum(sum_discount))) / ((pg_catalog.sum((sum(total_cnt))))::bigint)::numeric)), ((pg_catalog.sum((sum(total_cnt))))::bigint)
-> Vector Sort
Output: l_returnflag, l_linestatus, (sum((sum(sum_qty)))), (sum((sum(sum_base_price)))), (sum((sum(sum_disc_price)))), (sum((sum(sum_charge)))), ((sum((sum(sum_qty))) / ((pg_catalog.sum((sum(total_cnt))))::bigint)::numeric)), ((sum((sum(sum_base_price))) / ((pg_catalog.sum((sum(total_cnt))))::bigint)::num
eric)), ((sum((sum(sum_discount))) / ((pg_catalog.sum((sum(total_cnt))))::bigint)::numeric)), ((pg_catalog.sum((sum(total_cnt))))::bigint)
Sort Key: (sum((sum(mv_lineitem_agg_1.sum_qty))))
-> Vector Sonic Hash Aggregate
Output: l_returnflag, l_linestatus, sum((sum(sum_qty))), sum((sum(sum_base_price))), sum((sum(sum_disc_price))), sum((sum(sum_charge))), (sum((sum(sum_qty))) / ((pg_catalog.sum((sum(total_cnt))))::bigint)::numeric), (sum((sum(sum_base_price))) / ((pg_catalog.sum((sum(total_cnt))))::bigint)::numeric)
, (sum((sum(sum_discount))) / ((pg_catalog.sum((sum(total_cnt))))::bigint)::numeric), (pg_catalog.sum((sum(total_cnt))))::bigint
Group By Key: mv_lineitem_agg_1.l_returnflag, mv_lineitem_agg_1.l_linestatus
-> Vector Streaming (type: GATHER)
Output: l_returnflag, l_linestatus, (sum(sum_qty)), (sum(sum_base_price)), (sum(sum_disc_price)), (sum(sum_charge)), (sum(total_cnt)), (sum(sum_discount))
-> Vector Sonic Hash Aggregate
Output: l_returnflag, l_linestatus, sum(sum_qty), sum(sum_base_price), sum(sum_disc_price), sum(sum_charge), sum(total_cnt), sum(sum_discount)
Group By Key: mv_lineitem_agg_1.l_returnflag, mv_lineitem_agg_1.l_linestatus
-> CStore Scan on public.mv_lineitem_agg_1
Output: l_returnflag, l_linestatus, sum_qty, sum_base_price, sum_disc_price, sum_charge, total_cnt, sum_discount
Distribute Key: l_orderkey, l_shipdate, l_returnflag, l_linestatus
Filter: (mv_lineitem_agg_1.l_shipdate <= '1998-11-28 00:00:00'::timestamp without time zone)
Pushdown Predicate Filter: (mv_lineitem_agg_1.l_shipdate <= '1998-11-28 00:00:00'::timestamp without time zone)
从查询计划里看,查询在物化视图的基础上,进行了上卷重写,sum重写成了sum(sum),avg重写成了sum/count,count重写成了sum(count)。
总结
本文主要介绍了DWS中物化视图查询重写的两种方式,并详细介绍了结构匹配重写的四种方式,分别以例子的方式展示了其技术实现的外在表现。
- 点赞
- 收藏
- 关注作者
评论(0)