实践系列-GaussDB(DWS)不下推语句总结
说明:
文中customer表定义如下:
CREATE TABLE CUSTOMER
(C_CUSTKEY BIGINT NOT NULL
, C_NAME VARCHAR(25) NOT NULL
, C_ADDRESS VARCHAR(40) NOT NULL
, C_NATIONKEY INT NOT NULL
, C_PHONE CHAR(15) NOT NULL
, C_ACCTBAL DECIMAL(15,2) NOT NULL
, C_MKTSEGMENT CHAR(10) NOT NULL
, C_COMMENT VARCHAR(117) NOT NULL
) distribute by hash(C_CUSTKEY);
1 语法上不支持下推的查询:
1.1 Distinct不支持下推的场景
1) 如果count(distinct expr)中的字段不支持重分布,则不支持下推
示例:
create table test_stream(a int,b float); //float不支持重分布
postgres=# explain select count(distinct b) from test_stream;
QUERY PLAN
-------------------------------------------------------------------------------------------------
Aggregate (cost=2.50..2.51 rows=1 width=8)
-> Data Node Scan on test_stream "_REMOTE_TABLE_QUERY_" (cost=0.00..0.00 rows=1000 width=8)
Node/s: All datanodes
(3 rows)
1.2 Count(distinct) + group by 如果group by的字段不支持重分布,则不支持下推
示例:
create table test_stream(a int,b float);
postgres=# explain select count(distinct(a)) from test_stream group by b order by b;
QUERY PLAN
--------------------------------------------------------------------------------------------------------
GroupAggregate (cost=49.83..57.43 rows=10 width=12)
Group By Key: test_stream.b
-> Sort (cost=49.83..52.33 rows=1000 width=12)
Sort Key: test_stream.b
-> Data Node Scan on test_stream "_REMOTE_TABLE_QUERY_" (cost=0.00..0.00 rows=1000 width=12)
Node/s: All datanodes
(6 rows)
IsTypeDistributable:不支持的重分布类型
1.3 不支持distinct on用法下推:
示例:
postgres=# explain select distinct on (c_custkey) c_custkey from customer;
QUERY PLAN
----------------------------------------------------------------------------------------------------
Unique (cost=49.83..54.83 rows=10 width=8)
-> Sort (cost=49.83..52.33 rows=1000 width=8)
Sort Key: customer.c_custkey
-> Data Node Scan on customer "_REMOTE_TABLE_QUERY_" (cost=0.00..0.00 rows=1000 width=8)
Node/s: All datanodes
(5 rows)
1.4 不支持聚集函数中使用order by语句的下推:
示例:
postgres=# explain select count(c_custkey order by c_custkey) from customer;
QUERY PLAN
----------------------------------------------------------------------------------------------
Aggregate (cost=2.50..2.51 rows=1 width=4)
-> Data Node Scan on customer "_REMOTE_TABLE_QUERY_" (cost=0.00..0.00 rows=1000 width=4)
Node/s: All datanodes
(3 rows)
1.5 不支持范围表查询的下推,即使用 WITH RECURSIVE
示例:
postgres=# explain WITH RECURSIVE customer_recursive(c_name, c_nationkey) AS (
SELECT 1, c_nationkey FROM customer WHERE c_custkey = 1)
SELECT * FROM customer_recursive;
QUERY PLAN
------------------------------------------------------------------------------------------------
CTE Scan on customer_recursive (cost=0.00..20.00 rows=1000 width=8)
CTE customer_recursive
-> Data Node Scan on customer "_REMOTE_TABLE_QUERY_" (cost=0.00..0.00 rows=1000 width=4)
Node/s: datanode1
(4 rows)
1.6 不支持returning语句下推:
示例:
postgres=# explain update customer set c_name = 'a' returning c_name;
QUERY PLAN
------------------------------------------------------------------------------------------------
Update on customer (cost=0.00..0.00 rows=1000 width=498)
Node/s: All datanodes
Node expr: c_custkey
-> Data Node Scan on customer "_REMOTE_TABLE_QUERY_" (cost=0.00..0.00 rows=1000 width=498)
Node/s: All datanodes
(5 rows)
1.7 无from list也不支持下推
示例:
select cd_education_status, min(cd_gender), cd_dep_employed_count,grouping(cd_dep_employed_count)
from customer_demographics
group by 1 ,cube(1,3),cd_dep_employed_count >all (select distinct length(cd_demo_sk)
where cd_purchase_estimate = 500 and cd_education_status like '%da%')
having(case when length(trim(both 'e'
from cd_education_status)) > 8 then 'YES' when length(trim(both 'e'
from cd_education_status)) < 8 then 'NO' end) = 'YES'
and cd_dep_employed_count not in (2,4,5)
order by 1, 2, 3,4;
1.8 with recursive不下推场景
(1) enable_stream_recursive为off,with recursive语法不下推
(2) With-Recursive中涉及的基表在多nodegroup中不下推
(3) with recursive语法其他不支持下推的场景
序号 |
场景 |
不下推原因 |
1 |
WITH recursive t_result AS ( |
带有不下推函数 |
2 |
explain (costs off, nodes off) with recursive rq as ( select id, name from chinamap where id = 11 union select origin.id, rq.name || ' > ' || origin.name from rq join chinamap origin on origin.pid = rq.id ) select id, name from rq order by 1; QUERY PLAN ----------------------------------------------------------------------- Sort Sort Key: rq.id CTE rq -> Recursive Union -> Data Node Scan on chinamap "_REMOTE_TABLE_QUERY_" -> Hash Join Hash Cond: (origin.pid = rq.id) -> Data Node Scan on chinamap "_REMOTE_TABLE_QUERY_" -> Hash -> WorkTable Scan on rq -> CTE Scan on rq (11 rows) |
With-Recursive不包含ALL
|
2 |
with recursive all_path(id, neighbor, path) as ( |
数组操作 |
3 |
with recursive cte as |
1.subquery里面包含不带fromlist |
4 |
select |
相关子查询 |
5 |
WITH recursive t_result AS ( |
join条件恒不成立 |
6 |
WITH recursive t_result AS ( |
non recursive term包含limit offset |
7 |
WITH recursive t_result AS ( |
output阶段多次调用cte |
8 |
with recursive t_result1 as ( |
并列多个递归cte,并且有相互引用(即:存在CTE嵌套) |
9 |
WITH recursive t_result AS ( |
agg结果做递归条件 |
10 |
WITH RECURSIVE migora_cte AS( |
order by nlssort |
1.9 表在目标列中不支持下推
postgres=# explain select customer from customer;
QUERY PLAN
----------------------------------------------------------------------------------------
Data Node Scan on customer "_REMOTE_TABLE_QUERY_" (cost=0.00..0.00 rows=40 width=580)
Node/s: All datanodes
(2 rows)
2 表对象不支持下推:
Roundrobin、modulo
2.1 对于普通表,不支持RROBIN 、MODULO两种分布的查询下推
示例:
postgres=# create by table test(a int) distribute roundrobin;
CREATE TABLE
postgres=# explain select * from test;
QUERY PLAN
------------------------------------------------------------------------------------
Data Node Scan on test "_REMOTE_TABLE_QUERY_" (cost=0.00..0.00 rows=1000 width=4)
Node/s: All datanodes
(2 rows)
postgres=# create table test(a int) distribute by modulo(a);
CREATE TABLE
postgres=# explain select * from test;
QUERY PLAN
------------------------------------------------------------------------------------
Data Node Scan on test "_REMOTE_TABLE_QUERY_" (cost=0.00..0.00 rows=1000 width=4)
Node/s: All datanodes
3 函数不支持下推:
3.1 不支持下推的聚集函数:
array_agg、string_agg、xmlagg、every、curval、setval、lastval、nextval、pg_backend_pid、pg_stat_get_backend_pid
3.2 不支持下推的普通函数:
注:查询函数属于那种类型使用:select provolatile from pg_proc where proname='funcname';
示例:
postgres=# select provolatile from pg_proc where proname='lastval';
provolatile
-------------
v
(1 row)
其中:v 代表易变的,即VOLATILE;s代表稳定的,即stable;i代表不变的,即IMMUTABLE
3.2.1 对于IMMUTABLE的函数,都是支持下推的,除了every 函数
示例:
postgres=# select provolatile from pg_proc where proname='int8um';
provolatile
-------------
i
(1 row)
postgres=# explain select int8um(C_CUSTKEY) from CUSTOMER;
QUERY PLAN
----------------------------------------------------------------
Streaming (type: GATHER) (cost=0.00..40.50 rows=40 width=8)
Node/s: All datanodes
-> Seq Scan on customer (cost=0.00..10.12 rows=10 width=8)
(3 rows)
3.2.2 对于易变的(VOLATILE)函数,都是不支持下推的,函数的返回值为RECORD类型的不支持下推
示例:
函数的返回值为RECORD类型的不支持下推
CREATE OR REPLACE FUNCTION test_proc( i integer,j out integer)
RETURN record
AS
result int;
BEGIN
RETURN NULL;
END;
/
postgres=# explain select test_proc(1) from customer;
QUERY PLAN
------------------------------------------------------------------------------------------
Data Node Scan on customer "_REMOTE_TABLE_QUERY_" (cost=0.00..250.00 rows=1000 width=0)
Node/s: All datanodes
(2 rows)
易变函数不支持下推:
示例:
postgres=# select provolatile from pg_proc where proname='random';
provolatile
-------------
v
(1 row)
postgres=# explain select random() from CUSTOMER;
QUERY PLAN
----------------------------------------------------------------------------------------
Data Node Scan on customer "_REMOTE_TABLE_QUERY_" (cost=0.00..2.50 rows=1000 width=0)
Node/s: All datanodes
(2 rows)
3.2.3 对于稳定的(stable)的函数,只有部分是支持下推的;
支持的函数列表如下:
to_date(两个参数)
to_date(一个参数)
pg_column_size
abstimein,
abstimeout
reltimein,
reltimeout,
tintervalin,
tintervalout,
cash_in,
cash_out,
numeric(cash),
money(numeric),
money(int2),
money(int8),
date_in,
date_out,
time_in,
time_out,
timestamp_in,
timestamp_out,
smalldatetime_in,
timestamp_out,
timestamptz_out,
smalldatetime_out,
interval_in,
date_part(text,timestamptz),
timestamptz(date),
timestamptz(date,time),
date(timestamptz),
date(abstime),
timestamptz_pl_interval,
timestamptz_mi_interval,
date_trunc(text,timestamptz),
timetz_in,
time(abstime),
date_part(text,abstime),
date_part(text,reltime),
timetz(timestamptz),
to_timestamp(text,text),
time(timestamptz),
timestamp(abstime),
timestamp(timestamptz),
timestamptz(timestamp),
abstime(timestamp),
timetz(time),
date_lt_timestamptz,
date_le_timestamptz,
date_eq_timestamptz,
date_gt_timestamptz,
date_ge_timestamptz,
date_ne_timestamptz,
date_cmp_timestamptz,
timestamptz_lt_date,
timestamptz_le_date,
timestamptz_eq_date,
timestamptz_gt_date,
timestamptz_ge_date,
timestamptz_ne_date,
timestamptz_cmp_date,
timestamp_lt_timestamptz,
timestamp_le_timestamptz,
timestamp_eq_timestamptz,
timestamp_gt_timestamptz,
timestamp_ge_timestamptz,
timestamp_ne_timestamptz,
timestamp_cmp_timestamptz,
timestamptz_lt_timestamp,
timestamptz_le_timestamp,
timestamptz_eq_timestamp,
timestamptz_gt_timestamp,
timestamptz_ge_timestamp,
timestamptz_ne_timestamp,
timestamptz_cmp_timestamp,
interval_pl_timestamptz
3.3 not shippable关键字不下推
CREATE FUNCTION shipping_func_time_02() RETURNS timestamptz
AS 'select clock_timestamp();'
LANGUAGE SQL
stable not shippable
RETURNS NULL ON NULL INPUT;
explain (costs off) select shipping_func_time_02(),t5.c from t4, t5 where t4.b=t5.a order by 1 limit 3;
QUERY PLAN
---------------------------------------------------------------
Limit
-> Hash Join
Hash Cond: (t4.b = t5.a)
-> Data Node Scan on t4 "_REMOTE_TABLE_QUERY_"
Node/s: All datanodes
-> Hash
-> Data Node Scan on t5 "_REMOTE_TABLE_QUERY_"
Node/s: All datanodes
(8 rows)
/* 修改函数为下推函数 */
alter function shipping_func_time_02() shippable;
/* 查看函数定义是否下推 */
select proname,proshippable from pg_proc where proname='shipping_func_time_02' or proname='shipping_func_time_01' or proname='shipping_func_time_03' order by 1;
3.4 current_user / user不下推
postgres=# explain select * from customer where c_name=current_user;
QUERY PLAN
---------------------------------------------------------------------------------------
Data Node Scan on customer "_REMOTE_TABLE_QUERY_" (cost=0.00..0.00 rows=1 width=556)
Node/s: All datanodes
(2 rows)
SQL语句中存在current_user或user写法时不能下推,需要替换为可下推语法。方法如下:
在库中创建如下函数current_user_select(),并将业务中使用的current_user替换为current_user_select即可实现下推。
CREATE OR REPLACE FUNCTION public.current_user_select() RETURNS name LANGUAGE internal immutable STRICT AS $function$current_user$function$; |
示例如下:
原语句 |
select current_user from emp; |
原语句 |
select current_user_select() from emp; |
3.5 sysdate / current_date / SYSTIMESTAMP不下推
SQL语句中存在sysdate或current_date写法时不能下推,同时,mpp不支持SYSTIMESTAMP,需要使用current_timetamp(0)替换。方法如下:
原语句 |
select sysdate from emp; |
新语句 |
select current_timestamp(0) from emp; |
原语句 |
select current_date from emp; |
新语句 |
select current_timestamp(0) from emp; |
原语句 |
select SYSTIMESTAMP from emp; |
新语句 |
select current_timestamp(0) from emp; |
注:必须带有参数才能下推,即使用current_timestamp(0)。current_timestamp带有时间及时区信息,如果不需要,可以使用substr函数截取。
- 点赞
- 收藏
- 关注作者
评论(0)