实践系列-GaussDB(DWS)不下推语句总结

举报
爱加班的叶凡 发表于 2020/11/26 15:14:23 2020/11/26
【摘要】 MPPDB不支持下推的用法总结

说明:

文中customer表定义如下:

CREATE TABLE CUSTOMER

(C_CUSTKEY     BIGINT NOT NULL

, C_NAME        VARCHAR(25) NOT NULL

, C_ADDRESS     VARCHAR(40) NOT NULL

, C_NATIONKEY   INT NOT NULL

, C_PHONE       CHAR(15) NOT NULL

, C_ACCTBAL     DECIMAL(15,2)   NOT NULL

, C_MKTSEGMENT  CHAR(10) NOT NULL

, C_COMMENT     VARCHAR(117) NOT NULL

)     distribute by hash(C_CUSTKEY);

1      语法上不支持下推的查询:

1.1      Distinct不支持下推的场景

1)  如果count(distinct expr)中的字段不支持重分布,则不支持下推

示例:

create table test_stream(a int,b float); //float不支持重分布

postgres=# explain select count(distinct b) from test_stream;

                                           QUERY PLAN                                           

-------------------------------------------------------------------------------------------------

 Aggregate  (cost=2.50..2.51 rows=1 width=8)

   ->  Data Node Scan on test_stream "_REMOTE_TABLE_QUERY_"  (cost=0.00..0.00 rows=1000 width=8)

         Node/s: All datanodes

(3 rows)

1.2      Count(distinct) + group by 如果group by的字段不支持重分布,则不支持下推

示例:

create table test_stream(a int,b float);

postgres=#  explain select count(distinct(a)) from test_stream group by b order by b;

                                               QUERY PLAN                                              

--------------------------------------------------------------------------------------------------------

 GroupAggregate  (cost=49.83..57.43 rows=10 width=12)

   Group By Key: test_stream.b

   ->  Sort  (cost=49.83..52.33 rows=1000 width=12)

         Sort Key: test_stream.b

         ->  Data Node Scan on test_stream "_REMOTE_TABLE_QUERY_"  (cost=0.00..0.00 rows=1000 width=12)

               Node/s: All datanodes

(6 rows)

 

IsTypeDistributable:不支持的重分布类型

1.3      不支持distinct on用法下推:

示例:

postgres=# explain select distinct on (c_custkey) c_custkey from customer;

                                             QUERY PLAN                                            

----------------------------------------------------------------------------------------------------

 Unique  (cost=49.83..54.83 rows=10 width=8)

   ->  Sort  (cost=49.83..52.33 rows=1000 width=8)

         Sort Key: customer.c_custkey

         ->  Data Node Scan on customer "_REMOTE_TABLE_QUERY_"  (cost=0.00..0.00 rows=1000 width=8)

               Node/s: All datanodes

(5 rows)

1.4      不支持聚集函数中使用order by语句的下推:

示例:

postgres=#  explain select count(c_custkey order by c_custkey) from customer;

                                          QUERY PLAN                                         

----------------------------------------------------------------------------------------------

 Aggregate  (cost=2.50..2.51 rows=1 width=4)

   ->  Data Node Scan on customer "_REMOTE_TABLE_QUERY_"  (cost=0.00..0.00 rows=1000 width=4)

         Node/s: All datanodes

(3 rows)

1.5      不支持范围表查询的下推,即使用 WITH RECURSIVE

示例

postgres=# explain WITH RECURSIVE customer_recursive(c_name, c_nationkey) AS (

SELECT 1, c_nationkey FROM customer WHERE c_custkey = 1)

SELECT *  FROM customer_recursive;

                                           QUERY PLAN                                          

------------------------------------------------------------------------------------------------

 CTE Scan on customer_recursive  (cost=0.00..20.00 rows=1000 width=8)

   CTE customer_recursive

     ->  Data Node Scan on customer "_REMOTE_TABLE_QUERY_"  (cost=0.00..0.00 rows=1000 width=4)

           Node/s: datanode1

(4 rows)

1.6      不支持returning语句下推:

示例:

postgres=# explain update customer set c_name = 'a' returning c_name;

                                           QUERY PLAN                                          

------------------------------------------------------------------------------------------------

 Update on customer  (cost=0.00..0.00 rows=1000 width=498)

   Node/s: All datanodes

   Node expr: c_custkey

   ->  Data Node Scan on customer "_REMOTE_TABLE_QUERY_"  (cost=0.00..0.00 rows=1000 width=498)

         Node/s: All datanodes

(5 rows)

1.7      from list也不支持下推

示例:

select cd_education_status, min(cd_gender), cd_dep_employed_count,grouping(cd_dep_employed_count)

from customer_demographics

group by 1 ,cube(1,3),cd_dep_employed_count >all (select distinct length(cd_demo_sk)

where cd_purchase_estimate = 500 and cd_education_status like '%da%')

having(case when length(trim(both 'e'

from cd_education_status)) > 8 then 'YES' when length(trim(both 'e'

from cd_education_status)) < 8 then 'NO' end) = 'YES'

and cd_dep_employed_count not in (2,4,5)

order by 1, 2, 3,4;

1.8      with recursive不下推场景

(1)       enable_stream_recursiveoffwith recursive语法不下推

(2)       With-Recursive中涉及的基表在多nodegroup中不下推

(3)       with recursive语法其他不支持下推的场景

序号

场景

不下推原因

1

WITH recursive t_result AS (
  SELECT concat(test_rec.dm,test_rec.sj_dm),*
  FROM test_rec
  WHERE dm = '3'
  UNION ALL
  SELECT
concat(t1.dm,t1.sj_dm),t2.*
  FROM t_result t1
  JOIN test_rec t2 ON t2.dm = t1.sj_dm
  )
  SELECT * FROM t_result;

带有不下推函数

2

explain (costs off, nodes off) with recursive rq as

(

    select id, name from  chinamap where id = 11

    union

    select origin.id, rq.name || ' > ' ||   origin.name

    from rq join chinamap origin on   origin.pid = rq.id

)

select id, name from rq order   by 1;

                              QUERY PLAN

-----------------------------------------------------------------------

 Sort

   Sort Key: rq.id

   CTE rq

     ->    Recursive Union

           ->  Data Node Scan on chinamap   "_REMOTE_TABLE_QUERY_"

           ->  Hash Join

                 Hash Cond: (origin.pid =   rq.id)

                 ->  Data Node Scan on chinamap   "_REMOTE_TABLE_QUERY_"

                 ->  Hash

                       ->  WorkTable Scan on rq

   ->    CTE Scan on rq

(11 rows)

With-Recursive不包含ALL

 

2

with recursive all_path(id, neighbor, path) as (
    SELECT
      id, neighbor,
ARRAY[id]
    FROM graph WHERE id = 'A'
      union all
    SELECT
      all_path.id,
      graph.neighbor,
ARRAY[graph.id]
    FROM graph
    JOIN all_path
    ON all_path.neighbor = graph.id
  )
  select * from all_path;

数组操作

3

with recursive cte as
  (select area_code, area_code as chain, 1 as level /*chain
为路径,例如>101>201>203;level为层级,从1开始*/
  from gcms.gcm_mag_area_h
  where area_code='100000'
  union all
  select h.area_code,cast(cte.chain||'>'||h.area_code as   varchar2(30)),cte.level+1
  from cte
  join gcms.gcm_mag_area_h h
  on h.belong_area_code=cte.area_code and cte.level <3 and h.area_code
not in (select   regexp_split_to_table(cte.chain,'>') )/*如果是循环的结构,可以用chain来退出循环;此处cte.level<6表示到第6层停止*/
  )
  select * from cte;

1.subquery里面包含不带fromlist
  2.
包含相关子查询

4

select
  dm,
 
(
  with recursive cte as
  (
  select distinct f.dm, f.sj_dm, f.name from test_rec_1 f where   f.sj_dm=t1.sj_dm
  union all
  select distinct f.dm, f.sj_dm, f.name from test_rec_1 f join cte on
  f.dm=cte.sj_dm
  )
  select distinct dm from cte where sj_dm=7
  ) dm_2
,
  sj_dm, name
  from test_rec t1;

相关子查询

5

WITH recursive t_result AS (
  SELECT dm,sj_dm,name,1 as level
  FROM test_rec_part
  WHERE sj_dm < 10
  UNION all
  SELECT t2.dm,t2.sj_dm,t2.name||' > '||t1.name,t1.level+1
  FROM t_result t1
  JOIN test_rec_part t2 ON
1=2
  )
  SELECT *
  FROM t_result t;

join条件恒不成立

6

WITH recursive t_result AS (
  select * from(
  SELECT dm,sj_dm,name,1 as level
  FROM test_rec_part
  WHERE sj_dm < 10
limit 6 offset 2)
  UNION all
  SELECT t2.dm,t2.sj_dm,t2.name||' > '||t1.name,t1.level+1
  FROM t_result t1
  JOIN test_rec_part t2 ON t2.sj_dm = t1.dm
  )
  SELECT *
  FROM t_result t;

non recursive term包含limit offset

7

WITH recursive t_result AS (
  SELECT dm,sj_dm,name,1 as level
  FROM test_rec_part
  WHERE sj_dm < 10
  UNION all
  SELECT t2.dm,t2.sj_dm,t2.name||' > '||t1.name,t1.level+1
  FROM t_result t1
  JOIN test_rec_part t2 ON t2.sj_dm = t1.dm and t1.sj_dm>t2.dm
  )
  SELECT *
  FROM
t_result t where dm < 5
  union all
  SELECT *
  FROM
t_result t where dm > 5;

output阶段多次调用cte

8

with recursive t_result1 as (
    select id,pid,name as name,1 as level from test_date where   pid='2018-12-31 23:59:54'
      union all
    select t2.id,t2.pid,t2.name||' > '||t1.name, t1.level+1 from   t_result1 t1 join test_date t2 on t1.id=t2.pid
  ),
t_result2 as (
    select id,pid,name,1 as level from test_date where id='2018-12-31   23:59:54'
      union all
    select t2.id,t2.pid,t1.name||' > '||t2.name, t1.level+1 from   t_result2 t1 join
t_result1 t2 on t1.pid=t2.id
  )
  select * from t_result1 join t_result2 on t_result1.id<t_result2.id;

并列多个递归cte,并且有相互引用(即:存在CTE嵌套)

9

WITH recursive t_result AS (
  SELECT
max(dm) as dm,max(sj_dm) as sj_dm,max(name) as name
  FROM test_rec
  WHERE dm = '3'
  UNION ALL
  SELECT t2.dm,t2.sj_dm,t2.name
  FROM t_result t1
  JOIN test_rec_part t2 ON
t2.dm = t1.sj_dm
  )
  SELECT *
  FROM t_result;

agg结果做递归条件

10

WITH RECURSIVE migora_cte AS(
  --non recursive term
  with b as (
  select * FROM dams_branch b
  WHERE b.stru_sname LIKE '%' || 'a' || '%'
  AND b.stru_state IN ('1', '2')
  AND b.stru_sign <> '7')
  SELECT
  stru_id ,stru_sname ,stru_lv ,stru_sort  ,b.stru_id::text AS   MIG_ROW_ALIAS1 ,b.sort AS MIG_ROW_ALIAS2
  FROM dams_branch b
  WHERE b.stru_id = 100
  AND b.stru_state IN ( '1' ,'2' )
  UNION ALL
  --recursive term
  SELECT b.stru_id ,b.stru_sname ,b.stru_lv ,b.stru_sort   ,mig_ora_cte_tab_alias.MIG_ROW_ALIAS1 || '*' || b.stru_id AS MIG_ROW_ALIAS1   ,b.sort AS MIG_ROW_ALIAS2
  FROM migora_cte mig_ora_cte_tab_alias INNER JOIN dams_branch b ON   mig_ora_cte_tab_alias.stru_id = b.sup_stru AND b.stru_state IN ( '1' ,'2' )
  )
  SELECT stru_id ,stru_sname
  FROM migora_cte b
  ORDER BY
  sign( instr( b.MIG_ROW_ALIAS1 ,'a' ) ) DESC ,b.stru_lv ,b.stru_sort DESC   NULLS LAST ,MIG_ROW_ALIAS2 DESC ,
nlssort(stru_sname,   'NLS_SORT=SCHINESE_PINYIN_M') ;

order by nlssort

 

1.9      表在目标列中不支持下推

postgres=# explain select customer from customer;

                                       QUERY PLAN

----------------------------------------------------------------------------------------

 Data Node Scan on customer "_REMOTE_TABLE_QUERY_"  (cost=0.00..0.00 rows=40 width=580)

   Node/s: All datanodes

(2 rows)

 

2      表对象不支持下推:

Roundrobinmodulo

2.1                     对于普通表,不支持RROBIN MODULO两种分布的查询下推

示例:

postgres=# create by table test(a int) distribute roundrobin;

CREATE TABLE

postgres=# explain select * from test;

                                     QUERY PLAN                                    

------------------------------------------------------------------------------------

 Data Node Scan on test "_REMOTE_TABLE_QUERY_"  (cost=0.00..0.00 rows=1000 width=4)

   Node/s: All datanodes

(2 rows)

postgres=# create table test(a int) distribute by  modulo(a);

CREATE TABLE

postgres=# explain select * from test;

                                     QUERY PLAN                                    

------------------------------------------------------------------------------------

 Data Node Scan on test "_REMOTE_TABLE_QUERY_"  (cost=0.00..0.00 rows=1000 width=4)

   Node/s: All datanodes

3      函数不支持下推:

3.1      不支持下推的聚集函数:

array_aggstring_aggxmlaggeverycurvalsetvallastvalnextvalpg_backend_pidpg_stat_get_backend_pid

3.2      不支持下推的普通函数:

注:查询函数属于那种类型使用:select provolatile from pg_proc where proname='funcname';

示例

postgres=# select provolatile from pg_proc where proname='lastval';

 provolatile

-------------

 v

(1 row)

其中:v 代表易变的,即VOLATILEs代表稳定的,即stablei代表不变的,即IMMUTABLE

3.2.1        对于IMMUTABLE的函数,都是支持下推的,除了every  函数

示例:

postgres=#  select provolatile from pg_proc where proname='int8um';

 provolatile

-------------

 i

(1 row)

postgres=# explain select int8um(C_CUSTKEY)  from CUSTOMER;

                           QUERY PLAN                          

----------------------------------------------------------------

 Streaming (type: GATHER)  (cost=0.00..40.50 rows=40 width=8)

   Node/s: All datanodes

   ->  Seq Scan on customer  (cost=0.00..10.12 rows=10 width=8)

(3 rows)

3.2.2        对于易变的(VOLATILE)函数,都是不支持下推的,函数的返回值为RECORD类型的不支持下推

示例:

函数的返回值为RECORD类型的不支持下推

CREATE OR REPLACE FUNCTION test_proc( i integer,j out integer)

RETURN record

AS

    result int;

    BEGIN

        RETURN NULL;

    END;

/

postgres=# explain select test_proc(1) from customer;

                                        QUERY PLAN                                       

------------------------------------------------------------------------------------------

 Data Node Scan on customer "_REMOTE_TABLE_QUERY_"  (cost=0.00..250.00 rows=1000 width=0)

   Node/s: All datanodes

(2 rows)

易变函数不支持下推:

示例:

postgres=# select  provolatile from pg_proc where proname='random';

 provolatile

-------------

 v

(1 row)

postgres=# explain select random() from CUSTOMER;

                                       QUERY PLAN                                      

----------------------------------------------------------------------------------------

 Data Node Scan on customer "_REMOTE_TABLE_QUERY_"  (cost=0.00..2.50 rows=1000 width=0)

   Node/s: All datanodes

(2 rows)

3.2.3        对于稳定的(stable)的函数,只有部分是支持下推的;

支持的函数列表如下:

       to_date(两个参数)

       to_date(一个参数)

       pg_column_size

       abstimein,

       abstimeout

       reltimein,

       reltimeout,

       tintervalin,

       tintervalout,

       cash_in,

       cash_out,

       numeric(cash),

       money(numeric),

       money(int2),

       money(int8),

       date_in,

       date_out,

       time_in,

       time_out,

       timestamp_in,

       timestamp_out,

       smalldatetime_in,

       timestamp_out,

       timestamptz_out,

       smalldatetime_out,

       interval_in,

       date_part(text,timestamptz),

       timestamptz(date),

       timestamptz(date,time),

       date(timestamptz),

       date(abstime),

       timestamptz_pl_interval,

       timestamptz_mi_interval,

       date_trunc(text,timestamptz),

       timetz_in,

       time(abstime),

       date_part(text,abstime),

       date_part(text,reltime),

       timetz(timestamptz),

       to_timestamp(text,text),

       time(timestamptz),

       timestamp(abstime),

       timestamp(timestamptz),

       timestamptz(timestamp),

       abstime(timestamp),

       timetz(time),

       date_lt_timestamptz,

       date_le_timestamptz,

       date_eq_timestamptz,

       date_gt_timestamptz,

       date_ge_timestamptz,

       date_ne_timestamptz,

       date_cmp_timestamptz,

       timestamptz_lt_date,

       timestamptz_le_date,

       timestamptz_eq_date,

       timestamptz_gt_date,

       timestamptz_ge_date,

       timestamptz_ne_date,

       timestamptz_cmp_date,

       timestamp_lt_timestamptz,

       timestamp_le_timestamptz,

       timestamp_eq_timestamptz,

       timestamp_gt_timestamptz,

       timestamp_ge_timestamptz,

       timestamp_ne_timestamptz,

       timestamp_cmp_timestamptz,

       timestamptz_lt_timestamp,

       timestamptz_le_timestamp,

       timestamptz_eq_timestamp,

       timestamptz_gt_timestamp,

       timestamptz_ge_timestamp,

       timestamptz_ne_timestamp,

       timestamptz_cmp_timestamp,

       interval_pl_timestamptz

3.3      not shippable关键字不下推

CREATE FUNCTION shipping_func_time_02() RETURNS timestamptz

    AS 'select clock_timestamp();'

    LANGUAGE SQL

    stable not shippable

RETURNS NULL ON NULL INPUT;

explain (costs off)  select shipping_func_time_02(),t5.c from t4, t5 where t4.b=t5.a order by 1 limit 3;

                          QUERY PLAN                          

---------------------------------------------------------------

 Limit

   ->  Hash Join

         Hash Cond: (t4.b = t5.a)

         ->  Data Node Scan on t4 "_REMOTE_TABLE_QUERY_"

               Node/s: All datanodes

         ->  Hash

               ->  Data Node Scan on t5 "_REMOTE_TABLE_QUERY_"

                     Node/s: All datanodes

(8 rows)

/* 修改函数为下推函数 */

alter function shipping_func_time_02() shippable;

/* 查看函数定义是否下推 */

select proname,proshippable from pg_proc where proname='shipping_func_time_02' or proname='shipping_func_time_01' or proname='shipping_func_time_03' order by 1;

3.4      current_user / user不下推

postgres=# explain select * from customer where c_name=current_user;

                                      QUERY PLAN

---------------------------------------------------------------------------------------

 Data Node Scan on customer "_REMOTE_TABLE_QUERY_"  (cost=0.00..0.00 rows=1 width=556)

   Node/s: All datanodes

(2 rows)

SQL语句中存在current_user或user写法时不能下推,需要替换为可下推语法。方法如下:

在库中创建如下函数current_user_select(),并将业务中使用的current_user替换为current_user_select即可实现下推。

CREATE   OR REPLACE FUNCTION public.current_user_select()

RETURNS   name

LANGUAGE   internal

immutable   STRICT

AS   $function$current_user$function$;

示例如下:

原语句

select current_user from emp;

原语句

select current_user_select() from emp;

 

3.5      sysdate / current_date / SYSTIMESTAMP不下推

SQL语句中存在sysdate或current_date写法时不能下推,同时,mpp不支持SYSTIMESTAMP,需要使用current_timetamp(0)替换。方法如下:

原语句

select sysdate from emp;

新语句

select current_timestamp(0) from emp;

 

原语句

select current_date from emp;

新语句

select current_timestamp(0) from emp;

 

原语句

select SYSTIMESTAMP from emp;

新语句

select current_timestamp(0) from emp;

注:必须带有参数才能下推,即使用current_timestamp(0)。current_timestamp带有时间及时区信息,如果不需要,可以使用substr函数截取。

 

【版权声明】本文为华为云社区用户转载文章,如果您发现本社区中有涉嫌抄袭的内容,欢迎发送邮件进行举报,并提供相关证据,一经查实,本社区将立刻删除涉嫌侵权内容,举报邮箱: cloudbbs@huaweicloud.com
  • 点赞
  • 收藏
  • 关注作者

评论(0

0/1000
抱歉,系统识别当前为高风险访问,暂不支持该操作

全部回复

上滑加载中

设置昵称

在此一键设置昵称,即可参与社区互动!

*长度不超过10个汉字或20个英文字符,设置后3个月内不可修改。

*长度不超过10个汉字或20个英文字符,设置后3个月内不可修改。