- 微信
- 微博
  
  分享文章到微博
- 复制链接
  
  复制链接到剪贴板

【SQL优化】为什么有时候无法走执行性能更优的hashjoin

leapdb 发表于 2023/09/06 11:28:30 2023/09/06

【摘要】如何强制SQL走性能更优的hash join

1. hash join通常优于nestloop join

通常nestloop join的复杂度是O(N方)，hash join时间复杂度是O(N)，所以我们一般倾向于使用hash join。

在SQL脚本调优过程中通常有两种方式，强制走hash join方式：

1. 在session级关闭nestloop方式，set enable_nestloop to off;

2. 在SQL中通过 /*+ hashjoin(a b) */ 方式，让a和b表走hash join;

CREATE DATABASE test_td WITH DBCOMPATIBILITY='td';

create table dim_day(day_code char(8));
create table dwr_rpo as select current_date - 1 as day_code; --返回了date类型

test_td=# \d+ dwr_rpo
                       Table "public.dwr_rpo"
  Column  | Type | Modifiers | Storage | Stats target | Description 
----------+------+-----------+---------+--------------+-------------
 day_code | date |           | plain   |              | 
Has OIDs: no
Distribute By: ROUND ROBIN
Location Nodes: ALL DATANODES
Options: orientation=row, compression=no

explain select *
from dwr_rpo a
left join dim_day c
on c.day_code = a.day_code;

id |                  operation                   | E-rows  | E-distinct | E-memory | E-width |   E-costs    
---+----------------------------------------------+---------+------------+----------+---------+--------------
 1 | ->  Streaming (type: GATHER)                 | 1310148 |            |          |    1694 | 279235196.70 
 2 |    ->  Nested Loop Left Join (3, 4)          | 1310148 |            | 1MB      |    1694 | 279229682.93 
 3 |       ->  Seq Scan on dwr_rpo a              | 1310148 |            | 1MB      |    1676 | 46589.16     
 4 |       ->  Materialize                        |  109575 |            | 16MB     |      22 | 3747.76      
 5 |          ->  Streaming(type: BROADCAST)      |  109575 |            | 2MB      |      22 | 3565.14      
 6 |             ->  Seq Scan on dim_day c        |   36525 |            | 1MB      |      22 | 272.75       

               Predicate Information (identified by plan id)                
-----------------------------------------------------------------------------
  2 --Nested Loop Left Join (3, 4)
        Join Filter: ((c.day_code)::timestamp without time zone = a.day_code)

可是，以上SQL无论用哪种方式都走不上hash join。我们需要看一下，join两端的数据类型是否支持hash比较。

1. 为什么有时候无法走执行性能更优的hashjoin

不同数据类型计算hash函数不同，互不兼容的数据类型无法进行hash比较。

2. 为什么hashjoin秒级，nestloop需要两个小时

nestloop复杂度：131w * 10w = 1310亿

hashjoin复杂度：131w

所以两种方式性能差距很大。

3. 为什么有类型转换，还不能hash join

看似类型相近，但由于两端的精度，格式，有无时区等不一样，无法认为直接相等。

4. 都哪些数据类型间的join不支持hash?

select oprname,oprkind,oprcanhash,
  (select typname from pg_type where oid=oprleft) oprleft,
  (select typname from pg_type where oid=oprright) oprright 
from pg_operator 
where oprname='=' and oprcanhash='f';

 oprname | oprkind | oprcanhash |    oprleft    |   oprright    
---------+---------+------------+---------------+---------------
 =       | b       | f          | xid           | int8
 =       | b       | f          | xid32         | int4
 =       | b       | f          | tid           | tid
 =       | b       | f          | box           | box
 =       | b       | f          | path          | path
 =       | b       | f          | tinterval     | tinterval
 =       | b       | f          | money         | money
 =       | b       | f          | circle        | circle
 =       | b       | f          | lseg          | lseg
 =       | b       | f          | line          | line
 =       | b       | f          | bit           | bit
 =       | b       | f          | varbit        | varbit
 =       | b       | f          | date          | timestamp
 =       | b       | f          | date          | timestamptz
 =       | b       | f          | timestamp     | date
 =       | b       | f          | timestamptz   | date
 =       | b       | f          | timestamp     | timestamptz
 =       | b       | f          | timestamptz   | timestamp
 =       | b       | f          | tsvector      | tsvector
 =       | b       | f          | tsquery       | tsquery
 =       | b       | f          | record        | record
 =       | b       | f          | hll           | hll
 =       | b       | f          | hll_hashval   | hll_hashval
 =       | b       | f          | roaringbitmap | roaringbitmap
(24 rows)

主要是timestamp, timestamptz, date间互相join是无法走hash。其它数据类型不常见。

开发建议：join两端的数据类型尽量一致或互相兼容。

5. 为什么 oracle 兼容模式没有问题，td兼容模式有问题？

current_date 在TD兼容模式下为date类型；

current_date 在Oracle兼容模式下为timestamp类型；

【声明】本内容来自华为云开发者社区博主，不代表华为云及华为云开发者社区的观点和立场。转载时必须标注文章的来源（华为云社区）、文章链接、文章作者等基本信息，否则作者和本社区有权追究责任。如果您发现本社区中有涉嫌抄袭的内容，欢迎发送邮件进行举报，并提供相关证据，一经查实，本社区将立刻删除涉嫌侵权内容，举报邮箱： cloudbbs@huaweicloud.com

点赞
收藏
关注作者

0/1000

抱歉，系统识别当前为高风险访问，暂不支持该操作

全部回复

上滑加载中

设置昵称

在此一键设置昵称，即可参与社区互动！

*长度不超过10个汉字或20个英文字符，设置后3个月内不可修改。

确认取消

加入云驻计划，成为创作者

华为云周边好礼
免费体验产品
特殊身份标识
线下官方门票
内部专家零距离
与10000+优质创作者共同成长

立即加入

【SQL优化】为什么有时候无法走执行性能更优的hashjoin

1. hash join通常优于nestloop join

全部回复

设置昵称

关于作者

目录

加入云驻计划，成为创作者

【SQL优化】为什么有时候无法走执行性能更优的hashjoin

1. hash join通常优于nestloop join

全部回复

设置昵称

关于作者

目录

热门推荐查看更多

相关文章

加入云驻计划，成为创作者

相关产品