GuassDB(DWS)的连接类型介绍
连接类型介绍
想要通过SQL完成各种复杂的查询,则多表之间的连接是必不可少的。连接分为:内连接和外连接两大类,每大类中还可进行细分。
为了能更好的说明各种连接之间的区别,下面通过具体示例进行各连接的展示。
首先做如下表创建和设置:
CREATE TABLE student(
id INTEGER,
name varchar(50)
);
CREATE TABLE math_score(
id INTEGER,
score INTEGER
);
INSERT INTO student VALUES(1, 'Tom');
INSERT INTO student VALUES(2, 'Lily');
INSERT INTO student VALUES(3, 'Tina');
INSERT INTO student VALUES(4, 'Perry');
INSERT INTO math_score VALUES(1, 80);
INSERT INTO math_score VALUES(2, 75);
INSERT INTO math_score VALUES(4, 95);
INSERT INTO math_score VALUES(6, NULL);
Set enable_fast_query_shipping = off;
Set explain_perf_mode = pretty;
1. 内连接
- 标准内连接(INNER JOIN)
语法:left_table [INNER] JOIN right_table [ ON join_condition | USING ( join_column )]
说明:表示left_table和rignt_table中满足join_condition的行拼接在一起作为结果输出,不满足条件的元组不会输出。
示例1:查询学生的数学成绩
postgres=# Select s.id, s.name, ms.score from student s join math_score ms on s.id = ms.id;
id | name | score
----+-------+-------
1 | Tom | 80
2 | Lily | 75
4 | Perry | 95
(3 rows)
postgres=# explain Select s.id, s.name, ms.score from student s join math_score ms on s.id = ms.id;
QUERY PLAN
----------------------------------------------------------------------------
id | operation | E-rows | E-width | E-costs
----+----------------------------------------+--------+---------+---------
1 | -> Streaming (type: GATHER) | 30 | 126 | 36.59
2 | -> Hash Join (3,4) | 30 | 126 | 28.59
3 | -> Seq Scan on student s | 30 | 122 | 14.14
4 | -> Hash | 29 | 8 | 14.14
5 | -> Seq Scan on math_score ms | 30 | 8 | 14.14
Predicate Information (identified by plan id)
---------------------------------------------
2 --Hash Join (3,4)
Hash Cond: (s.id = ms.id)
(12 rows)
- 交叉连接(CROSS JOIN)
语法:left_table CROSS JOIN right_table
说明:表示left_table中所有行和right_table中的所有行分别进行连接,最终结果行数等于两边行数的乘积。又称笛卡尔积。
示例2:学生表和数学成绩表的交叉连接
postgres=# select s.id, s.name, ms.score from student s cross join math_score ms;
id | name | score
----+-------+-------
1 | Tom | 80
1 | Tom | 75
1 | Tom | 95
2 | Lily | 80
2 | Lily | 75
2 | Lily | 95
4 | Perry | 80
4 | Perry | 75
4 | Perry | 95
3 | Tina | 80
3 | Tina | 75
3 | Tina | 95
3 | Tina |
1 | Tom |
2 | Lily |
4 | Perry |
(16 rows)
postgres=# explain select s.id, s.name, ms.score from student s cross join math_score ms;
QUERY PLAN
----------------------------------------------------------------------------
id | operation | E-rows | E-width | E-costs
----+----------------------------------------+--------+---------+---------
1 | -> Streaming (type: GATHER) | 900 | 126 | 74.85
2 | -> Nested Loop (3,5) | 900 | 126 | 37.35
3 | -> Streaming(type: BROADCAST) | 90 | 122 | 17.93
4 | -> Seq Scan on student s | 30 | 122 | 14.14
5 | -> Materialize | 30 | 4 | 14.21
6 | -> Seq Scan on math_score ms | 30 | 4 | 14.14
(8 rows)
- 自然连接(NATURAL JOIN)
语法:left_table NATURAL JOIN right_table
说明:表示left_table和right_table中列名相同的列进行等值连接,且自动将同名列只保留一份。
示例3:学生表和数学成绩表的自然连接,两表的同名列为id列,将按照id列进行等值连接
postgres=# select * from student s natural join math_score ms;
id | name | score
----+-------+-------
1 | Tom | 80
2 | Lily | 75
4 | Perry | 95
(3 rows)
postgres=# explain select * from student s natural join math_score ms;
QUERY PLAN
----------------------------------------------------------------------------
id | operation | E-rows | E-width | E-costs
----+----------------------------------------+--------+---------+---------
1 | -> Streaming (type: GATHER) | 30 | 126 | 36.59
2 | -> Hash Join (3,4) | 30 | 126 | 28.59
3 | -> Seq Scan on student s | 30 | 122 | 14.14
4 | -> Hash | 29 | 8 | 14.14
5 | -> Seq Scan on math_score ms | 30 | 8 | 14.14
Predicate Information (identified by plan id)
---------------------------------------------
2 --Hash Join (3,4)
Hash Cond: (s.id = ms.id)
(12 rows)
2. 外连接
- 左外连接
语法:left_table LEFT [OUTER] JOIN right_table [ ON join_condition | USING ( join_column )]
说明:左外连接的结果集包括left_table的所有行,而不仅仅是连接列所匹配的行。如果left_table的某行在right_table中没有匹配行,则在相关联的结果集行中输出right_table的列均为空值。
示例4:学生表和数学成绩表进行左外连接,学生表中id为3的行在结果集中对应的右表数据用NULL填充
postgres=# select s.id, s.name, ms.score from student s left join math_score ms on (s.id = ms.id);
id | name | score
----+-------+-------
3 | Tina |
1 | Tom | 80
2 | Lily | 75
4 | Perry | 95
(4 rows)
postgres=# explain select s.id, s.name, ms.score from student s left join math_score ms on (s.id = ms.id);
QUERY PLAN
----------------------------------------------------------------------------
id | operation | E-rows | E-width | E-costs
----+----------------------------------------+--------+---------+---------
1 | -> Streaming (type: GATHER) | 30 | 126 | 36.59
2 | -> Hash Left Join (3, 4) | 30 | 126 | 28.59
3 | -> Seq Scan on student s | 30 | 122 | 14.14
4 | -> Hash | 29 | 8 | 14.14
5 | -> Seq Scan on math_score ms | 30 | 8 | 14.14
Predicate Information (identified by plan id)
---------------------------------------------
2 --Hash Left Join (3, 4)
Hash Cond: (s.id = ms.id)
(12 rows)
- 右外连接
语法:left_table RIGHT [OUTER] JOIN right_table [ ON join_condition | USING ( join_column )]
说明:与左外连接相反,右外连接的结果集包括right_table的所有行,而不仅仅是连接列所匹配的行。如果right_table的某行在left_table中没有匹配行,则在相关联的结果集行中left_table的列均为空值。
示例5:学生表和数学成绩表进行右外连接,数学成绩表中id为6的行在结果集中对应的左表数据用NULL填充
postgres=# select ms.id, s.name, ms.score from student s right join math_score ms on (s.id = ms.id);
id | name | score
----+-------+-------
6 | |
1 | Tom | 80
2 | Lily | 75
4 | Perry | 95
(4 rows)
postgres=# explain select ms.id, s.name, ms.score from student s right join math_score ms on (s.id = ms.id);
QUERY PLAN
-------------------------------------------------------------------------
id | operation | E-rows | E-width | E-costs
----+-------------------------------------+--------+---------+---------
1 | -> Streaming (type: GATHER) | 30 | 126 | 36.59
2 | -> Hash Left Join (3, 4) | 30 | 126 | 28.59
3 | -> Seq Scan on math_score ms | 30 | 8 | 14.14
4 | -> Hash | 29 | 122 | 14.14
5 | -> Seq Scan on student s | 30 | 122 | 14.14
Predicate Information (identified by plan id)
---------------------------------------------
2 --Hash Left Join (3, 4)
Hash Cond: (ms.id = s.id)
(12 rows)
- 全外连接
语法:left_table FULL [OUTER] JOIN right_table [ ON join_condition | USING ( join_column )]
说明:全外连接是左外连接和右外连接的综合。全外连接的结果集包括left_table和right_table的所有行,而不仅仅是连接列所匹配的行。如果left_table的某行在right_table中没有匹配行,则在相关联的结果集行中right_table的列均为空值。如果right_table的某行在left_table中没有匹配行,则在相关联的结果集行中left_table的列均为空值。
示例6:学生表和数学成绩表进行全外外连接,学生表中id为3的行在结果集中对应的右表数据用NULL填充,数学成绩表中id为6的行在结果集中对应的左表数据用NULL填充
postgres=# select s.id, s.name, ms.id, ms.score from student s full join math_score ms on (s.id = ms.id);
id | name | id | score
----+-------+----+-------
3 | Tina | |
1 | Tom | 1 | 80
2 | Lily | 2 | 75
4 | Perry | 4 | 95
| | 6 |
(5 rows)
postgres=# explain select s.id, s.name, ms.id, ms.score from student s full join math_score ms on (s.id = ms.id);
QUERY PLAN
----------------------------------------------------------------------------
id | operation | E-rows | E-width | E-costs
----+----------------------------------------+--------+---------+---------
1 | -> Streaming (type: GATHER) | 30 | 130 | 36.59
2 | -> Hash Full Join (3, 4) | 30 | 130 | 28.59
3 | -> Seq Scan on student s | 30 | 122 | 14.14
4 | -> Hash | 29 | 8 | 14.14
5 | -> Seq Scan on math_score ms | 30 | 8 | 14.14
Predicate Information (identified by plan id)
---------------------------------------------
2 --Hash Full Join (3, 4)
Hash Cond: (s.id = ms.id)
(12 rows)
细心的朋友从上面示例的查询计划中不难发现,外连接在查询计划中的join算子中,相应的左外连接和全外连接会显示Left或Full字眼,而对于右外连接,在join算子中却显示的是Left。这是因为,右外连接其实就是将左右表进行交互后的左外连接,所以数据库内部实现为了减少处理逻辑,会将右外连接转为左外连接。
多表查询中on条件和where条件的区别
从上面各种连接语法中可见,除自然连接和交叉连接外,其他都需要有on条件(using在查询解析过程中会被转为on条件)来限制两表连接的结果。通常在查询的语句中也都会有where条件限制查询结果。这里说的on连接条件和where过滤条件是指不含可以下推到表上的过滤条件。那么on和where的区别是什么呢?on条件是两表连接的约束条件,where是对两表连接后产生的结果集再次进行过滤。简单说就是on条件优先于where条件,在两表进行连接时被应用;生成两表连接结果集后,再应用where条件。
想了解GuassDB(DWS)更多信息,欢迎微信搜索“GaussDB DWS”关注微信公众号,和您分享最新最全的PB级数仓黑科技,后台还可获取众多学习资料哦~
- 点赞
- 收藏
- 关注作者
评论(0)