[Postgres] Semijoin Estimation Optimization

Posted by 厚积薄发 on 2019/10/27 11:56:34

[Abstract] Improve planner's cost estimation in the presence of semijoins.

Consider a statement like the following:

SELECT * FROM x WHERE x1 IN (SELECT y1 FROM y)

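When the planner uses a parameterized index scan on an index of x, the loop count should be the number of distinct y1 values, not the number of rows in y. The corresponding commit is: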

Improve planner's cost estimation in the presence of semijoins.

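In the logical rewriting phase this statement is converted into a SEMI JOIN. The function create_index_paths then costs an index scan on the join column (a parameterized index path). One important input to that cost is the size of the outer relation, which is obtained much as it is for an inner join. The cost estimate can therefore take into account that the outer side is grouped into distinct values, which makes the optimizer's estimate more accurate.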
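To make the distinction between "rows in y" and "distinct y1 values" concrete, here is a minimal sketch in plain C. The helper count_distinct is hypothetical and written only for illustration; the planner does not scan the data like this, it estimates the figure from statistics.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Comparison callback for qsort over int values. */
static int
cmp_int(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;

    return (x > y) - (x < y);
}

/*
 * Count the distinct values in vals[0..n-1].  Sorts the array in place.
 * Hypothetical helper for illustration; not a PostgreSQL function.
 */
static size_t
count_distinct(int *vals, size_t n)
{
    size_t ndistinct;
    size_t i;

    assert(vals != NULL || n == 0);
    if (n == 0)
        return 0;

    qsort(vals, n, sizeof(int), cmp_int);

    ndistinct = 1;
    for (i = 1; i < n; i++)
    {
        /* After sorting, a new value starts wherever neighbours differ. */
        if (vals[i] != vals[i - 1])
            ndistinct++;
    }
    return ndistinct;
}
```

If y1 held the six values {3, 1, 3, 2, 1, 3}, the index on x would only need to be probed three times, once per distinct value, rather than six times.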

This is done in the function adjust_rowcount_for_semijoins which, as its name suggests, adjusts the row count estimate for semijoins. The implementation is as follows:

/*
 * Check to see if outer_relid is on the inside of any semijoin that cur_relid
 * is on the outside of.  If so, replace rowcount with the estimated number of
 * unique rows from the semijoin RHS (assuming that's smaller, which it might
 * not be).  The estimate is crude but it's the best we can do at this stage
 * of the proceedings.
 */
static double
adjust_rowcount_for_semijoins(PlannerInfo *root,
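							  Index cur_relid,   // relid of the left-hand (LHS) table
							  Index outer_relid, // relid of the right-hand (RHS) table
							  double rowcount)   // estimated number of qualifying rows from the RHS table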
{
	ListCell   *lc;

	foreach(lc, root->join_info_list)
	{
		// Fetch the join info
		SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(lc);

		if (sjinfo->jointype == JOIN_SEMI &&  // semijoin
			bms_is_member(cur_relid, sjinfo->syn_lefthand) &&  // matches the LHS
			bms_is_member(outer_relid, sjinfo->syn_righthand)) // matches the RHS
		{
			/* Estimate number of unique-ified rows */
			double		nraw;
			double		nunique;
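			// Estimate the row count of the RHS rel set by multiplying the
			// estimated row counts of all its base rels; the RHS of a
			// semijoin is usually a single table
			nraw = approximate_joinrel_size(root, sjinfo->syn_righthand);
			// Estimate the number of unique rows after grouping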
			nunique = estimate_num_groups(root,
										  sjinfo->semi_rhs_exprs,
										  nraw,
										  NULL);
			// Clamp the estimated loop count
			if (rowcount > nunique)
				rowcount = nunique;
		}
	}
	return rowcount;
}
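The clamping logic can be tried outside the planner with a small stand-alone model. Everything prefixed with Demo or demo_ below is a hypothetical stand-in invented for this sketch: relid sets are plain 32-bit masks instead of Bitmapset, and the unique-row estimate is stored as a precomputed field instead of being computed by approximate_joinrel_size and estimate_num_groups.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { DEMO_JOIN_INNER, DEMO_JOIN_SEMI } DemoJoinType;

/* Simplified stand-in for SpecialJoinInfo (illustration only). */
typedef struct DemoJoinInfo
{
    DemoJoinType jointype;
    uint32_t     lefthand;     /* bit i set => relid i is on the LHS */
    uint32_t     righthand;    /* bit i set => relid i is on the RHS */
    double       rhs_nunique;  /* assumed result of the unique-row estimate */
} DemoJoinInfo;

/* Stand-in for bms_is_member() on a 32-bit mask. */
static int
demo_is_member(unsigned relid, uint32_t set)
{
    return relid < 32 && (set & (UINT32_C(1) << relid)) != 0;
}

/*
 * If outer_relid is inside a semijoin whose outside contains cur_relid,
 * clamp rowcount to the estimated number of unique RHS rows.
 */
static double
demo_adjust_rowcount(const DemoJoinInfo *joins, size_t njoins,
                     unsigned cur_relid, unsigned outer_relid,
                     double rowcount)
{
    size_t i;

    assert(joins != NULL || njoins == 0);
    for (i = 0; i < njoins; i++)
    {
        const DemoJoinInfo *sj = &joins[i];

        if (sj->jointype == DEMO_JOIN_SEMI &&
            demo_is_member(cur_relid, sj->lefthand) &&
            demo_is_member(outer_relid, sj->righthand))
        {
            /* Only ever lower the estimate, never raise it. */
            if (rowcount > sj->rhs_nunique)
                rowcount = sj->rhs_nunique;
        }
    }
    return rowcount;
}
```

With one semijoin whose LHS is relid 1 and RHS is relid 2, and an assumed 10 unique RHS rows, an incoming estimate of 110 is lowered to 10, while an estimate of 5 is left alone. Swapping the two relids leaves the estimate unchanged because the membership tests no longer match.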

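The code above uses the SpecialJoinInfo struct, which mainly records the join relationships between tables. For semijoins it also records the operands used for grouping. The struct is filled in by make_outerjoininfo, which calls compute_semijoin_info. It is defined as follows: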

struct SpecialJoinInfo
{
	NodeTag		type;
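	Relids		min_lefthand;	/* constrains join order: minimal LHS set */
	Relids		min_righthand;	/* constrains join order: minimal RHS set */
	Relids		syn_lefthand;	/* LHS set as it appears in Query->jointree */
	Relids		syn_righthand;	/* RHS set as it appears in Query->jointree */
	JoinType	jointype;		/* join type: left join, full join, etc. */
	bool		lhs_strict;		/* used to constrain join order */
	bool		delay_upper_joins;	/* used to constrain join order */
	/* The fields below are set only for semijoins: */
	bool		semi_can_btree; /* whether the operators are all btree operators */
	bool		semi_can_hash;	/* whether the operators are all hashable */
	List	   *semi_operators; /* operators in the join condition */
	List	   *semi_rhs_exprs; /* right-hand operands of the join condition */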
};

The resulting plan is shown below (the semijoin has been eliminated from the plan and converted into an inner join):

postgres=# explain select * from test1 where a in (select * from test2);
                                  QUERY PLAN
------------------------------------------------------------------------------
 Nested Loop  (cost=2.66..77.60 rows=10 width=4)
   ->  HashAggregate  (cost=2.38..2.48 rows=10 width=4)
         Group Key: test2.a
         ->  Seq Scan on test2  (cost=0.00..2.10 rows=110 width=4)
   ->  Index Only Scan using a_idx on test1  (cost=0.29..7.50 rows=1 width=4)
         Index Cond: (a = test2.a)
(6 rows)
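The numbers in this plan can be roughly cross-checked by hand. The sketch below is a deliberately simplified model, not PostgreSQL's final_cost_nestloop: it charges the outer side once and one inner scan per outer row, and ignores per-tuple CPU charges and caching effects. The function name demo_nestloop_total_cost is hypothetical.

```c
#include <assert.h>

/*
 * Rough nested-loop total cost: pay for the outer side once, then for one
 * inner scan per outer row.  Illustration only; the real planner formula
 * has additional terms.
 */
static double
demo_nestloop_total_cost(double outer_total_cost,
                         double outer_rows,
                         double inner_cost_per_scan)
{
    assert(outer_rows >= 0.0);
    return outer_total_cost + outer_rows * inner_cost_per_scan;
}
```

Plugging in the plan's figures (outer HashAggregate total cost 2.48, 10 outer rows, 7.50 per index scan) gives 2.48 + 10 × 7.50 = 77.48, close to the reported 77.60; the remainder presumably comes from the charges this model leaves out. Under the same model, and assuming the per-scan cost stayed at 7.50, looping once per row of test2 (110 rows, sequential scan cost 2.10) would come to 2.10 + 110 × 7.50 = 827.10, which shows how much the loop count matters.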

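[Disclaimer] This content comes from a blogger of the Huawei Cloud developer community and does not represent the views or positions of Huawei Cloud or the Huawei Cloud developer community. Reposts must cite basic information such as the source (Huawei Cloud community), the article link, and the author; otherwise the author and the community reserve the right to pursue liability. If you find content in this community that is suspected of plagiarism, please report it by email with supporting evidence; once verified, the community will promptly remove the infringing content. Report email: cloudbbs@huaweicloud.com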
