GaussDB(DWS)之锁等待场景介绍

举报
Arrow0lf 发表于 2020/12/30 22:11:14 2020/12/30
【摘要】 本帖简单介绍GaussDB(DWS)的锁等待场景

一般数据库事务管理中所指的锁是指表级锁,GaussDB(DWS)中支持的锁模式有8种,按排他级别分别为1-8。

每种锁模式都有与之相冲突的锁模式,由锁冲突表定义相关的信息,锁冲突表如下:


编号 名称 用途 冲突关系
1 AccessShareLock SELECT  8
2 RowShareLock SELECT FOR UPDATE/FOR SHARE  7 | 8
3 RowExclusiveLock INSERT/UPDATE/DELETE  5 | 6 | 7 | 8
4 ShareUpdateExclusiveLock VACUUM  4 | 5 | 6 | 7 | 8
5 ShareLock CREATE INDEX  3 | 4 | 6 | 7 | 8
6 ShareRowExclusiveLock ROW SELECT...FOR UPDATE  3 | 4 | 5 |  6 | 7 | 8
7 ExclusiveLock BLOCK ROW SHARE/SELECT...FOR UPDATE  2 | 3 | 4 | 5 |  6 | 7 | 8
8 AccessExclusiveLock DROP CLASS/VACUUM FULL  1 | 2 | 3 | 4 | 5 |  6 | 7 | 8


当两个事务的锁产生冲突时,未拿到锁的线程会等锁,等锁时间超过系统设置参数lockwait_timeout(默认20分钟)并且达到cn retry的次数(max_query_retry_times,默认6),就会出现报错。报错信息会将持锁语句打印出来,例如:

postgres=# select * from t1;
ERROR:  Lock wait timeout: thread 140722372077312 on node coordinator1 waiting for AccessShareLock on relation 16430 of database 14699 after 2000.158 ms
LINE 1: select * from t1;
                      ^
DETAIL:  blocked by hold lock thread 140722329605888, statement <truncate t1;>, hold lockmode AccessExclusiveLock.

上述查询报错的原因是因为等锁超时,持锁语句为truncate t1。

接下来简单介绍下快速找到持锁语句的几种方法,首先构造锁等待场景:

第一步:在第一个session中执行:

begin;
truncate table t1;

第二步:在第二个session中执行select语句,执行卡主:

select * from t1;

接下来介绍两种常用的快速找到持锁语句的方法:

方法一:通过pg_locks查找

1. 打开一个新的session,找到该语句对应的pid,可参考GaussDB(DWS)查看后台活跃SQL和执行状态,waiting为t表示语句正在等待执行。

postgres=# select * from pgxc_stat_activity where query like 'select * from t1%';
   coorname   | datid | datname  |       pid       | usesysid | usename | application_name | client_addr | client_hostname | client_port |         backend_start         |          xact_start          |          query_start          |         state_change          | waiting | enqueue | state  | resource_pool | query_id |       query       |                                                              connection_info                                                              
--------------+-------+----------+-----------------+----------+---------+------------------+-------------+-----------------+-------------+-------------------------------+------------------------------+-------------------------------+-------------------------------+---------+---------+--------+---------------+----------+-------------------+-------------------------------------------------------------------------------------------------------------------------------------------
 coordinator1 | 14699 | postgres | 140722372077312 |       10 | lkp     | gsql             |             |                 |          -1 | 2020-12-30 21:26:41.044308+08 | 2020-12-30 21:26:44.72377+08 | 2020-12-30 21:26:44.723725+08 | 2020-12-30 21:26:44.723727+08 | t       |         | active | default_pool  |        0 | select * from t1; | {"driver_name":"libpq","driver_version":"(GaussDB 8.0.0 build 7ced6a83) compiled at 2020-12-18 17:38:55 commit 7768 last mr 13325 debug"}
(1 row)

2. 根据该语句的pid查看该节点(coordinator1 )的等待视图:

postgres=# execute direct on(coordinator1) 'select * from pgxc_thread_wait_status where tid = 140722372077312';
  node_name   | db_name  | thread_name | query_id |       tid       | lwtid  | ptid | tlevel | smpid | wait_status  | wait_event 
--------------+----------+-------------+----------+-----------------+--------+------+--------+-------+--------------+------------
 coordinator1 | postgres | gsql        |        0 | 140722372077312 | 214228 |      |      0 |     0 | acquire lock | relation
(1 row)

3. 查询结果显示该语句在cn1上等锁,且wait_event是relation,因此根据pg_locks查看语句再哪张表上持锁,然后根据该relation确认表上的锁等待状态:

postgres=# select * from pg_locks where pid = 140722372077312 and locktype = 'relation';
  locktype  | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction |       pid       |      mode       | granted | fastpath 
------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-----------------+-----------------+---------+----------
relation |    14699 |    16430 |      |       |            |               |         |       |          | 12/214             | 140722372077312 | AccessShareLock     | f       | f
(1 row)

postgres=# 
postgres=# select * from pg_locks where relation = 16430;
 locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction |       pid       |        mode         | granted | fastpath 
----------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-----------------+---------------------+---------+----------
 relation |    14699 |    16430 |      |       |            |               |         |       |          | 12/214             | 140722372077312 | AccessShareLock     | f       | f
 relation |    14699 |    16430 |      |       |            |               |         |       |          | 11/1109            | 140722329605888 | AccessExclusiveLock | t       | f
(2 rows)

4. pg_locks中granted为t表示持锁语句,pid为持锁语句在该节点上的query对应的pid,可在pgxc_stat_activity中查看持锁语句:

postgres=# select * from pgxc_stat_activity where pid = 140722329605888;
   coorname   | datid | datname  |       pid       | usesysid | usename | application_name | client_addr | client_hostname | client_port |         backend_start         |          xact_start           |         query_start          |         state_change          | waiting | enqueue |        state        | resource_pool | query_id |       query        |                                                              connection_info                                                              
--------------+-------+----------+-----------------+----------+---------+------------------+-------------+-----------------+-------------+-------------------------------+-------------------------------+------------------------------+-------------------------------+---------+---------+---------------------+---------------+----------+--------------------+-------------------------------------------------------------------------------------------------------------------------------------------
 coordinator1 | 14699 | postgres | 140722329605888 |       10 | lkp     | gsql             |             |                 |          -1 | 2020-12-30 21:39:34.030827+08 | 2020-12-30 21:39:37.251805+08 | 2020-12-30 21:39:52.15638+08 | 2020-12-30 21:39:52.175311+08 | f       |         | idle in transaction | default_pool  |        0 | truncate table t1; | {"driver_name":"libpq","driver_version":"(GaussDB 8.0.0 build 7ced6a83) compiled at 2020-12-18 17:38:55 commit 7768 last mr 13325 debug"}
(1 row)

可以看到,持锁语句为truncate语句,该语句的状态为idle in transaction状态,表示事务等待提交,手动提交该事务后select成功。

方法二:缩短锁等待报错时间

前面介绍过,锁等待时间与lockwait_timeout和max_query_retry_times有关,可以调小这两个参数,根据报错提示快速找到持锁语句:

set lockwait_timeout = 5;
set max_query_retry_times = 0;
select * from t1;
ERROR:  Lock wait timeout: thread 140722372077312 on node coordinator1 waiting for AccessShareLock on relation 16430 of database 14699 after 2009.550 ms
LINE 1: select * from t1;
                      ^
DETAIL:  blocked by hold lock thread 140722329605888, statement <truncate table t1;>, hold lockmode AccessExclusiveLock.

解决方法:

1. 业务上避免锁冲突的并发场景;

2. 通过pg_terminate_backend函数去对应的节点上杀掉持锁语句

【版权声明】本文为华为云社区用户原创内容,转载时必须标注文章的来源(华为云社区)、文章链接、文章作者等基本信息, 否则作者和本社区有权追究责任。如果您发现本社区中有涉嫌抄袭的内容,欢迎发送邮件进行举报,并提供相关证据,一经查实,本社区将立刻删除涉嫌侵权内容,举报邮箱: cloudbbs@huaweicloud.com
  • 点赞
  • 收藏
  • 关注作者

评论(0

0/1000
抱歉,系统识别当前为高风险访问,暂不支持该操作

全部回复

上滑加载中

设置昵称

在此一键设置昵称,即可参与社区互动!

*长度不超过10个汉字或20个英文字符,设置后3个月内不可修改。

*长度不超过10个汉字或20个英文字符,设置后3个月内不可修改。