GaussDB(DWS)之锁等待场景介绍
一般数据库事务管理中所指的锁是指表级锁,GaussDB(DWS)中支持的锁模式有8种,按排他级别分别为1-8。
每种锁模式都有与之相冲突的锁模式,由锁冲突表定义相关的信息,锁冲突表如下:
编号 | 名称 | 用途 | 冲突关系 |
1 | AccessShareLock | SELECT | 8 |
2 | RowShareLock | SELECT FOR UPDATE/FOR SHARE | 7 | 8 |
3 | RowExclusiveLock | INSERT/UPDATE/DELETE | 5 | 6 | 7 | 8 |
4 | ShareUpdateExclusiveLock | VACUUM | 4 | 5 | 6 | 7 | 8 |
5 | ShareLock | CREATE INDEX | 3 | 4 | 6 | 7 | 8 |
6 | ShareRowExclusiveLock | ROW SELECT...FOR UPDATE | 3 | 4 | 5 | 6 | 7 | 8 |
7 | ExclusiveLock | BLOCK ROW SHARE/SELECT...FOR UPDATE | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
8 | AccessExclusiveLock | DROP CLASS/VACUUM FULL | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
当两个事务的锁产生冲突时,未拿到锁的线程会等锁,等锁时间超过系统设置参数lockwait_timeout(默认20分钟)并且达到cn retry的次数(max_query_retry_times,默认6),就会出现报错。报错信息会将持锁语句打印出来,例如:
postgres=# select * from t1;
ERROR: Lock wait timeout: thread 140722372077312 on node coordinator1 waiting for AccessShareLock on relation 16430 of database 14699 after 2000.158 ms
LINE 1: select * from t1;
^
DETAIL: blocked by hold lock thread 140722329605888, statement <truncate t1;>, hold lockmode AccessExclusiveLock.
上述查询报错的原因是因为等锁超时,持锁语句为truncate t1。
接下来简单介绍下快速找到持锁语句的几种方法,首先构造锁等待场景:
第一步:在第一个session中执行:
begin;
truncate table t1;
第二步:在第二个session中执行select语句,执行卡主:
select * from t1;
接下来介绍两种常用的快速找到持锁语句的方法:
方法一:通过pg_locks查找
1. 打开一个新的session,找到该语句对应的pid,可参考GaussDB(DWS)查看后台活跃SQL和执行状态,waiting为t表示语句正在等待执行。
postgres=# select * from pgxc_stat_activity where query like 'select * from t1%';
coorname | datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | query_start | state_change | waiting | enqueue | state | resource_pool | query_id | query | connection_info
--------------+-------+----------+-----------------+----------+---------+------------------+-------------+-----------------+-------------+-------------------------------+------------------------------+-------------------------------+-------------------------------+---------+---------+--------+---------------+----------+-------------------+-------------------------------------------------------------------------------------------------------------------------------------------
coordinator1 | 14699 | postgres | 140722372077312 | 10 | lkp | gsql | | | -1 | 2020-12-30 21:26:41.044308+08 | 2020-12-30 21:26:44.72377+08 | 2020-12-30 21:26:44.723725+08 | 2020-12-30 21:26:44.723727+08 | t | | active | default_pool | 0 | select * from t1; | {"driver_name":"libpq","driver_version":"(GaussDB 8.0.0 build 7ced6a83) compiled at 2020-12-18 17:38:55 commit 7768 last mr 13325 debug"}
(1 row)
2. 根据该语句的pid查看该节点(coordinator1 )的等待视图:
postgres=# execute direct on(coordinator1) 'select * from pgxc_thread_wait_status where tid = 140722372077312';
node_name | db_name | thread_name | query_id | tid | lwtid | ptid | tlevel | smpid | wait_status | wait_event
--------------+----------+-------------+----------+-----------------+--------+------+--------+-------+--------------+------------
coordinator1 | postgres | gsql | 0 | 140722372077312 | 214228 | | 0 | 0 | acquire lock | relation
(1 row)
3. 查询结果显示该语句在cn1上等锁,且wait_event是relation,因此根据pg_locks查看语句再哪张表上持锁,然后根据该relation确认表上的锁等待状态:
postgres=# select * from pg_locks where pid = 140722372077312 and locktype = 'relation';
locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath
------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-----------------+-----------------+---------+----------
relation | 14699 | 16430 | | | | | | | | 12/214 | 140722372077312 | AccessShareLock | f | f
(1 row)
postgres=#
postgres=# select * from pg_locks where relation = 16430;
locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath
----------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-----------------+---------------------+---------+----------
relation | 14699 | 16430 | | | | | | | | 12/214 | 140722372077312 | AccessShareLock | f | f
relation | 14699 | 16430 | | | | | | | | 11/1109 | 140722329605888 | AccessExclusiveLock | t | f
(2 rows)
4. pg_locks中granted为t表示持锁语句,pid为持锁语句在该节点上的query对应的pid,可在pgxc_stat_activity中查看持锁语句:
postgres=# select * from pgxc_stat_activity where pid = 140722329605888;
coorname | datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | query_start | state_change | waiting | enqueue | state | resource_pool | query_id | query | connection_info
--------------+-------+----------+-----------------+----------+---------+------------------+-------------+-----------------+-------------+-------------------------------+-------------------------------+------------------------------+-------------------------------+---------+---------+---------------------+---------------+----------+--------------------+-------------------------------------------------------------------------------------------------------------------------------------------
coordinator1 | 14699 | postgres | 140722329605888 | 10 | lkp | gsql | | | -1 | 2020-12-30 21:39:34.030827+08 | 2020-12-30 21:39:37.251805+08 | 2020-12-30 21:39:52.15638+08 | 2020-12-30 21:39:52.175311+08 | f | | idle in transaction | default_pool | 0 | truncate table t1; | {"driver_name":"libpq","driver_version":"(GaussDB 8.0.0 build 7ced6a83) compiled at 2020-12-18 17:38:55 commit 7768 last mr 13325 debug"}
(1 row)
可以看到,持锁语句为truncate语句,该语句的状态为idle in transaction状态,表示事务等待提交,手动提交该事务后select成功。
方法二:缩短锁等待报错时间
前面介绍过,锁等待时间与lockwait_timeout和max_query_retry_times有关,可以调小这两个参数,根据报错提示快速找到持锁语句:
set lockwait_timeout = 5;
set max_query_retry_times = 0;
select * from t1;
ERROR: Lock wait timeout: thread 140722372077312 on node coordinator1 waiting for AccessShareLock on relation 16430 of database 14699 after 2009.550 ms
LINE 1: select * from t1;
^
DETAIL: blocked by hold lock thread 140722329605888, statement <truncate table t1;>, hold lockmode AccessExclusiveLock.
解决方法:
1. 业务上避免锁冲突的并发场景;
2. 通过pg_terminate_backend函数去对应的节点上杀掉持锁语句
- 点赞
- 收藏
- 关注作者
评论(0)