Gaussdb(DWS)老事务排查与查杀
执行业务报错“Can't fit xid into page”
排查集群中是否存在新老事务号差值超过42亿的情况
步骤1:创建事务号查询函数
CREATE OR REPLACE FUNCTION public.pgxc_all_running_xacts()
RETURNS SETOF pg_running_xacts
LANGUAGE plpgsql
NOT FENCED NOT SHIPPABLE
AS $function$
DECLARE
row_data pg_running_xacts%rowtype;
row_name record;
query_str text;
query_str_nodes text;
BEGIN
--Get all the coordinator node names
query_str_nodes := 'SELECT node_name FROM pgxc_node';
FOR row_name IN EXECUTE(query_str_nodes) LOOP
query_str := 'EXECUTE DIRECT ON (' || row_name.node_name|| ') ''SELECT * FROM pg_running_xacts''';
FOR row_data IN EXECUTE(query_str) LOOP
return next row_data;
END LOOP;
END LOOP;
return;
END; $function$;
步骤2:排查新老事务号差值42亿以上的老事务
select txid_current() as current,xmin,current-xmin::text::bigint as txid_diff,node,pid from pgxc_all_running_xacts() where xmin::text::bigint !=0 and txid_diff>=4200000000 order by xmin::text::bigint;
步骤3:拼接老事务(事务号与当前最新事务号差值大于1亿,注意:需要将$xmin替换成步骤2中第二列xmin对应值)检查查杀SQL
select 'execute direct on('||node||') ''select pg_terminate_backend('||pid||')'';' killSql,'execute direct on('||node||') ''select query_id,query from pg_stat_activity where pid='||pid||''';' checkSql, txid_current() as current,xmin,node,pid from pgxc_all_running_xacts() where xmin::text::bigint !=0 and current-xmin::text::bigint>=4200000000 and gxid::text::bigint=$xmin order by xmin::text::bigint;
步骤4:
根据步骤2中记录,先执行第二列checkSql对应语句,明确具体业务,业务侧决策是否可以查杀,如果可以查杀,则执行第一列killSql对应语句
步骤5:
此时再执行步骤2语句,明确老事务是否查杀成功,若未成功,则需要kill对应dn实例
- 点赞
- 收藏
- 关注作者
评论(0)