Performance Analysis: Using GDB to Analyze a Core Dump from a C++ Application

zuozewei, published 2021/09/13 18:15:26
[Abstract] This post is just a record for future reference: a core dump occurred in one of our projects.

Background

This post is just a record for future reference.
A core dump occurred in one of our projects.

Analysis

First, open the executable together with the core file in GDB:

[app@hostA bin]$ gdb PROGRAM core.31018

GDB then prints a series of messages:

GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-80.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...

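Roughly, this banner says that GDB is free software under GPLv3+: you may use and redistribute it, and it comes with no warranty.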

Reading symbols from /bin/PROGRAM...done.
[New LWP 31018]
[New LWP 31027]
[New LWP 31022]
[New LWP 31036]
[New LWP 31038]
[New LWP 31041]
[New LWP 31044]
[New LWP 31047]
[New LWP 31042]
[New LWP 31032]
[New LWP 31033]
[New LWP 31034]
[New LWP 31035]
[New LWP 31037]
[New LWP 31020]
[New LWP 31026]
[New LWP 31031]
[New LWP 31030]
[New LWP 31040]
[New LWP 31039]
[New LWP 31046]
[New LWP 31045]
[New LWP 31043]
[New LWP 31019]
[New LWP 31025]
[New LWP 31024]
[New LWP 31023]
[New LWP 31021]
[New LWP 31029]
[New LWP 31028]

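These are LWP numbers, i.e., what we usually call thread IDs: on Linux, a thread is an LWP. Some people object that an LWP is not a thread but a process; after all, it is called a light-weight process, not a thread. That is true of the name. But on Linux there is no other kind of thread: each thread is a task scheduled by the kernel, and the LWP number GDB shows is that thread's kernel thread ID (TID), the value returned by gettid() and listed under /proc/<pid>/task/. Here is what Wikipedia says: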

In computer operating systems, a light-weight process (LWP) is a means of achieving multitasking. In the traditional meaning of the term, as used in Unix System V and Solaris, a LWP runs in user space on top of a single kernel thread and shares its address space and system resources with other LWPs within the same process. Multiple user level threads, managed by a thread library, can be placed on top of one or many LWPs - allowing multitasking to be done at the user level, which can have some performance benefits.

If that still leaves you unsure what it all means, that's fine. Don't get hung up on terminology; for our purposes, an LWP is a thread.

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

This says GDB is using the system's libthread_db library to inspect the program's threads.

Core was generated by `PROGRAM -g 1 -i 3006 -u VM_16_46_centos -U /data/app/log/LOG -m 0 -A'.
Program terminated with signal 6, Aborted.

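This shows how the core was produced: the command line of the process that crashed, and the fact that it was terminated by signal 6 (SIGABRT, i.e., an abort). How many signals does the system have?
Roughly the following (Linux, see man 7 signal):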

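Where three values are given, the first applies to Alpha and SPARC, the middle one to x86, ARM and most other architectures, and the last to MIPS; "-" means the signal does not exist on that architecture.

Signal     Value     Action  Standard  Description
SIGHUP     1         A       POSIX.1   Hangup on the controlling terminal, or the controlling process died
SIGINT     2         A       POSIX.1   Interrupt from keyboard (e.g., Ctrl+C)
SIGQUIT    3         C       POSIX.1   Quit from keyboard
SIGILL     4         C       POSIX.1   Illegal instruction
SIGABRT    6         C       POSIX.1   Abort signal from abort(3)
SIGFPE     8         C       POSIX.1   Floating-point exception
SIGKILL    9         AEF     POSIX.1   Kill signal
SIGSEGV    11        C       POSIX.1   Invalid memory reference
SIGPIPE    13        A       POSIX.1   Broken pipe: write to a pipe with no readers
SIGALRM    14        A       POSIX.1   Timer signal from alarm(2)
SIGTERM    15        A       POSIX.1   Termination signal
SIGUSR1    30,10,16  A       POSIX.1   User-defined signal 1
SIGUSR2    31,12,17  A       POSIX.1   User-defined signal 2
SIGCHLD    20,17,18  B       POSIX.1   Child process stopped or terminated
SIGCONT    19,18,25          POSIX.1   Continue if stopped
SIGSTOP    17,19,23  DEF     POSIX.1   Stop process
SIGTSTP    18,20,24  D       POSIX.1   Stop key typed at the terminal (tty)
SIGTTIN    21,21,26  D       POSIX.1   Terminal input for a background process
SIGTTOU    22,22,27  D       POSIX.1   Terminal output for a background process
SIGBUS     10,7,10   C       SUSv2     Bus error (bad memory access)
SIGPOLL              A       SUSv2     Pollable event (System V); synonym for SIGIO
SIGPROF    27,27,29  A       SUSv2     Profiling timer expired
SIGSYS     12,-,12   C       SUSv2     Bad system call (SVr4)
SIGTRAP    5         C       SUSv2     Trace/breakpoint trap
SIGURG     16,23,21  B       SUSv2     Urgent condition on socket (4.2BSD)
SIGVTALRM  26,26,28  A       SUSv2     Virtual alarm clock (4.2BSD)
SIGXCPU    24,24,30  C       SUSv2     CPU time limit exceeded (4.2BSD)
SIGXFSZ    25,25,31  C       SUSv2     File size limit exceeded (4.2BSD)
SIGIOT     6         C                 IOT trap; synonym for SIGABRT
SIGEMT     7,-,7
SIGSTKFLT  -,16,-    A                 Stack fault on coprocessor (unused)
SIGIO      23,29,22  A                 I/O now possible (4.2BSD)
SIGCLD     -,-,18                      Synonym for SIGCHLD
SIGPWR     29,30,19  A                 Power failure (System V)
SIGINFO    29,-,-                      Synonym for SIGPWR
SIGLOST    -,-,-     A                 File lock lost
SIGWINCH   28,28,20  B                 Window resize signal (4.3BSD, Sun)
SIGUNUSED  -,31,-    A                 Unused signal (will be SIGSYS)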

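What do the action letters mean?

A  Default action is to terminate the process.
B  Default action is to ignore the signal.
C  Default action is to terminate the process and dump core.
D  Default action is to stop the process.
E  The signal cannot be caught.
F  The signal cannot be ignored.

SIGABRT, the signal in this case, has action C, which is why the process left a core file behind.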

#0  0x00007fa1fef385f7 in raise () from /lib64/libc.so.6

Missing separate debuginfos, use: debuginfo-install cyrus-sasl-lib-2.1.26-19.2.el7.x86_64 elfutils-libelf-0.163-3.el7.x86_64 glibc-2.17-106.el7_2.4.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.13.2-10.el7.x86_64 libcom_err-1.42.9-7.el7.x86_64 libcurl-7.29.0-25.el7.centos.x86_64 libgcc-4.8.5-4.el7.x86_64 libidn-1.28-4.el7.x86_64 libselinux-2.2.2-6.el7.x86_64 libssh2-1.4.3-10.el7.x86_64 libstdc++-4.8.5-4.el7.x86_64 ncurses-libs-5.9-13.20130511.el7.x86_64 nspr-4.10.8-2.el7_1.x86_64 nss-3.19.1-18.el7.x86_64 nss-softokn-freebl-3.16.2.3-13.el7_1.x86_64 nss-util-3.19.1-4.el7_1.x86_64 openldap-2.4.40-8.el7.x86_64 openssl-libs-1.0.1e-42.el7.9.x86_64 pcre-8.32-15.el7.x86_64 readline-6.2-9.el7.x86_64 xz-libs-5.1.2-12alpha.el7.x86_64 zlib-1.2.7-15.el7.x86_64

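This message means the separate debug-info packages for the shared libraries the program uses (glibc, libstdc++ and so on) are not installed; GDB suggests installing them with debuginfo-install. Without them, frames inside these libraries show only function names, without source lines or arguments. Opening the core on a different machine would likely show even less, since GDB needs the same executable and matching versions of these libraries to resolve the frames (just my guess; I haven't bothered to actually try it).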

Print the backtrace, i.e., the call stack of the crashing thread at the moment of the crash:

(gdb) bt

#0  0x00007fa1fef385f7 in raise () from /lib64/libc.so.6
#1  0x00007fa1fef39ce8 in abort () from /lib64/libc.so.6
#2  0x00007fa1fef78317 in __libc_message () from /lib64/libc.so.6
#3  0x00007fa1fef7e184 in malloc_printerr () from /lib64/libc.so.6
#4  0x00007fa1fef818e7 in _int_malloc () from /lib64/libc.so.6
#5  0x00007fa1fef828dc in malloc () from /lib64/libc.so.6
#6  0x000000000043a147 in CMemPool::frealloc (ud=0x0, ptr=0x0, osize=0, nsize=64, p=0x1a8a450) at MemPool.h:266
#7  0x0000000000434898 in luaM_realloc_ (L=0x1b344e0, block=0x0, osize=0, nsize=64) at lmem.cpp:79
#8  0x000000000043b481 in luaH_new (L=0x1b344e0, narray=0, nhash=0) at ltable.cpp:359
#9  0x000000000042cbf8 in lua_createtable (L=0x1b344e0, narray=0, nrec=0) at lapi.cpp:582
#10 0x00007fa1fecf0f76 in getMessage (l=0x1b344e0, pMessage=0x7fa1bc0008c0) at message.h:218
#11 0x00007fa1fecf3af6 in getResponse (l=0x1b344e0, res=0x1b0d6d0) at service.cpp:28
#12 0x00007fa1fecf3d3b in sendM (l=0x1b344e0) at service.cpp:59
#13 0x0000000000430dc0 in luaD_precall (L=0x1b344e0, func=0x1b247b0, nresults=2) at ldo.cpp:319
#14 0x000000000043faad in luaV_execute (L=0x1b344e0, nexeccalls=1) at lvm.cpp:590
#15 0x0000000000431092 in luaD_call (L=0x1b344e0, func=0x1b24740, nResults=-1) at ldo.cpp:377
#16 0x000000000042d420 in f_call (L=0x1b344e0, ud=0x7ffeb1c9db20) at lapi.cpp:801
#17 0x000000000042ffed in luaD_rawrunprotected (L=0x1b344e0, f=0x42d3eb <f_call(lua_State*, void*)>, ud=0x7ffeb1c9db20) at ldo.cpp:116
#18 0x00000000004314a3 in luaD_pcall (L=0x1b344e0, func=0x42d3eb <f_call(lua_State*, void*)>, u=0x7ffeb1c9db20, old_top=64, ef=0) at ldo.cpp:464
#19 0x000000000042d4c9 in lua_pcall (L=0x1b344e0, nargs=0, nresults=-1, errfunc=0) at lapi.cpp:822
#20 0x000000000044f074 in luaB_pcall (L=0x1b344e0) at lbaselib.cpp:466
#21 0x0000000000430dc0 in luaD_precall (L=0x1b344e0, func=0x1b24730, nresults=2) at ldo.cpp:319
#22 0x000000000043faad in luaV_execute (L=0x1b344e0, nexeccalls=2) at lvm.cpp:590
#23 0x0000000000431092 in luaD_call (L=0x1b344e0, func=0x1b24710, nResults=-1) at ldo.cpp:377
#24 0x000000000042d420 in f_call (L=0x1b344e0, ud=0x7ffeb1c9e230) at lapi.cpp:801
#25 0x000000000042ffed in luaD_rawrunprotected (L=0x1b344e0, f=0x42d3eb <f_call(lua_State*, void*)>, ud=0x7ffeb1c9e230) at ldo.cpp:116
#26 0x00000000004314a3 in luaD_pcall (L=0x1b344e0, func=0x42d3eb <f_call(lua_State*, void*)>, u=0x7ffeb1c9e230, old_top=16, ef=0) at ldo.cpp:464
#27 0x000000000042d4c9 in lua_pcall (L=0x1b344e0, nargs=0, nresults=-1, errfunc=0) at lapi.cpp:822
#28 0x0000000000426951 in process () at srv.cpp:120
#29 0x00000000004268ac in PROGRAM (req=0x7ffeb1c9e340) at srv.cpp:107
#30 0x00000000004bad36 in _svcdsp ()
#31 0x00000000004a3b4c in _runserver ()
#32 0x00000000004a2a22 in _main ()
#33 0x00000000004265f0 in main ()

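The backtrace shows where the core was dumped. The call chain reads from bottom to top: main() is at the bottom, and the frame that raised the signal is #0. The frames near the top are all low-level library calls, and they only show the symptom. Here abort() is called from malloc_printerr() inside malloc(), which means glibc detected corrupted heap bookkeeping while serving a 64-byte allocation requested by the Lua code (frames #6 to #9). Such corruption is usually caused earlier by some other code, for example a buffer overrun or a double free, so the more important question is how the upper layers (getMessage, getResponse, sendM) created and passed their data.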

Just out of curiosity, let's also see where every thread was when the core was dumped:

(gdb) info threads

  Id   Target Id         Frame
  30   Thread 0x7fa1f5365700 (LWP 31028) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  29   Thread 0x7fa1f4b64700 (LWP 31029) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  28   Thread 0x7fa1f8b6c700 (LWP 31021) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  27   Thread 0x7fa1f7b6a700 (LWP 31023) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  26   Thread 0x7fa1f7369700 (LWP 31024) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  25   Thread 0x7fa1f6b68700 (LWP 31025) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  24   Thread 0x7fa1f9b6e700 (LWP 31019) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  23   Thread 0x7fa1edb56700 (LWP 31043) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  22   Thread 0x7fa1ecb54700 (LWP 31045) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  21   Thread 0x7fa1ec353700 (LWP 31046) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  20   Thread 0x7fa1efb5a700 (LWP 31039) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  19   Thread 0x7fa1ef359700 (LWP 31040) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  18   Thread 0x7fa1f4363700 (LWP 31030) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  17   Thread 0x7fa1f3b62700 (LWP 31031) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  16   Thread 0x7fa1f6367700 (LWP 31026) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  15   Thread 0x7fa1f936d700 (LWP 31020) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  14   Thread 0x7fa1f0b5c700 (LWP 31037) 0x00007fa1feff09b3 in select () from /lib64/libc.so.6
  13   Thread 0x7fa1f1b5e700 (LWP 31035) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  12   Thread 0x7fa1f235f700 (LWP 31034) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  11   Thread 0x7fa1f2b60700 (LWP 31033) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  10   Thread 0x7fa1f3361700 (LWP 31032) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  9    Thread 0x7fa1ee357700 (LWP 31042) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  8    Thread 0x7fa1ebb52700 (LWP 31047) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  7    Thread 0x7fa1ed355700 (LWP 31044) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  6    Thread 0x7fa1eeb58700 (LWP 31041) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  5    Thread 0x7fa1f035b700 (LWP 31038) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  4    Thread 0x7fa1f135d700 (LWP 31036) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  3    Thread 0x7fa1f836b700 (LWP 31022) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  2    Thread 0x7fa1f5b66700 (LWP 31027) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
* 1    Thread 0x7fa2009b0740 (LWP 31018) 0x00007fa1fef385f7 in raise () from /lib64/libc.so.6

(gdb)

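Almost all of the other threads are blocked in pthread_cond_wait or pthread_cond_timedwait (thread 14 is in select()), i.e., idle and waiting, which looks normal. Only thread 1 (LWP 31018), marked with *, is in raise(): that is the thread that aborted.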

Next, try printing a variable:

(gdb) p req
No symbol "req" in current context.

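Why is there no such symbol? It is not that the symbol table is missing: p evaluates expressions in the currently selected frame, which is frame #0 (raise() in libc), while req is a parameter of frame #29.
Switch to that frame first: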

(gdb) frame 29
#29 0x00000000004268ac in PROGRAM (req=0x7ffeb1c9e340) at srv.cpp:107

(gdb) p req
$1 = (SVCINFO *) 0x7ffeb1c9e340

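Now GDB shows the variable's type and value. Some may ask: it's just an address, how do I see what is in it?
Dereference it with p *req. That needs only the debug information compiled into the program (the backtrace shows file names, line numbers and arguments, so it was built with debug info), and should work as long as the SVCINFO type is described there. The source code is a separate matter: it is what commands such as list need, and it is not available here.
By default GDB looks for source files in the compilation directory recorded in the debug info and in the current directory, and found them in neither.
The compiler records where the source was at build time, but that path does not exist on this host, so nothing can be shown. If the source is elsewhere, you can point GDB at it with the directory command or set substitute-path.

If you're interested, write a small program yourself, keep its source next to it, and see the difference.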

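[Copyright notice] This article is original content by a Huawei Cloud community user. When reposting, you must credit the source (Huawei Cloud community), the article link, the author and other basic information; otherwise the author and the community reserve the right to pursue liability. If you find suspected plagiarism in this community, please email cloudbbs@huaweicloud.com with relevant evidence; once verified, the community will promptly remove the infringing content.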