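Zombie Processes on Linux: Causes and How to Avoid Them
On Linux, a process (created through the fork system call) can be in any of the following states (quoted from the official ps manual page):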
PROCESS STATE CODES
Here are the different values that the s, stat and state output specifiers (header "STAT" or "S") will display to
describe the state of a process:
D uninterruptible sleep (usually IO)
R running or runnable (on run queue)
S interruptible sleep (waiting for an event to complete)
T stopped by job control signal
t stopped by debugger during the tracing
W paging (not valid since the 2.6.xx kernel)
X dead (should never be seen)
Z defunct ("zombie") process, terminated but not reaped by its parent
For BSD formats and when the stat keyword is used, additional characters may be displayed:
< high-priority (not nice to other users)
N low-priority (nice to other users)
L has pages locked into memory (for real-time and custom IO)
s is a session leader
l is multi-threaded (using CLONE_THREAD, like NPTL pthreads do)
+ is in the foreground process group
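These states can usually be inspected with ps -x. This article focuses on processes in the Z state.
Let's start with a simple example that shows what a zombie process is, using C: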
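#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    pid_t child = fork();
    if (child < 0) {
        printf("fork err\n");
        return 1;
    } else if (child == 0) {
        /* Child: print its own PID and its parent's PID, then exit at once. */
        printf("child pid : %d,father is %d\n", getpid(), getppid());
        fflush(stdout);  /* _exit() does not flush stdio buffers */
        _exit(0);
    }
    /* Parent: never calls wait(), so the exited child stays a zombie
       for the 10 seconds the parent sleeps. */
    printf("pid is %d\n", getpid());
    sleep(10);
    return 0;
}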
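Running it produces the following:
The child has clearly exited first, but is it really gone? Let's look at the ps output:
Process 4418 is now marked defunct, which is what we usually call a zombie process. Let's check its state as well: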
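The same check can also be scripted. Below is a minimal Python sketch, assuming a Linux system with /proc mounted; the helper names proc_state and zombie_demo are made up for illustration. It forks a child that exits at once, polls the child's state letter in /proc/<pid>/stat until it reads Z, and then reaps the child with waitpid.

```python
import os
import time


def proc_state(pid):
    """Return the one-letter state code from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The state is the first field after the parenthesised command name.
    return data[data.rindex(")") + 2]


def zombie_demo(timeout=5.0):
    """Fork a child that exits immediately; report what the parent observes."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: terminate right away

    # Parent: the child has not been waited for yet, so once it has
    # exited its state should read "Z". Poll briefly to avoid a race.
    deadline = time.time() + timeout
    state = proc_state(pid)
    while state != "Z" and time.time() < deadline:
        time.sleep(0.01)
        state = proc_state(pid)

    os.waitpid(pid, 0)  # reap the child
    still_listed = os.path.exists(f"/proc/{pid}")
    return state, still_listed


if __name__ == "__main__":
    print(zombie_demo())
```

The second element of the returned tuple checks whether the /proc entry survives the waitpid call; once the child has been reaped, the entry should be gone.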
How did this come about? Going back to the code: before exiting, the child calls _exit(0). The _exit(2) manual page gives the answer:
After exit(), the exit status must be transmitted to the parent process. There are three cases. If the parent has set
SA_NOCLDWAIT, or has set the SIGCHLD handler to SIG_IGN, the status is discarded. If the parent was waiting on the
child it is notified of the exit status. In both cases the exiting process dies immediately. If the parent has not
indicated that it is not interested in the exit status, but is not waiting, the exiting process turns into a "zombie"
process (which is nothing but a container for the single byte representing the exit status) so that the parent can learn
the exit status when it later calls one of the wait(2) functions.
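In other words, the zombie state exists so that the child's exit status can be reported to the parent. If the parent has not explicitly indicated that it is not interested in that status, the child stays in this state. That leaves two ways to deal with it:
1. Call waitpid to collect the child's exit status. Reading the waitpid system call shows that the final release_task is performed by the parent (see the call path in the kernel 3.10 source).
2. Set the SIGCHLD disposition so that the parent explicitly declares it does not care about the child (SIG_IGN, or the SA_NOCLDWAIT flag). The child then does not become a zombie when it exits (see the Linux wait manual page):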
A child that terminates, but has not been waited for becomes a "zombie". The kernel maintains a minimal set of
information about the zombie process (PID, termination status, resource usage information) in order to allow the parent to
later perform a wait to obtain information about the child. As long as a zombie is not removed from the system via a
wait, it will consume a slot in the kernel process table, and if this table fills, it will not be possible to create
further processes. If a parent process terminates, then its "zombie" children (if any) are adopted by init(8), which
automatically performs a wait to remove the zombies.
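So the fate of a child's exit status is really controlled by the parent (from a quick skim of the code, it appears to be decided by the behavior of the process group leader). The most critical step, and the one that keeps the system healthy, is that once a parent exits, all of its zombie children are taken over by process 1. If process 1 malfunctions or reaps too slowly, zombies pile up and can eventually bring the system down, since a full process table prevents new processes from being created.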
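The first approach (waiting on the child) can be sketched in a few lines of Python. This is only an illustration, assuming a POSIX system; run_and_reap is a hypothetical helper name. The parent blocks in waitpid, which both removes the zombie and returns the exit status the kernel was holding for it.

```python
import os


def run_and_reap(exit_code):
    """Fork a child that exits with exit_code, then reap it in the parent."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)  # child: terminate with the requested code

    # Parent: waitpid blocks until the child has exited, collects its
    # status, and lets the kernel release the process table slot.
    reaped_pid, status = os.waitpid(pid, 0)
    assert reaped_pid == pid
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return None  # child was killed by a signal rather than exiting


if __name__ == "__main__":
    print(run_and_reap(7))
```

A parent that must not block can pass os.WNOHANG as the second argument to waitpid and retry later.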
--------------------------------------------------------------------------------
Now a Python example (taking a shortcut with a more convenient language; the principle is the same, and C is a bit more tedious):
>>> import os
>>> a = os.popen("date")
>>>
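Calling a.close() reaps this defunct process. Python's os.popen (as implemented in Python 2; Python 3 builds it on subprocess instead) is based on the C popen function, which calls fork internally and is paired with pclose. The pclose source shows that it calls waitpid to actively clean up the zombie left by popen.
Next, the signal-based approach: using SIGCHLD to state explicitly that we do not care about the child's exit status.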
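That close() really does wait for the child can be seen from its return value. The sketch below assumes Python 3 on a POSIX system with /bin/sh available; popen_exit_code is a hypothetical helper name. On POSIX, close() returns None when the command exited with 0, and otherwise the exit code shifted left by one byte, which os.WEXITSTATUS decodes.

```python
import os


def popen_exit_code(command):
    """Run command via os.popen and recover its exit code from close()."""
    pipe = os.popen(command)
    pipe.read()            # drain the output so the child can finish
    status = pipe.close()  # waits for the child, so no zombie is left
    if status is None:
        return 0           # None means the command exited with code 0
    return os.WEXITSTATUS(status)  # POSIX: status is exit code << 8


if __name__ == "__main__":
    print(popen_exit_code("exit 3"))
```

Using the pipe as a context manager (with os.popen(...) as p:) closes it automatically, which avoids forgetting the close() call.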
>>> import signal
>>> signal.signal(signal.SIGCHLD,signal.SIG_IGN)
0
>>> b = os.popen("date")
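As expected, no Z-state process is left behind.
So whenever we call fork (or anything built on fork), the code should explicitly handle SIGCHLD or call wait to collect the child's exit status, so that zombie processes do not accumulate. From a whole-system perspective, reducing the total number of fork calls is also a sound approach.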
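A small self-checking version of this experiment, as a sketch assuming Linux semantics and that it runs in the main thread (signal.signal requires that); child_is_waitable is a hypothetical name. With SIGCHLD set to SIG_IGN the kernel discards the child's status, so a later waitpid on that child has nothing to collect and fails with ECHILD, which Python raises as ChildProcessError.

```python
import os
import signal


def child_is_waitable(ignore_sigchld):
    """Fork a child and report whether the parent can still wait on it."""
    old = None
    if ignore_sigchld:
        # Tell the kernel the parent is not interested in exit statuses.
        old = signal.signal(signal.SIGCHLD, signal.SIG_IGN)
    try:
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: terminate right away
        try:
            # Blocks until the child has terminated.
            os.waitpid(pid, 0)
            return True   # a status was kept for the parent to collect
        except ChildProcessError:
            return False  # ECHILD: the child was discarded on exit
    finally:
        if ignore_sigchld:
            # Restore the previous disposition (fall back to the default).
            signal.signal(signal.SIGCHLD,
                          old if old is not None else signal.SIG_DFL)


if __name__ == "__main__":
    print(child_is_waitable(False), child_is_waitable(True))
```

The trade-off is that a parent ignoring SIGCHLD can no longer learn how its children exited, so this fits only when the exit status truly does not matter.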