How to Use BPF to Analyze Memory Leaks on openEuler: Linux Performance Tuning, BPF Analysis of Kernel-Mode and User-Mode Memory Leaks [Huawei Root Technology]

山河已无恙, posted 2025/06/12 09:31:07

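Preface


  • This post is a brief introduction to memory leak analysis with memleak from the BCC tool collection
  • It covers a short look at the memleak script, plus demos of tracing leaks in kernel mode (a kernel module) and user mode (Java, Python, C)
  • If anything here is not understood well enough, corrections are welcome :), keep it up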

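To know what cannot be helped and to accept it calmly as fate is the height of virtue. -- Zhuangzi, Inner Chapters, "In the World of Men"

I keep sharing practical technical content; follow along if you are interested ^_^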


To analyze Linux memory leaks with BPF, this post mainly uses the memleak tool from the BCC tool collection.

The Linux environment used for the experiments below:

[root@developer ~]# hostnamectl 
 Static hostname: developer
       Icon name: computer-vm
         Chassis: vm
      Machine ID: 7ad73f2b5f7046a2a389ca780f472467
         Boot ID: cef15819a5c34efa92443b6eff608cc9
  Virtualization: kvm
Operating System: openEuler 22.03 (LTS-SP4)
          Kernel: Linux 5.10.0-250.0.0.154.oe2203sp4.aarch64
    Architecture: arm64
 Hardware Vendor: OpenStack Foundation
  Hardware Model: OpenStack Nova
[root@developer ~]# 

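memleak(8) is a BCC tool that traces memory allocation and free events together with the call stacks of the allocations. Over time, it can show memory that has stayed unreleased for a long period.

In principle, memory that is still not freed after a while may be leaked memory.

Source code of the tool:

Help text of the tool: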

EXAMPLES:

./memleak -p $(pidof allocs)
        Trace allocations and display a summary of "leaked" (outstanding)
        allocations every 5 seconds
./memleak -p $(pidof allocs) -t
        Trace allocations and display each individual allocator function call
./memleak -ap $(pidof allocs) 10
        Trace allocations and display allocated addresses, sizes, and stacks
        every 10 seconds for outstanding allocations
./memleak -c "./allocs"
        Run the specified command and trace its allocations
./memleak
        Trace allocations in kernel mode and display a summary of outstanding
        allocations every 5 seconds
./memleak -o 60000
        Trace allocations in kernel mode and display a summary of outstanding
        allocations that are at least one minute (60 seconds) old
./memleak -s 5
        Trace roughly every 5th allocation, to reduce overhead
./memleak --sort count
        Trace allocations in kernel mode and display a summary of outstanding
        allocations that are sorted in count order

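Here is a brief description of what the script does:

It traces memory in both kernel mode and user mode. When no process ID is given (pid = -1) and no command is specified (command = None), memleak traces kernel memory events by default. If pid or command is given, it traces only the memory behavior of that user-space process.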

kernel_trace = (pid == -1 and command is None)

For user-mode allocations, it traces allocation functions such as malloc/calloc/realloc/mmap and the matching release functions free/munmap.

# Attach uprobe/uretprobe probes to the user-space allocation functions
 def attach_probes(sym, fn_prefix=None, can_fail=False, need_uretprobe=True):
                if fn_prefix is None:
                        fn_prefix = sym
                if args.symbols_prefix is not None:
                        sym = args.symbols_prefix + sym
                try:
                        bpf.attach_uprobe(name=obj, sym=sym,
                                          fn_name=fn_prefix + "_enter",
                                          pid=pid)
                        if need_uretprobe:
                                bpf.attach_uretprobe(name=obj, sym=sym,
                                             fn_name=fn_prefix + "_exit",
                                             pid=pid)
                except Exception:
                        if can_fail:
                                return
                        else:
                                raise

        attach_probes("malloc")
        attach_probes("calloc")
        attach_probes("realloc")
        attach_probes("mmap", can_fail=True) # failed on jemalloc
        attach_probes("posix_memalign")
        attach_probes("valloc", can_fail=True) # failed on Android, is deprecated in libc.so from bionic directory
        attach_probes("memalign")
        attach_probes("pvalloc", can_fail=True) # failed on Android, is deprecated in libc.so from bionic directory
        attach_probes("aligned_alloc", can_fail=True)  # added in C11
        attach_probes("free", need_uretprobe=False)
        attach_probes("munmap", can_fail=True, need_uretprobe=False) # failed on jemalloc

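For kernel-mode allocations, it uses kernel tracepoints to monitor allocation and release paths such as kmalloc/kfree/kmem_cache_alloc, and it also covers physical page allocation (for example __get_free_pages) through the mm_page_alloc/mm_page_free tracepoints.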

bpf_source_kernel = """

TRACEPOINT_PROBE(kmem, kmalloc) {
        if (WORKAROUND_MISSING_FREE)
            gen_free_enter((struct pt_regs *)args, (void *)args->ptr);
        gen_alloc_enter((struct pt_regs *)args, args->bytes_alloc, KERNEL);
        return gen_alloc_exit2((struct pt_regs *)args, (size_t)args->ptr, KERNEL);
}

TRACEPOINT_PROBE(kmem, kfree) {
        return gen_free_enter((struct pt_regs *)args, (void *)args->ptr);
}

TRACEPOINT_PROBE(kmem, kmem_cache_alloc) {
        if (WORKAROUND_MISSING_FREE)
            gen_free_enter((struct pt_regs *)args, (void *)args->ptr);
        gen_alloc_enter((struct pt_regs *)args, args->bytes_alloc, KERNEL);
        return gen_alloc_exit2((struct pt_regs *)args, (size_t)args->ptr, KERNEL);
}

TRACEPOINT_PROBE(kmem, kmem_cache_free) {
        return gen_free_enter((struct pt_regs *)args, (void *)args->ptr);
}

TRACEPOINT_PROBE(kmem, mm_page_alloc) {
        gen_alloc_enter((struct pt_regs *)args, PAGE_SIZE << args->order, KERNEL);
        return gen_alloc_exit2((struct pt_regs *)args, args->pfn, KERNEL);
}

TRACEPOINT_PROBE(kmem, mm_page_free) {
        return gen_free_enter((struct pt_regs *)args, (void *)args->pfn);
}
"""

Leak detection: it records allocations that have no matching free, and aggregates the size, count and call stack of the "unfreed" memory.

static inline void update_statistics_add(u64 stack_id, u64 sz) {
    struct combined_alloc_info_t *existing_cinfo;
    struct combined_alloc_info_t cinfo = {0, 0};

    // Look up whether this call stack already has a statistics record
    existing_cinfo = combined_allocs.lookup(&stack_id);
    if (!existing_cinfo) {
        // If not, insert a new record (total_size=0, number_of_allocs=0)
        combined_allocs.update(&stack_id, &cinfo);
        existing_cinfo = combined_allocs.lookup(&stack_id);
        if (!existing_cinfo)  // second check, make sure the insert succeeded
            return;
    }

    // Atomically increase the total size and the allocation count
    __sync_fetch_and_add(&existing_cinfo->total_size, sz);
    __sync_fetch_and_add(&existing_cinfo->number_of_allocs, 1);
}
static inline void update_statistics_del(u64 stack_id, u64 sz) {
    struct combined_alloc_info_t *existing_cinfo;

    // Look up the statistics record of this call stack
    existing_cinfo = combined_allocs.lookup(&stack_id);
    if (!existing_cinfo)
        return;

    // More than one outstanding allocation for this stack: just decrement
    if (existing_cinfo->number_of_allocs > 1) {
        // Atomically decrease the total size and the allocation count
        __sync_fetch_and_sub(&existing_cinfo->total_size, sz);
        __sync_fetch_and_sub(&existing_cinfo->number_of_allocs, 1);
    } else {
        // Only one allocation left: delete the whole record to save space
        combined_allocs.delete(&stack_id);
    }
}
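
The bookkeeping above can be mirrored in user space to see how the per-stack totals evolve. The following is a minimal Python sketch, not the tool's code: the names `CombinedInfo` and `LeakTracker` and their methods are hypothetical, and plain dictionaries stand in for the BPF maps.

```python
from dataclasses import dataclass


@dataclass
class CombinedInfo:
    total_size: int = 0
    number_of_allocs: int = 0


class LeakTracker:
    """Toy model of per-stack leak bookkeeping (hypothetical helper, not BCC code)."""

    def __init__(self):
        self.allocs = {}    # address -> (size, stack_id), one entry per live block
        self.combined = {}  # stack_id -> CombinedInfo, aggregated per call stack

    def on_alloc(self, addr, size, stack_id):
        self.allocs[addr] = (size, stack_id)
        info = self.combined.setdefault(stack_id, CombinedInfo())
        info.total_size += size
        info.number_of_allocs += 1

    def on_free(self, addr):
        entry = self.allocs.pop(addr, None)
        if entry is None:      # free of a block whose allocation was never seen
            return
        size, stack_id = entry
        info = self.combined.get(stack_id)
        if info is None:
            return
        if info.number_of_allocs > 1:
            info.total_size -= size
            info.number_of_allocs -= 1
        else:                  # last outstanding block for this stack
            del self.combined[stack_id]

    def outstanding(self):
        """Return (stack_id, total_size, count) rows, largest total first."""
        rows = [(sid, i.total_size, i.number_of_allocs)
                for sid, i in self.combined.items()]
        return sorted(rows, key=lambda r: r[1], reverse=True)


tracker = LeakTracker()
tracker.on_alloc(0x1000, 64, stack_id=1)
tracker.on_alloc(0x2000, 128, stack_id=1)
tracker.on_alloc(0x3000, 32, stack_id=2)
tracker.on_free(0x3000)          # stack 2 is fully released
tracker.on_free(0x1000)          # stack 1 still holds one block
print(tracker.outstanding())
```

Whatever remains in `outstanding()` after a long enough run is the candidate leak list, which is the same idea as the "outstanding allocations" in the memleak reports.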

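Filtering and sampling: events can be filtered by allocation size (--min-size/--max-size) and by sampling rate (-s) to reduce overhead.

Below, several demos show how to trace kernel-mode and user-mode memory leaks. The demos use the latest version of the tool. If you have special requirements, the tool can be customized; feel free to try it and leave a comment to discuss.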
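
The effect of the size and sampling filters can be sketched in a few lines of Python. This is only an illustration: the function `should_trace` is hypothetical, and the timestamp-modulo sampling scheme is an assumption about how "roughly every Nth allocation" could be implemented, not a statement about the tool's internals.

```python
def should_trace(size, timestamp_ns, min_size=None, max_size=None, sample_every_n=1):
    """Return True if an allocation event passes the size and sampling filters."""
    if min_size is not None and size < min_size:
        return False
    if max_size is not None and size > max_size:
        return False
    # Assumed sampling scheme: keep events whose timestamp is a multiple of N.
    if sample_every_n > 1 and timestamp_ns % sample_every_n != 0:
        return False
    return True


# (size in bytes, timestamp in ns) of four made-up allocation events
events = [(1048576, 300), (4096, 301), (2097152, 302), (1048576, 303)]
kept = [e for e in events
        if should_trace(e[0], e[1], min_size=1048000, sample_every_n=3)]
print(kept)
```

Events dropped by either filter never reach the bookkeeping maps, which is why sampling lowers overhead but also makes the reported counts a lower bound.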

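Kernel-mode memory leak analysis

Here a kernel module simulates the leak. memory_leak is a module that leaks memory on purpose: a timer periodically allocates kernel memory and never frees it, which simulates a kernel-mode memory leak.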

┌──[root@liruilongs.github.io]-[~/kleak_demo]
└─$cat Makefile
obj-m += memory_leak.o
CFLAGS_memory_leak.o += -g  # keep debug symbols
all:
        make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules

clean:
        make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean
┌──[root@liruilongs.github.io]-[~/kleak_demo]
└─$
┌──[root@liruilongs.github.io]-[~/kleak_demo]
└─$ls
Makefile        memory_leak.c   memory_leak.mod    memory_leak.mod.o  modules.order
Module.symvers  memory_leak.ko  memory_leak.mod.c  memory_leak.o

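The module code is shown below.

Initialization (memory_leak_init): sets up the timer leak_timer and binds the callback leak_timer_callback. The first expiry is set to 1 second later (msecs_to_jiffies(1000)), and the load message "Memory leak module loaded" is printed.

Timer callback (leak_timer_callback): each run allocates 1 MB of memory and re-arms the timer to fire again 10 ms later (msecs_to_jiffies(10)). The core leaking operation is: char *ptr = kmalloc(1024 * 1024, GFP_KERNEL); // allocate 1 MB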

┌──[root@liruilongs.github.io]-[~/kleak_demo]
└─$cat memory_leak.c
#include <linux/init.h>
#include <linux/module.h>
#include <linux/slab.h>      // added: header required by kmalloc
#include <linux/timer.h>     // added: header required by timers

static struct timer_list leak_timer;
static int leak_count = 0;

static void leak_timer_callback(struct timer_list *t) {
    char *ptr = kmalloc(1024* 1024, GFP_KERNEL);  // allocate 1 MB
    if (ptr) {
        printk(KERN_INFO "Leaked memory %d at %p\n", leak_count++, ptr);
    }
    mod_timer(&leak_timer, jiffies + msecs_to_jiffies(10)); // fire again after 10 ms
}

static int __init memory_leak_init(void) {
    timer_setup(&leak_timer, leak_timer_callback, 0);
    mod_timer(&leak_timer, jiffies + msecs_to_jiffies(1000)); // first expiry after 1 second
    printk(KERN_INFO "Memory leak module loaded\n");
    return 0;
}

static void __exit memory_leak_exit(void) {
    del_timer_sync(&leak_timer);
    printk(KERN_INFO "Memory leak module unloaded (but memory not freed!)\n");
}

module_init(memory_leak_init);
module_exit(memory_leak_exit);

MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("Memory Leak Demo with Timer");

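After building, load the kernel module. This is a test module and must only be run in a test environment. After loading it, watch the kernel log.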

┌──[root@liruilongs.github.io]-[~/kleak_demo]
└─$make
┌──[root@liruilongs.github.io]-[~/kleak_demo]
└─$dmesg  -C
┌──[root@liruilongs.github.io]-[~/kleak_demo]
└─$insmod memory_leak.ko
┌──[root@liruilongs.github.io]-[~/kleak_demo]
└─$dmesg  -T --follow

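Below is the allocation log traced by the BCC tool memleak. About the options: --top 5 shows the five stacks with the largest outstanding size, and --min-size 1048000 captures only allocations of at least this many bytes.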

┌──[root@liruilongs.github.io]-[/usr/share/bcc/tools]
└─$./memleak  --top 5 --min-size 1048000
Attaching to kernel allocators, Ctrl+C to quit.
[19:33:40] Top 5 stacks with outstanding allocations:
        1048576 bytes in 1 allocations from stack
                0xffffffff9a1ab368      __alloc_pages+0x188 [kernel]
                0xffffffff9a1ab368      __alloc_pages+0x188 [kernel]
                0xffffffff9a169059      __kmalloc_large_node+0x79 [kernel]
                0xffffffff9a1695d9      kmalloc_large+0x19 [kernel]
                0xffffffffc0cd7024      leak_timer_callback+0x14 [memory_leak]
                0xffffffff99fc4f84      call_timer_fn+0x24 [kernel]
                0xffffffff99fc528e      __run_timers.part.0+0x1ee [kernel]
                0xffffffff99fc5356      run_timer_softirq+0x26 [kernel]
                0xffffffff9aa978d7      __do_softirq+0xc7 [kernel]
                0xffffffff99f10df1      __irq_exit_rcu+0xa1 [kernel]
                0xffffffff9aa81a93      common_interrupt+0x43 [kernel]
                0xffffffff9ac00d62      asm_common_interrupt+0x22 [kernel]
        4194304 bytes in 4 allocations from stack
                0xffffffff9a169648      kmalloc_large+0x88 [kernel]
                0xffffffff9a169648      kmalloc_large+0x88 [kernel]
                0xffffffffc0cd7024      leak_timer_callback+0x14 [memory_leak]
                0xffffffff99fc4f84      call_timer_fn+0x24 [kernel]
                0xffffffff99fc528e      __run_timers.part.0+0x1ee [kernel]
                0xffffffff99fc5356      run_timer_softirq+0x26 [kernel]
                0xffffffff9aa978d7      __do_softirq+0xc7 [kernel]
                0xffffffff99f10df1      __irq_exit_rcu+0xa1 [kernel]
                0xffffffff9aa81ad0      common_interrupt+0x80 [kernel]
                0xffffffff9ac00d62      asm_common_interrupt+0x22 [kernel]
                0xffffffff9aa86850      acpi_safe_halt+0x20 [kernel]
                0xffffffff9aa8689f      acpi_idle_do_entry+0x2f [kernel]
                0xffffffff9aa86bab      acpi_idle_enter+0x7b [kernel]
                0xffffffff9aa85911      cpuidle_enter_state+0x81 [kernel]
                0xffffffff9a764639      cpuidle_enter+0x29 [kernel]
                0xffffffff99f6c03a      cpuidle_idle_call+0xfa [kernel]
                0xffffffff99f6c12b      do_idle+0x7b [kernel]
                0xffffffff99f6c379      cpu_startup_entry+0x19 [kernel]
                0xffffffff9aa86e0a      rest_init+0xca [kernel]
                0xffffffff9c48f766      arch_call_rest_init+0xa [kernel]
                0xffffffff9c48fc67      start_kernel+0x4a3 [kernel]
                0xffffffff99e00159      secondary_startup_64_no_verify+0xe4 [kernel]
        4194304 bytes in 4 allocations from stack
                0xffffffff9a1ab368      __alloc_pages+0x188 [kernel]
                0xffffffff9a1ab368      __alloc_pages+0x188 [kernel]
                0xffffffff9a169059      __kmalloc_large_node+0x79 [kernel]
                0xffffffff9a1695d9      kmalloc_large+0x19 [kernel]
                0xffffffffc0cd7024      leak_timer_callback+0x14 [memory_leak]
                0xffffffff99fc4f84      call_timer_fn+0x24 [kernel]
                0xffffffff99fc528e      __run_timers.part.0+0x1ee [kernel]
                0xffffffff99fc5356      run_timer_softirq+0x26 [kernel]
               。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。
                0xffffffff9c48fc67      start_kernel+0x4a3 [kernel]
                0xffffffff99e00159      secondary_startup_64_no_verify+0xe4 [kernel]
        454033408 bytes in 433 allocations from stack
                0xffffffff9a1ab368      __alloc_pages+0x188 [kernel]
                0xffffffff9a1ab368      __alloc_pages+0x188 [kernel]
                0xffffffff9a169059      __kmalloc_large_node+0x79 [kernel]
                0xffffffff9a1695d9      kmalloc_large+0x19 [kernel]
                0xffffffffc0cd7024      leak_timer_callback+0x14 [memory_leak]
                0xffffffff99fc4f84      call_timer_fn+0x24 [kernel]
                0xffffffff99fc528e      __run_timers.part.0+0x1ee [kernel]
                0xffffffff99fc5356      run_timer_softirq+0x26 [kernel]
                0xffffffff9aa978d7      __do_softirq+0xc7 [kernel]
                。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。
                0xffffffff9c48fc67      start_kernel+0x4a3 [kernel]
                0xffffffff99e00159      secondary_startup_64_no_verify+0xe4 [kernel]
        455081984 bytes in 434 allocations from stack
                0xffffffff9a169648      kmalloc_large+0x88 [kernel]
                0xffffffff9a169648      kmalloc_large+0x88 [kernel]
                0xffffffffc0cd7024      leak_timer_callback+0x14 [memory_leak]
                0xffffffff99fc4f84      call_timer_fn+0x24 [kernel]
                0xffffffff99fc528e      __run_timers.part.0+0x1ee [kernel]
                0xffffffff99fc5356      run_timer_softirq+0x26 [kernel]
                0xffffffff9aa978d7      __do_softirq+0xc7 [kernel]
                0xffffffff99f10df1      __irq_exit_rcu+0xa1 [kernel]
               。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。
                0xffffffff9c48fc67      start_kernel+0x4a3 [kernel]
                0xffffffff99e00159      secondary_startup_64_no_verify+0xe4 [kernel]
[19:33:46] Top 5 stacks with outstanding allocations:
        5242880 bytes in 5 allocations from stack
                0xffffffff9a169648      kmalloc_large+0x88 [kernel]
                0xffffffff9a169648      kmalloc_large+0x88 [kernel]
                0xffffffffc0cd7024      leak_timer_callback+0x14 [memory_leak]
                0xffffffff99fc4f84      call_timer_fn+0x24 [kernel]
                0xffffffff99fc528e      __run_timers.part.0+0x1ee [kernel]
                0xffffffff99fc5356      run_timer_softirq+0x26 [kernel]
                0xffffffff9aa978d7      __do_softirq+0xc7 [kernel]
                0xffffffff99f10df1      __irq_exit_rcu+0xa1 [kernel]
                0xffffffff9aa81ad0      common_interrupt+0x80 [kernel]
               。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。
                0xffffffff9c48fc67      start_kernel+0x4a3 [kernel]
                0xffffffff99e00159      secondary_startup_64_no_verify+0xe4 [kernel]
        5242880 bytes in 5 allocations from stack
                0xffffffff9a1ab368      __alloc_pages+0x188 [kernel]
                0xffffffff9a1ab368      __alloc_pages+0x188 [kernel]
                0xffffffff9a169059      __kmalloc_large_node+0x79 [kernel]
                0xffffffff9a1695d9      kmalloc_large+0x19 [kernel]
                0xffffffffc0cd7024      leak_timer_callback+0x14 [memory_leak]
                0xffffffff99fc4f84      call_timer_fn+0x24 [kernel]
                0xffffffff99fc528e      __run_timers.part.0+0x1ee [kernel]
                0xffffffff99fc5356      run_timer_softirq+0x26 [kernel]
                0xffffffff9aa978d7      __do_softirq+0xc7 [kernel]
                0xffffffff99f10df1      __irq_exit_rcu+0xa1 [kernel]
              。。。。。。。。。。。。。。。
                0xffffffff9c48fc67      start_kernel+0x4a3 [kernel]
                0xffffffff99e00159      secondary_startup_64_no_verify+0xe4 [kernel]
        16777216 bytes in 8 allocations from stack
                0xffffffff9a1ab368      __alloc_pages+0x188 [kernel]
                0xffffffff9a1ab368      __alloc_pages+0x188 [kernel]
                0xffffffff9a1abb47      __folio_alloc+0x17 [kernel]
                0xffffffff9a1d0401      vma_alloc_folio+0x281 [kernel]
                0xffffffff9a1f1006      do_huge_pmd_anonymous_page+0xb6 [kernel]
                0xffffffff9a1843c1      __handle_mm_fault+0x661 [kernel]
                0xffffffff9a1844ad      handle_mm_fault+0xcd [kernel]
                0xffffffff99e8ac94      do_user_addr_fault+0x1b4 [kernel]
                0xffffffff9aa84ab2      exc_page_fault+0x62 [kernel]
                0xffffffff9ac00bc2      asm_exc_page_fault+0x22 [kernel]
        929038336 bytes in 886 allocations from stack
                0xffffffff9a1ab368      __alloc_pages+0x188 [kernel]
                0xffffffff9a1ab368      __alloc_pages+0x188 [kernel]
                0xffffffff9a169059      __kmalloc_large_node+0x79 [kernel]
                0xffffffff9a1695d9      kmalloc_large+0x19 [kernel]
                0xffffffffc0cd7024      leak_timer_callback+0x14 [memory_leak]
                0xffffffff99fc4f84      call_timer_fn+0x24 [kernel]
                0xffffffff99fc528e      __run_timers.part.0+0x1ee [kernel]
                0xffffffff99fc5356      run_timer_softirq+0x26 [kernel]
              。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。
                0xffffffff9c48fc67      start_kernel+0x4a3 [kernel]
                0xffffffff99e00159      secondary_startup_64_no_verify+0xe4 [kernel]
        931135488 bytes in 888 allocations from stack
                0xffffffff9a169648      kmalloc_large+0x88 [kernel]
                0xffffffff9a169648      kmalloc_large+0x88 [kernel]
                0xffffffffc0cd7024      leak_timer_callback+0x14 [memory_leak]
                0xffffffff99fc4f84      call_timer_fn+0x24 [kernel]
                0xffffffff99fc528e      __run_timers.part.0+0x1ee [kernel]
                0xffffffff99fc5356      run_timer_softirq+0x26 [kernel]
             。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。
                0xffffffff9c48f766      arch_call_rest_init+0xa [kernel]
                0xffffffff9c48fc67      start_kernel+0x4a3 [kernel]
                0xffffffff99e00159      secondary_startup_64_no_verify+0xe4 [kernel]
^C┌──[root@liruilongs.github.io]-[/usr/share/bcc/tools]
└─$

The key frame leak_timer_callback appears repeatedly in the output above; this is the leak point:

0xffffffffc0cd7024      leak_timer_callback+0x14 [memory_leak]

The [memory_leak] tag identifies the kernel module responsible. The complete call stack is:

                0xffffffff9a1ab368      __alloc_pages+0x188 [kernel]
                0xffffffff9a1ab368      __alloc_pages+0x188 [kernel]
                0xffffffff9a169059      __kmalloc_large_node+0x79 [kernel]
                0xffffffff9a1695d9      kmalloc_large+0x19 [kernel]
                0xffffffffc0cd7024      leak_timer_callback+0x14 [memory_leak]
                0xffffffff99fc4f84      call_timer_fn+0x24 [kernel]

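It maps directly to the leaking function static void leak_timer_callback(struct timer_list *t) in memory_leak.c above. Each time the timer fires (every 10 ms) it allocates 1 MB with kmalloc(1024 * 1024, GFP_KERNEL) and never frees it.

First report (19:33:40):

  • Largest single stack: 455 MB (434 allocations)

  • Total across the 5 stacks: about 918 MB (876 allocations of 1 MiB). The kmalloc_large stacks and the __alloc_pages stacks both end in leak_timer_callback, so the same allocations are probably counted twice, once by the kmalloc tracepoint and once by the mm_page_alloc tracepoint.

6 seconds later (19:33:46):

  • Largest single stack: 931 MB (888 allocations)

  • Leak rate: about 76 MiB/s (the largest stack grew by 454 allocations of 1 MiB in 6 seconds), in the same range as the theoretical rate implied by the code: 100 MiB/s (1 MiB every 10 ms)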
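
The rate estimate can be reproduced from the two reports. A small Python sketch using only numbers printed above (the largest stack in each report); the variable names are illustrative.

```python
MIB = 1024 * 1024

# (timestamp, bytes, allocations) of the largest stack in each report above
first = ("19:33:40", 455081984, 434)
second = ("19:33:46", 931135488, 888)


def seconds(hms):
    """Convert HH:MM:SS to seconds since midnight."""
    h, m, s = (int(x) for x in hms.split(":"))
    return h * 3600 + m * 60 + s


elapsed = seconds(second[0]) - seconds(first[0])
growth_bytes = second[1] - first[1]
growth_allocs = second[2] - first[2]
rate_mib = growth_bytes / MIB / elapsed
theoretical_mib = 1 * (1000 / 10)   # 1 MiB every 10 ms

print(f"elapsed={elapsed}s growth={growth_bytes / MIB:.0f} MiB "
      f"in {growth_allocs} allocations")
print(f"observed={rate_mib:.1f} MiB/s theoretical={theoretical_mib:.0f} MiB/s")
```

The observed rate stays below the theoretical one, which is expected: timer callbacks do not fire at exactly 10 ms intervals, and each 1 MiB allocation takes time of its own.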

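The leaked memory comes from the kernel page allocator, which confirms what an earlier post described: for large requests, kmalloc calls the page allocator directly to obtain physically contiguous pages.

__alloc_pages+0x188 [kernel]       # allocate physical pages
kmalloc_large+0x19 [kernel]        # large allocation (> 8 KB)

Remember to unload the module once the test is done, otherwise the machine will hit OOM shortly. The allocation messages from the memory_leak module are visible in dmesg.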

┌──[root@liruilongs.github.io]-[~/kleak_demo]
└─$ rmmod memory_leak
┌──[root@liruilongs.github.io]-[~/kleak_demo]
└─$dmesg  -T  | head -10
[Tue Jun  3 19:32:24 2025] Memory leak module loaded
[Tue Jun  3 19:32:25 2025] Leaked memory 0 at 000000002eeb6860
[Tue Jun  3 19:32:25 2025] Leaked memory 1 at 0000000094c9050b
[Tue Jun  3 19:32:25 2025] Leaked memory 2 at 00000000733807b5
[Tue Jun  3 19:32:25 2025] Leaked memory 3 at 0000000002beb3fc
[Tue Jun  3 19:32:25 2025] Leaked memory 4 at 0000000003eb80f2
[Tue Jun  3 19:32:25 2025] Leaked memory 5 at 00000000ab0f3c38
[Tue Jun  3 19:32:25 2025] Leaked memory 6 at 0000000061366508
[Tue Jun  3 19:32:25 2025] Leaked memory 7 at 0000000093260330
[Tue Jun  3 19:32:25 2025] Leaked memory 8 at 00000000f4913b97
┌──[root@liruilongs.github.io]-[~/kleak_demo]
└─$

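The free command shows the memory draining quickly: the free column drops by about 0.1 GiB per second, in line with the leak rate above. kmalloc returns physically backed memory, so used grows as well; it looks almost constant here most likely because free -h rounds values of this magnitude to whole GiB.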

┌──[root@liruilongs.github.io]-[~]
└─$free -s 1 -h
               total        used        free      shared  buff/cache   available
Mem:            15Gi        11Gi       3.2Gi        14Mi       485Mi       3.4Gi
Swap:          2.0Gi          0B       2.0Gi

               total        used        free      shared  buff/cache   available
Mem:            15Gi        12Gi       3.1Gi        14Mi       485Mi       3.3Gi
Swap:          2.0Gi          0B       2.0Gi

               total        used        free      shared  buff/cache   available
Mem:            15Gi        12Gi       3.0Gi        14Mi       485Mi       3.2Gi
Swap:          2.0Gi          0B       2.0Gi

               total        used        free      shared  buff/cache   available
Mem:            15Gi        12Gi       2.9Gi        14Mi       485Mi       3.1Gi
Swap:          2.0Gi          0B       2.0Gi

               total        used        free      shared  buff/cache   available
Mem:            15Gi        12Gi       2.8Gi        14Mi       485Mi       3.0Gi
Swap:          2.0Gi          0B       2.0Gi

               total        used        free      shared  buff/cache   available
Mem:            15Gi        12Gi       2.7Gi        14Mi       485Mi       2.9Gi
Swap:          2.0Gi          0B       2.0Gi

               total        used        free      shared  buff/cache   available
Mem:            15Gi        12Gi       2.6Gi        14Mi       486Mi       2.8Gi
Swap:          2.0Gi          0B       2.0Gi

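This is only a demo, of course. Real memory leaks are usually far more complex and require combining the call stacks with code analysis.

User-mode memory leak analysis

Java memory leak analysis

Off-heap memory

JDK version used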

[developer@developer ~]$ java --show-version
openjdk 17.0.13 2024-10-15
OpenJDK Runtime Environment BiSheng (build 17.0.13+11)
OpenJDK 64-Bit Server VM BiSheng (build 17.0.13+11, mixed mode, sharing)

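The demo is a typical Java off-heap memory leak program: it keeps allocating off-heap memory without releasing it and eventually exhausts system memory.

  • Memory requested through ByteBuffer.allocateDirect() is off-heap memory (Direct Memory).
  • Off-heap memory is not part of the heap that the JVM GC manages; it has to be released manually or through the Cleaner mechanism.

Demo code: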

[root@liruilongs.github.io ~]# cat memLeakDemo.java 
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class MemoryLeakJava {
    private static final List<Object> leakList = new ArrayList<>();

    public static void main(String[] args) throws InterruptedException {
        while (true) {
            // native (off-heap) memory allocation
            ByteBuffer directBuffer = ByteBuffer.allocateDirect(10*1024 * 1024);
            leakList.add(directBuffer);
            
            Thread.sleep(1000);
        }
    }
}
[root@liruilongs.github.io ~]# 

Run the demo above

java memLeakDemo.java  

Then trace and analyze it with the BCC tool

[root@liruilongs.github.io tools]# ./memleak.py -p $(pgrep java)    -s 3 -a 10  -o 20000
Attaching to pid 1722308, Ctrl+C to quit.
[16:44:22] Top 10 stacks with outstanding allocations:
[16:44:32] Top 10 stacks with outstanding allocations:

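Options:

  • -p $(pgrep java) selects the process ID to trace
  • -s 3 samples roughly every 3rd allocation
  • -a prints the address and size of each outstanding allocation in addition to the per-stack summary; the 10 that follows is the report interval in seconds
  • -o 20000 shows only allocations that have been outstanding for more than 20000 ms (20 s), which excludes short-lived temporary allocations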
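
How a command line like the one above splits into flags and positional values can be checked with a small argparse sketch. This is a simplified stand-in that declares only the options used here; the option names and defaults mirror the help text as far as the author of this sketch recalls them and should be treated as assumptions, not as the tool's actual parser.

```python
import argparse

# Simplified stand-in for the option parser; only the options used above.
parser = argparse.ArgumentParser(prog="memleak-sketch")
parser.add_argument("-p", "--pid", type=int, default=-1)
parser.add_argument("-s", "--sample-rate", type=int, default=1)
parser.add_argument("-a", "--show-allocs", action="store_true")  # boolean flag
parser.add_argument("-o", "--older", type=int, default=500)      # milliseconds
parser.add_argument("interval", nargs="?", type=int, default=5)  # seconds
parser.add_argument("count", nargs="?", type=int)

args = parser.parse_args(["-p", "1722308", "-s", "3", "-a", "10", "-o", "20000"])
print(args.pid, args.sample_rate, args.show_allocs, args.interval, args.older)
```

Because -a takes no value, the 10 is consumed by the positional interval, which matches the 10-second spacing of the report timestamps.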

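In the output above, the first two reports fall within the first 20 s, so no allocation is older than the -o threshold yet and no stacks are printed. The third report, shown below, lists the address and size of the blocks that still exist 20 s after they were allocated.

There are 6 outstanding allocations in total, grouped under two stacks of about 20 MB and 40 MB, so about 60 MB is unreleased. Each allocation is about 10 MB (10485760 bytes = 10 MiB). These correspond to the off-heap memory allocated by the demo above.

Besides the block data, memleak records the call stack of each allocation, which lets us locate the calling method directly.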

[16:44:42] Top 10 stacks with outstanding allocations:
 addr = ffff0dbf6010 size = 10485760
 addr = ffff0a9f1010 size = 10485760
 addr = ffff0f9f9010 size = 10485760
 addr = ffff0b3f2010 size = 10485760
 addr = ffff0b3f2000 size = 10489856
 addr = ffff0f9f9000 size = 10489856
 20979712 bytes in 2 allocations from stack # the last two entries above
  0x0000ffff96f2f730 [unknown] [libc.so.6]
  0x0000ffff96f303b4 [unknown] [libc.so.6]
  0x0000ffff96f31598 [unknown] [libc.so.6]
  0x0000ffff96f31e84 malloc+0xa4 [libc.so.6]
  0x0000fffffffff000 [unknown] [[uprobes]]
  0x0000ffff9695d798 Unsafe_AllocateMemory0+0x74 [libjvm.so]
  0x0000ffff78ef9b38 [unknown]
  0x0000ffff78ef5f04 [unknown]
  0x0000ffff78ef5f04 [unknown]
  0x0000ffff78ef6018 [unknown]
  0x0000ffff78ef5db8 [unknown]
  0x0000ffff78ef0140 [unknown]
  0x0000ffff963228e8 JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, JavaThread*)+0x298 [libjvm.so]
  0x0000ffff96767a30 invoke(InstanceKlass*, methodHandle const&, Handle, bool, objArrayHandle, BasicType, objArrayHandle, bool, JavaThread*) [clone .constprop.0]+0x4fc [libjvm.so]
  0x0000ffff967687e8 Reflection::invoke_method(oopDesc*, Handle, objArrayHandle, JavaThread*)+0x148 [libjvm.so]
  0x0000ffff963ed5b4 JVM_InvokeMethod+0x124 [libjvm.so]
  0x0000ffff78ef9b38 [unknown]
  0x0000ffff78ef5db8 [unknown]
  0x0000ffff78ef5db8 [unknown]
  0x0000ffff78ef6330 [unknown]
  0x0000ffff78ef5db8 [unknown]
  0x0000ffff78ef6018 [unknown]
  0x0000ffff78ef6018 [unknown]
  0x0000ffff78ef0140 [unknown]
  0x0000ffff963228e8 JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, JavaThread*)+0x298 [libjvm.so]
  0x0000ffff963b3da4 jni_invoke_static(JNIEnv_*, JavaValue*, _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, JavaThread*) [clone .constprop.1]+0x1c4 [libjvm.so]
  0x0000ffff963b6878 jni_CallStaticVoidMethod+0x118 [libjvm.so]
  0x0000ffff9705434c JavaMain+0xc5c [libjli.so]
  0x0000ffff9705743c ThreadJavaMain+0xc [libjli.so]
  0x0000ffff96f22518 [unknown] [libc.so.6]
  0x0000ffff96f89d5c [unknown] [libc.so.6]
 41943040 bytes in 4 allocations from stack  # the first four entries above
  0x0000ffff966cf658 os::malloc(unsigned long, MEMFLAGS)+0xb8 [libjvm.so]
  0x0000ffff9695d798 Unsafe_AllocateMemory0+0x74 [libjvm.so]
  0x0000ffff78ef9b38 [unknown]
  0x0000ffff78ef5f04 [unknown]
  0x0000ffff78ef5f04 [unknown]
  0x0000ffff78ef6018 [unknown]
  0x0000ffff78ef5db8 [unknown]
  0x0000ffff78ef0140 [unknown]
  0x0000ffff963228e8 JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, JavaThread*)+0x298 [libjvm.so]
  0x0000ffff96767a30 invoke(InstanceKlass*, methodHandle const&, Handle, bool, objArrayHandle, BasicType, objArrayHandle, bool, JavaThread*) [clone .constprop.0]+0x4fc [libjvm.so]
  0x0000ffff967687e8 Reflection::invoke_method(oopDesc*, Handle, objArrayHandle, JavaThread*)+0x148 [libjvm.so]
  0x0000ffff963ed5b4 JVM_InvokeMethod+0x124 [libjvm.so]
  0x0000ffff78ef9b38 [unknown]
  0x0000ffff78ef5db8 [unknown]
  0x0000ffff78ef5db8 [unknown]
  0x0000ffff78ef6330 [unknown]
  0x0000ffff78ef5db8 [unknown]
  0x0000ffff78ef6018 [unknown]
  0x0000ffff78ef6018 [unknown]
  0x0000ffff78ef0140 [unknown]
  0x0000ffff963228e8 JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, JavaThread*)+0x298 [libjvm.so]
  0x0000ffff963b3da4 jni_invoke_static(JNIEnv_*, JavaValue*, _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, JavaThread*) [clone .constprop.1]+0x1c4 [libjvm.so]
  0x0000ffff963b6878 jni_CallStaticVoidMethod+0x118 [libjvm.so]
  0x0000ffff9705434c JavaMain+0xc5c [libjli.so]
  0x0000ffff9705743c ThreadJavaMain+0xc [libjli.so]
  0x0000ffff96f22518 [unknown] [libc.so.6]
  0x0000ffff96f89d5c [unknown] [libc.so.6]

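A brief analysis, taking the 10485760-byte blocks as an example: in the last call stack above, frames are printed innermost first. The allocation comes straight from libjvm (the core implementation of the JVM), which internally calls glibc's malloc to allocate the memory.

├─ Native memory allocation (the key leak point)
│   └─ Unsafe_AllocateMemory0 (libjvm.so)  // corresponds to Java's Unsafe.allocateMemory()
│       └─ os::malloc (libjvm.so)  // the JVM's wrapper around malloc

From this call stack we can go straight to the allocating code, which is where the leak is: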

ByteBuffer directBuffer = ByteBuffer.allocateDirect(10*1024*1024);

How do we know? The DirectByteBuffer constructor calls Unsafe.allocateMemory; the JVM allocates the memory through os::malloc and associates the address with the DirectByteBuffer object.

Part of the JDK source:

    DirectByteBuffer(int cap) {                   // package-private

        super(-1, 0, cap, cap);
        boolean pa = VM.isDirectMemoryPageAligned();
        int ps = Bits.pageSize();
        long size = Math.max(1L, (long)cap + (pa ? ps : 0));
        Bits.reserveMemory(size, cap);
        long base = 0;
        try {
            base = unsafe.allocateMemory(size);
        } catch (OutOfMemoryError x) {
            Bits.unreserveMemory(size, cap);
            throw x;
        }
        unsafe.setMemory(base, size, (byte) 0);
        if (pa && (base % ps != 0)) {
            // Round up to page boundary
            address = base + ps - (base & (ps - 1));
        } else {
            address = base;
        }
        cleaner = Cleaner.create(this, new Deallocator(base, size, cap));
        att = null;
  ...................

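There is one open question that attentive readers may have noticed: the blocks are all about 10 MB, but not identical. Two of them are recorded with a stack inside libc's malloc:

20979712 bytes in 2 allocations from stack # two allocations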
  0x0000ffff96f2f730 [unknown] [libc.so.6]
  0x0000ffff96f303b4 [unknown] [libc.so.6]
  0x0000ffff96f31598 [unknown] [libc.so.6]
  0x0000ffff96f31e84 malloc+0xa4 [libc.so.6]
  0x0000fffffffff000 [unknown] [[uprobes]]
  0x0000ffff9695d798 Unsafe_AllocateMemory0+0x74 [libjvm.so]

The other four are recorded at os::malloc, the JVM wrapper.

41943040 bytes in 4 allocations from stack  # four allocations
  0x0000ffff966cf658 os::malloc(unsigned long, MEMFLAGS)+0xb8 [libjvm.so]
  0x0000ffff9695d798 Unsafe_AllocateMemory0+0x74 [libjvm.so]

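Why is this? I searched a lot and did not find a definitive answer; comments are welcome ^_^. My guess is that each buffer is allocated only once but hits two probes, although that alone does not explain why the counts differ. A possible explanation is that memleak attaches to both malloc and mmap: for a request this large, glibc's malloc obtains the memory with mmap, so the malloc call and the inner mmap would be recorded as separate events, and with -s 3 each event is sampled independently, which would make the two counts differ.

Below is the fourth report, 10 s later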
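
One thing that can be checked directly from the printed data is whether the two kinds of blocks are related. The Python sketch below pairs entries from the 16:44:42 report whose addresses differ by 0x10. The pairing itself is just arithmetic on the report; reading the larger block as an mmap region and the smaller one as the pointer returned by malloc inside it is an assumption, not something the report states.

```python
# (address, size) pairs copied from the 16:44:42 report
blocks = [
    (0xffff0dbf6010, 10485760),
    (0xffff0a9f1010, 10485760),
    (0xffff0f9f9010, 10485760),
    (0xffff0b3f2010, 10485760),
    (0xffff0b3f2000, 10489856),
    (0xffff0f9f9000, 10489856),
]

by_addr = dict(blocks)
pairs = []
for addr, size in blocks:
    inner = addr + 0x10                      # candidate block 16 bytes further in
    if inner in by_addr and by_addr[inner] < size:
        pairs.append((hex(addr), hex(inner), size - by_addr[inner]))

for outer, inner, delta in pairs:
    print(f"{outer} contains {inner}, size difference {delta} bytes")
```

Both 10489856-byte blocks enclose a 10485760-byte block and are exactly one 4096-byte page larger, which fits the idea that one Java allocation shows up as two traced events.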

[16:44:52] Top 10 stacks with outstanding allocations:
 addr = ffff0dbf6010 size = 10485760
 addr = ffff063ea010 size = 10485760
 addr = ffff081ed010 size = 10485760
 addr = ffff04fe8010 size = 10485760
 addr = ffff06deb010 size = 10485760
 addr = ffff0a9f1010 size = 10485760
 addr = ffff08bee010 size = 10485760
 addr = ffff0f9f9010 size = 10485760
 addr = ffff095ef010 size = 10485760
 addr = ffff0b3f2010 size = 10485760
 addr = ffff081ed000 size = 10489856
 addr = ffff077ec000 size = 10489856
 addr = ffff0b3f2000 size = 10489856
 addr = ffff0f9f9000 size = 10489856
 addr = ffff04fe8000 size = 10489856
 addr = ffff08bee000 size = 10489856
 62939136 bytes in 6 allocations from stack
  0x0000ffff96f2f730 [unknown] [libc.so.6]
  0x0000ffff96f303b4 [unknown] [libc.so.6]
  0x0000ffff96f31598 [unknown] [libc.so.6]
  0x0000ffff96f31e84 malloc+0xa4 [libc.so.6]
  0x0000fffffffff000 [unknown] [[uprobes]]
  0x0000ffff9695d798 Unsafe_AllocateMemory0+0x74 [libjvm.so]
  0x0000ffff78ef9b38 [unknown]
  0x0000ffff78ef5f04 [unknown]
  0x0000ffff78ef5f04 [unknown]
  0x0000ffff78ef6018 [unknown]
  0x0000ffff78ef5db8 [unknown]
  0x0000ffff78ef0140 [unknown]
  0x0000ffff963228e8 JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, JavaThread*)+0x298 [libjvm.so]
  0x0000ffff96767a30 invoke(InstanceKlass*, methodHandle const&, Handle, bool, objArrayHandle, BasicType, objArrayHandle, bool, JavaThread*) [clone .constprop.0]+0x4fc [libjvm.so]
  0x0000ffff967687e8 Reflection::invoke_method(oopDesc*, Handle, objArrayHandle, JavaThread*)+0x148 [libjvm.so]
  0x0000ffff963ed5b4 JVM_InvokeMethod+0x124 [libjvm.so]
  0x0000ffff78ef9b38 [unknown]
  0x0000ffff78ef5db8 [unknown]
  0x0000ffff78ef5db8 [unknown]
  0x0000ffff78ef6330 [unknown]
  0x0000ffff78ef5db8 [unknown]
  0x0000ffff78ef6018 [unknown]
  0x0000ffff78ef6018 [unknown]
  0x0000ffff78ef0140 [unknown]
  0x0000ffff963228e8 JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, JavaThread*)+0x298 [libjvm.so]
  0x0000ffff963b3da4 jni_invoke_static(JNIEnv_*, JavaValue*, _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, JavaThread*) [clone .constprop.1]+0x1c4 [libjvm.so]
  0x0000ffff963b6878 jni_CallStaticVoidMethod+0x118 [libjvm.so]
  0x0000ffff9705434c JavaMain+0xc5c [libjli.so]
  0x0000ffff9705743c ThreadJavaMain+0xc [libjli.so]
  0x0000ffff96f22518 [unknown] [libc.so.6]
  0x0000ffff96f89d5c [unknown] [libc.so.6]
 104857600 bytes in 10 allocations from stack
  0x0000ffff966cf658 os::malloc(unsigned long, MEMFLAGS)+0xb8 [libjvm.so]
  0x0000ffff9695d798 Unsafe_AllocateMemory0+0x74 [libjvm.so]
  0x0000ffff78ef9b38 [unknown]
  0x0000ffff78ef5f04 [unknown]
  0x0000ffff78ef5f04 [unknown]
  0x0000ffff78ef6018 [unknown]
  0x0000ffff78ef5db8 [unknown]
  0x0000ffff78ef0140 [unknown]
  0x0000ffff963228e8 JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, JavaThread*)+0x298 [libjvm.so]
  0x0000ffff96767a30 invoke(InstanceKlass*, methodHandle const&, Handle, bool, objArrayHandle, BasicType, objArrayHandle, bool, JavaThread*) [clone .constprop.0]+0x4fc [libjvm.so]
  0x0000ffff967687e8 Reflection::invoke_method(oopDesc*, Handle, objArrayHandle, JavaThread*)+0x148 [libjvm.so]
  0x0000ffff963ed5b4 JVM_InvokeMethod+0x124 [libjvm.so]
  0x0000ffff78ef9b38 [unknown]
  0x0000ffff78ef5db8 [unknown]
  0x0000ffff78ef5db8 [unknown]
  0x0000ffff78ef6330 [unknown]
  0x0000ffff78ef5db8 [unknown]
  0x0000ffff78ef6018 [unknown]
  0x0000ffff78ef6018 [unknown]
  0x0000ffff78ef0140 [unknown]
  0x0000ffff963228e8 JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, JavaThread*)+0x298 [libjvm.so]
  0x0000ffff963b3da4 jni_invoke_static(JNIEnv_*, JavaValue*, _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, JavaThread*) [clone .constprop.1]+0x1c4 [libjvm.so]
  0x0000ffff963b6878 jni_CallStaticVoidMethod+0x118 [libjvm.so]
  0x0000ffff9705434c JavaMain+0xc5c [libjli.so]
  0x0000ffff9705743c ThreadJavaMain+0xc [libjli.so]
  0x0000ffff96f22518 [unknown] [libc.so.6]
  0x0000ffff96f89d5c [unknown] [libc.so.6]

The call stacks match, including the sizes of the allocated blocks, which confirms the leak site identified above.
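A side note on the sizes in the listing above: each 10485760-byte block (address ending in 010) has a 10489856-byte companion at the same base address ending in 000, exactly 4096 bytes larger. One plausible reading, stated here as an assumption rather than something the trace proves, is that glibc serves requests above its default mmap threshold (128 KiB) with a dedicated mmap, and that memleak records both the malloc call and the underlying mmap. The sketch below uses a simplified model to reproduce the numbers; the 16-byte chunk header and the 4 KiB page size are assumptions, not values taken from the post.

```python
# Simplified model of how glibc sizes an mmap-backed malloc chunk.
# ASSUMPTIONS (not from the post): 16-byte chunk header, 4 KiB pages.
PAGE_SIZE = 4096
CHUNK_HEADER = 16
MMAP_THRESHOLD = 128 * 1024  # glibc's documented default M_MMAP_THRESHOLD


def mmap_backed_size(request: int) -> int:
    """Return the mapping size expected for a malloc request, or 0 if the
    request is below the threshold and would be served from the heap."""
    if request < MMAP_THRESHOLD:
        return 0
    total = request + CHUNK_HEADER
    # Round up to a whole number of pages.
    return (total + PAGE_SIZE - 1) // PAGE_SIZE * PAGE_SIZE


if __name__ == "__main__":
    for request in (10 * 1024 * 1024, 1024 * 1024):
        print(request, "->", mmap_backed_size(request))
```

If this reading is right, the two stacks in each report describe the same leaked buffers counted at two levels, so their totals should not be added together.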

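In most real-world memory incidents in production, the first step is to establish whether a leak exists at all, and only then to look for the leak site. There is an even simpler way to do the first step: compare the block addresses collected in each interval. If the set of outstanding blocks keeps growing across several intervals and very few blocks are ever freed, a memory leak is likely.

Below are the addresses of the outstanding blocks from two reports taken one minute apart. A simple comparison is enough to judge whether there is a leak, and in this demo the difference is very obvious.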

[16:44:42] Top 10 stacks with outstanding allocations:
 addr = ffff0dbf6010 size = 10485760
 addr = ffff0a9f1010 size = 10485760
 addr = ffff0f9f9010 size = 10485760
 addr = ffff0b3f2010 size = 10485760
 addr = ffff0b3f2000 size = 10489856
 addr = ffff0f9f9000 size = 10489856
..............
[16:45:42] Top 10 stacks with outstanding allocations:
 addr = ffff34156aa0 size = 40
 addr = ffff031e5010 size = 10485760
 addr = fffee79b9010 size = 10485760
 addr = fffef37cc010 size = 10485760
 addr = fffefa5d7010 size = 10485760
 addr = ffff0dbf6010 size = 10485760
 addr = ffff063ea010 size = 10485760
 addr = ffff081ed010 size = 10485760
 addr = ffff04fe8010 size = 10485760
 addr = ffff06deb010 size = 10485760
 addr = fffefe1dd010 size = 10485760
 addr = fffeff5df010 size = 10485760
 addr = fffee51b5010 size = 10485760
 addr = ffff009e1010 size = 10485760
 addr = fffef5fd0010 size = 10485760
 addr = fffee65b7010 size = 10485760
 addr = fffefffe0010 size = 10485760
 addr = ffff0a9f1010 size = 10485760
 addr = fffeef1c5010 size = 10485760
 addr = ffff08bee010 size = 10485760
 addr = ffff0f9f9010 size = 10485760
 addr = fffefb9d9010 size = 10485760
 addr = ffff095ef010 size = 10485760
 addr = ffff0b3f2010 size = 10485760
 addr = fffeefbc6010 size = 10485760
 addr = ffff027e4010 size = 10485760
 addr = fffef4bce010 size = 10485760
 addr = fffeee7c4010 size = 10485760
 addr = fffef87d4010 size = 10485760
 addr = fffefffe0000 size = 10489856
 addr = fffeebfc0000 size = 10489856
 addr = fffef7dd3000 size = 10489856
 addr = ffff081ed000 size = 10489856
 addr = fffeeb5bf000 size = 10489856
 addr = ffff077ec000 size = 10489856
 addr = ffff0b3f2000 size = 10489856
 addr = fffee65b7000 size = 10489856
 addr = ffff0f9f9000 size = 10489856
 addr = fffef19c9000 size = 10489856
 addr = fffee97bc000 size = 10489856
 addr = fffef69d1000 size = 10489856
 addr = fffee79b9000 size = 10489856
 addr = ffff009e1000 size = 10489856
 addr = fffeee7c4000 size = 10489856
 addr = fffef55cf000 size = 10489856
 addr = fffefafd8000 size = 10489856
 addr = ffff04fe8000 size = 10489856
 addr = fffeefbc6000 size = 10489856
 addr = fffeed3c2000 size = 10489856
 addr = ffff08bee000 size = 10489856
 .............

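Comparing the two reports, every entry that appears only in the second one (highlighted in red in the original diff) is a newly added block, and none of the blocks allocated earlier has been freed.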


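On-heap memory

Now consider the other case: tracing memory on the Java heap. The memory allocated by the demo above is not managed by the Java heap, so here is a slightly modified version that moves the allocation from native memory to the Java heap.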

[root@liruilongs.github.io ~]# cat memLeakDemo.java 
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class MemoryLeakJava {
    // Static list that keeps a strong reference to every object (the leak)
    private static final List<Object> leakList = new ArrayList<>();
    // Random generator used to fill the arrays with random data
    private static final Random random = new Random();

    /**
     * A large data object.
     */
    static class DataObject {
        // Large byte array that consumes heap memory
        private byte[] data;

        /**
         * Creates an object holding random data of the given size.
         *
         * @param sizeInBytes data size in bytes
         */
        public DataObject(int sizeInBytes) {
            // Allocate a byte array of the requested size
            this.data = new byte[sizeInBytes];

            // Fill it with random data so the JVM cannot easily optimize it away
            random.nextBytes(this.data);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        while (true) {
            // Native (off-heap) allocation, which is easier for BCC to trace
            //ByteBuffer directBuffer = ByteBuffer.allocateDirect(9*1024 * 1024);
            DataObject data = new DataObject(10 * 1024 * 1024);
            leakList.add(data);
            Thread.sleep(1000);
        }
    }
}
[root@liruilongs.github.io ~]# 

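Each DataObject created here holds a 10 MB array. The stacks below show that this JVM runs the G1 collector, where an object larger than about half a heap region is allocated as a humongous object directly in dedicated contiguous regions instead of the young generation. Assuming a region size of a few MB, which is typical for small heaps, a 10 MB array falls into that category. Below is the allocation trace produced with the same command.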

[root@liruilongs.github.io tools]# ./memleak.py -p $(pgrep java) --top 3   -s 3 -a 10  -o 20000
Attaching to pid 1743032, Ctrl+C to quit.
[19:29:29] Top 3 stacks with outstanding allocations:
[19:29:39] Top 3 stacks with outstanding allocations:
[19:29:49] Top 3 stacks with outstanding allocations:
 addr = ffff4041d4f0 size = 26
 addr = ffff40401ed0 size = 26
 addr = ffff403ec3d0 size = 26
 addr = ffff40402490 size = 26
 addr = ffff404189c0 size = 26
..........................................
 addr = ffff6455e000 size = 4096
 addr = ffff648a4000 size = 4096
 addr = ffff648b4000 size = 4096
 addr = ffff64581000 size = 4096
 addr = ffff64221000 size = 4096
 addr = ffff648bd000 size = 4096
 addr = ffff648c1000 size = 4096
 addr = ffff648a7000 size = 4096
 addr = ffff6237b000 size = 294912
 addr = ffff6249f000 size = 311296
 addr = fd700000 size = 18874368
 addr = fa000000 size = 18874368
 86016 bytes in 21 allocations from stack
  0x0000ffff80f38534 os::pd_commit_memory_or_exit(char*, unsigned long, unsigned long, bool, char const*)+0x44 [libjvm.so]
  0x0000ffff80f31edc os::commit_memory_or_exit(char*, unsigned long, unsigned long, bool, char const*)+0x38 [libjvm.so]
  0x0000ffff80a80608 G1PageBasedVirtualSpace::commit(unsigned long, unsigned long)+0x178 [libjvm.so]
  0x0000ffff80a955bc G1RegionsSmallerThanCommitSizeMapper::commit_regions(unsigned int, unsigned long, WorkGang*)+0x9c [libjvm.so]
  0x0000ffff80b32cd4 HeapRegionManager::commit_regions(unsigned int, unsigned long, WorkGang*)+0xb4 [libjvm.so]
  0x0000ffff80b34b94 HeapRegionManager::expand(unsigned int, unsigned int, WorkGang*)+0x34 [libjvm.so]
  0x0000ffff80b34da0 HeapRegionManager::expand_by(unsigned int, WorkGang*)+0x70 [libjvm.so]
  0x0000ffff80a378bc G1CollectedHeap::expand(unsigned long, WorkGang*, double*)+0x10c [libjvm.so]
  0x0000ffff80a3a788 G1CollectedHeap::resize_heap_if_necessary()+0x58 [libjvm.so]
  0x0000ffff80a48090 G1ConcurrentMark::remark()+0x3a0 [libjvm.so]
  0x0000ffff80abad54 VM_G1PauseConcurrent::doit()+0x164 [libjvm.so]
  0x0000ffff8120fe58 VM_Operation::evaluate()+0xe8 [libjvm.so]
  0x0000ffff81211a3c VMThread::evaluate_operation(VM_Operation*)+0xfc [libjvm.so]
  0x0000ffff81211f88 VMThread::inner_execute(VM_Operation*)+0x1f8 [libjvm.so]
  0x0000ffff812122f8 VMThread::run()+0xd4 [libjvm.so]
  0x0000ffff81193434 Thread::call_run()+0xc4 [libjvm.so]
  0x0000ffff80f3b7ac thread_native_entry(Thread*)+0xdc [libjvm.so]
  0x0000ffff81782518 [unknown] [libc.so.6]
  0x0000ffff817e9d5c [unknown] [libc.so.6]
 606208 bytes in 2 allocations from stack
  0x0000ffff80f38534 os::pd_commit_memory_or_exit(char*, unsigned long, unsigned long, bool, char const*)+0x44 [libjvm.so]
  0x0000ffff80f31edc os::commit_memory_or_exit(char*, unsigned long, unsigned long, bool, char const*)+0x38 [libjvm.so]
  0x0000ffff80a80608 G1PageBasedVirtualSpace::commit(unsigned long, unsigned long)+0x178 [libjvm.so]
  0x0000ffff80a952b4 G1RegionsLargerThanCommitSizeMapper::commit_regions(unsigned int, unsigned long, WorkGang*)+0x194 [libjvm.so]
  0x0000ffff80b32cb8 HeapRegionManager::commit_regions(unsigned int, unsigned long, WorkGang*)+0x98 [libjvm.so]
  0x0000ffff80b34b94 HeapRegionManager::expand(unsigned int, unsigned int, WorkGang*)+0x34 [libjvm.so]
  0x0000ffff80b34da0 HeapRegionManager::expand_by(unsigned int, WorkGang*)+0x70 [libjvm.so]
  0x0000ffff80a378bc G1CollectedHeap::expand(unsigned long, WorkGang*, double*)+0x10c [libjvm.so]
  0x0000ffff80a3a788 G1CollectedHeap::resize_heap_if_necessary()+0x58 [libjvm.so]
  0x0000ffff80a48090 G1ConcurrentMark::remark()+0x3a0 [libjvm.so]
  0x0000ffff80abad54 VM_G1PauseConcurrent::doit()+0x164 [libjvm.so]
  0x0000ffff8120fe58 VM_Operation::evaluate()+0xe8 [libjvm.so]
  0x0000ffff81211a3c VMThread::evaluate_operation(VM_Operation*)+0xfc [libjvm.so]
  0x0000ffff81211f88 VMThread::inner_execute(VM_Operation*)+0x1f8 [libjvm.so]
  0x0000ffff812122f8 VMThread::run()+0xd4 [libjvm.so]
  0x0000ffff81193434 Thread::call_run()+0xc4 [libjvm.so]
  0x0000ffff80f3b7ac thread_native_entry(Thread*)+0xdc [libjvm.so]
  0x0000ffff81782518 [unknown] [libc.so.6]
  0x0000ffff817e9d5c [unknown] [libc.so.6]
 37748736 bytes in 2 allocations from stack
  0x0000ffff80f38534 os::pd_commit_memory_or_exit(char*, unsigned long, unsigned long, bool, char const*)+0x44 [libjvm.so]
  0x0000ffff80f31edc os::commit_memory_or_exit(char*, unsigned long, unsigned long, bool, char const*)+0x38 [libjvm.so]
  0x0000ffff80a80608 G1PageBasedVirtualSpace::commit(unsigned long, unsigned long)+0x178 [libjvm.so]
  0x0000ffff80a952b4 G1RegionsLargerThanCommitSizeMapper::commit_regions(unsigned int, unsigned long, WorkGang*)+0x194 [libjvm.so]
  0x0000ffff80b32c80 HeapRegionManager::commit_regions(unsigned int, unsigned long, WorkGang*)+0x60 [libjvm.so]
  0x0000ffff80b34b94 HeapRegionManager::expand(unsigned int, unsigned int, WorkGang*)+0x34 [libjvm.so]
  0x0000ffff80b34da0 HeapRegionManager::expand_by(unsigned int, WorkGang*)+0x70 [libjvm.so]
  0x0000ffff80a378bc G1CollectedHeap::expand(unsigned long, WorkGang*, double*)+0x10c [libjvm.so]
  0x0000ffff80a3a788 G1CollectedHeap::resize_heap_if_necessary()+0x58 [libjvm.so]
  0x0000ffff80a48090 G1ConcurrentMark::remark()+0x3a0 [libjvm.so]
  0x0000ffff80abad54 VM_G1PauseConcurrent::doit()+0x164 [libjvm.so]
  0x0000ffff8120fe58 VM_Operation::evaluate()+0xe8 [libjvm.so]
  0x0000ffff81211a3c VMThread::evaluate_operation(VM_Operation*)+0xfc [libjvm.so]
  0x0000ffff81211f88 VMThread::inner_execute(VM_Operation*)+0x1f8 [libjvm.so]
  0x0000ffff812122f8 VMThread::run()+0xd4 [libjvm.so]
  0x0000ffff81193434 Thread::call_run()+0xc4 [libjvm.so]
  0x0000ffff80f3b7ac thread_native_entry(Thread*)+0xdc [libjvm.so]
  0x0000ffff81782518 [unknown] [libc.so.6]
  0x0000ffff817e9d5c [unknown] [libc.so.6]
[19:29:59] Top 3 stacks with outstanding allocations:
 addr = ffff4041d4f0 size = 26
 addr = ffff40401ed0 size = 26
 addr = ffff403ec3d0 size = 26
 addr = ffff40402490 size = 26
 addr = ffff404189c0 size = 26
........................

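The static list leakList holds a strong reference to every data instance, so the heap memory cannot be reclaimed and G1 keeps expanding the heap. This can be read from the call stacks above: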

G1CollectedHeap::expand → HeapRegionManager::commit_regions → os::commit_memory

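However, what happens inside the Java heap remains invisible.

So memleak from BCC cannot trace allocations made inside the Java heap. For that, use tools from the Java ecosystem such as VisualVM (jvisualvm), jstat, jmap, jconsole, and JProfiler.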
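As a quick first check before opening a heap dump, the old-generation occupancy printed by `jstat -gcutil <pid> <interval>` can be watched over time. The sketch below parses saved `jstat -gcutil` output and checks whether the O column keeps climbing. The helper names and the minimum sample count are hypothetical, and a rising O column is only a hint that should be confirmed with a heap dump, not proof of a leak.

```python
def old_gen_usage(jstat_text: str) -> list:
    """Extract the O (old generation utilization, %) column from saved
    `jstat -gcutil` output. Repeated header lines are tolerated."""
    values = []
    column = None
    for line in jstat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if "O" in fields and "YGC" in fields:  # header row
            column = fields.index("O")
            continue
        if column is None or column >= len(fields):
            continue
        try:
            values.append(float(fields[column]))
        except ValueError:
            continue  # e.g. "-" when a value is not available
    return values


def keeps_growing(values: list, min_samples: int = 5) -> bool:
    """True if there are enough samples, no sample is lower than the
    previous one, and the last sample is higher than the first."""
    if len(values) < min_samples:
        return False
    never_drops = all(b >= a for a, b in zip(values, values[1:]))
    return never_drops and values[-1] > values[0]
```

Old-generation usage that never drops, even after collections, is the same shape that the Visual GC view shows for a leaking application.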

The following is the Visual GC plugin in VisualVM, showing GC monitoring data for a typical memory leak

Interested readers can refer to my earlier posts:

Python memory leak analysis

Below is a Python memory leak demo

[root@liruilongs.github.io ~]# cat leak_memory.py
# memory_leak_python.py
import time
import os

# Global list that holds object references (this causes the leak)
leaked_objects = []

def leak_memory(step=1024 * 1024):
    """Keep allocating memory and retain every reference."""
    while True:
        # Allocate a large block (here a 1 MB bytearray)
        large_data = bytearray(step)
        # Append to the global list so it can never be garbage collected
        leaked_objects.append(large_data)
        time.sleep(1)  # allocate once per second

if __name__ == "__main__":
    print(f"Python process PID: {os.getpid()}")
    leak_memory()
[root@liruilongs.github.io ~]# 

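Monitoring with the same command shows that several libc.so.6 frames are [unknown], and the remaining frames belong to the Python interpreter library libpython3.9.so.1.0. In other words, memleak cannot map the allocations back to Python source code either. Because the demo is so simple, though, it is easy to see that all the leaked memory comes from call stacks passing through PyByteArray_Resize in libpython3.9.so.1.0.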

[root@liruilongs.github.io tools]# ./memleak.py -p 1757033  -s 3 -a 10  -o 20000
Attaching to pid 1757033, Ctrl+C to quit.
[20:22:00] Top 10 stacks with outstanding allocations:
[20:22:10] Top 10 stacks with outstanding allocations:
[20:22:20] Top 10 stacks with outstanding allocations:
 addr = ffffaeab4010 size = 1048577
 addr = ffffae2ac000 size = 1052672
 addr = ffffaeab4000 size = 1052672
 addr = ffffae4ae000 size = 1052672
 addr = ffffae8b2000 size = 1052672
 1048577 bytes in 1 allocations from stack
  0x0000ffffba410000 [unknown] [libpython3.9.so.1.0]
  0x0000ffffba3bc8ec PyByteArray_Resize+0xcc [libpython3.9.so.1.0]
  0x0000ffffba3bcbb8 [unknown] [libpython3.9.so.1.0]
  0x0000ffffba41fe00 [unknown] [libpython3.9.so.1.0]
  0x0000ffffba3c9278 _PyObject_MakeTpCall+0x98 [libpython3.9.so.1.0]
  0x0000ffffba38d9f8 _PyEval_EvalFrameDefault+0x5b98 [libpython3.9.so.1.0]
  0x0000ffffba387068 [unknown] [libpython3.9.so.1.0]
  0x0000ffffba38d428 _PyEval_EvalFrameDefault+0x55c8 [libpython3.9.so.1.0]
  0x0000ffffba48f7e4 [unknown] [libpython3.9.so.1.0]
  0x0000ffffba48fc44 _PyEval_EvalCodeWithName+0x64 [libpython3.9.so.1.0]
  0x0000ffffba48fc90 PyEval_EvalCodeEx+0x40 [libpython3.9.so.1.0]
  0x0000ffffba48fccc PyEval_EvalCode+0x2c [libpython3.9.so.1.0]
  0x0000ffffba4cafbc [unknown] [libpython3.9.so.1.0]
  0x0000ffffba4cb1d4 [unknown] [libpython3.9.so.1.0]
  0x0000ffffba4cdcd0 [unknown] [libpython3.9.so.1.0]
  0x0000ffffba4cde84 PyRun_SimpleFileExFlags+0x124 [libpython3.9.so.1.0]
  0x0000ffffba4e8a20 Py_RunMain+0x5f0 [libpython3.9.so.1.0]
  0x0000ffffba4e8f10 Py_BytesMain+0x5c [libpython3.9.so.1.0]
  0x0000ffffba08b000 [unknown] [libc.so.6]
  0x0000ffffba08b0d8 __libc_start_main+0x94 [libc.so.6]
  0x0000aaaada4b08b0 _start+0x30 [python3.9]
 4210688 bytes in 4 allocations from stack
  0x0000ffffba0ef730 [unknown] [libc.so.6]
  0x0000ffffba0f03b4 [unknown] [libc.so.6]
  0x0000ffffba0f1598 [unknown] [libc.so.6]
  0x0000fffffffff000 [unknown] [[uprobes]]
  0x0000ffffba3bc8ec PyByteArray_Resize+0xcc [libpython3.9.so.1.0]
  0x0000ffffba3bcbb8 [unknown] [libpython3.9.so.1.0]
  0x0000ffffba41fe00 [unknown] [libpython3.9.so.1.0]
  0x0000ffffba3c9278 _PyObject_MakeTpCall+0x98 [libpython3.9.so.1.0]
  0x0000ffffba38d9f8 _PyEval_EvalFrameDefault+0x5b98 [libpython3.9.so.1.0]
  0x0000ffffba387068 [unknown] [libpython3.9.so.1.0]
  0x0000ffffba38d428 _PyEval_EvalFrameDefault+0x55c8 [libpython3.9.so.1.0]
  0x0000ffffba48f7e4 [unknown] [libpython3.9.so.1.0]
  0x0000ffffba48fc44 _PyEval_EvalCodeWithName+0x64 [libpython3.9.so.1.0]
  0x0000ffffba48fc90 PyEval_EvalCodeEx+0x40 [libpython3.9.so.1.0]
  0x0000ffffba48fccc PyEval_EvalCode+0x2c [libpython3.9.so.1.0]
  0x0000ffffba4cafbc [unknown] [libpython3.9.so.1.0]
  0x0000ffffba4cb1d4 [unknown] [libpython3.9.so.1.0]
  0x0000ffffba4cdcd0 [unknown] [libpython3.9.so.1.0]
  0x0000ffffba4cde84 PyRun_SimpleFileExFlags+0x124 [libpython3.9.so.1.0]
  0x0000ffffba4e8a20 Py_RunMain+0x5f0 [libpython3.9.so.1.0]
  0x0000ffffba4e8f10 Py_BytesMain+0x5c [libpython3.9.so.1.0]
  0x0000ffffba08b000 [unknown] [libc.so.6]
  0x0000ffffba08b0d8 __libc_start_main+0x94 [libc.so.6]
  0x0000aaaada4b08b0 _start+0x30 [python3.9]
[20:22:30] Top 10 stacks with outstanding allocations:
 addr = ffffaeab4010 size = 1048577
 addr = ffffadca6010 size = 1048577
 addr = ffffae2ac000 size = 1052672
 addr = ffffadda7000 size = 1052672
 addr = ffffaeab4000 size = 1052672
 addr = ffffae4ae000 size = 1052672
 addr = ffffad7a1000 size = 1052672
 addr = ffffae8b2000 size = 1052672
 addr = ffffadea8000 size = 1052672
 addr = ffffadca6000 size = 1052672
 2097154 bytes in 2 allocations from stack
  0x0000ffffba410000 [unknown] [libpython3.9.so.1.0]
  0x0000ffffba3bc8ec PyByteArray_Resize+0xcc [libpython3.9.so.1.0]
  0x0000ffffba3bcbb8 [unknown] [libpython3.9.so.1.0]
  0x0000ffffba41fe00 [unknown] [libpython3.9.so.1.0]
  0x0000ffffba3c9278 _PyObject_MakeTpCall+0x98 [libpython3.9.so.1.0]
  0x0000ffffba38d9f8 _PyEval_EvalFrameDefault+0x5b98 [libpython3.9.so.1.0]
  0x0000ffffba387068 [unknown] [libpython3.9.so.1.0]
  0x0000ffffba38d428 _PyEval_EvalFrameDefault+0x55c8 [libpython3.9.so.1.0]
  0x0000ffffba48f7e4 [unknown] [libpython3.9.so.1.0]
  0x0000ffffba48fc44 _PyEval_EvalCodeWithName+0x64 [libpython3.9.so.1.0]
  0x0000ffffba48fc90 PyEval_EvalCodeEx+0x40 [libpython3.9.so.1.0]
  0x0000ffffba48fccc PyEval_EvalCode+0x2c [libpython3.9.so.1.0]
  0x0000ffffba4cafbc [unknown] [libpython3.9.so.1.0]
  0x0000ffffba4cb1d4 [unknown] [libpython3.9.so.1.0]
  0x0000ffffba4cdcd0 [unknown] [libpython3.9.so.1.0]
  0x0000ffffba4cde84 PyRun_SimpleFileExFlags+0x124 [libpython3.9.so.1.0]
  0x0000ffffba4e8a20 Py_RunMain+0x5f0 [libpython3.9.so.1.0]
  0x0000ffffba4e8f10 Py_BytesMain+0x5c [libpython3.9.so.1.0]
  0x0000ffffba08b000 [unknown] [libc.so.6]
  0x0000ffffba08b0d8 __libc_start_main+0x94 [libc.so.6]
  0x0000aaaada4b08b0 _start+0x30 [python3.9]
 8421376 bytes in 8 allocations from stack
  0x0000ffffba0ef730 [unknown] [libc.so.6]
  0x0000ffffba0f03b4 [unknown] [libc.so.6]
  0x0000ffffba0f1598 [unknown] [libc.so.6]
  0x0000fffffffff000 [unknown] [[uprobes]]
  0x0000ffffba3bc8ec PyByteArray_Resize+0xcc [libpython3.9.so.1.0]
  0x0000ffffba3bcbb8 [unknown] [libpython3.9.so.1.0]
  0x0000ffffba41fe00 [unknown] [libpython3.9.so.1.0]
  0x0000ffffba3c9278 _PyObject_MakeTpCall+0x98 [libpython3.9.so.1.0]
  0x0000ffffba38d9f8 _PyEval_EvalFrameDefault+0x5b98 [libpython3.9.so.1.0]
  0x0000ffffba387068 [unknown] [libpython3.9.so.1.0]
  0x0000ffffba38d428 _PyEval_EvalFrameDefault+0x55c8 [libpython3.9.so.1.0]
  0x0000ffffba48f7e4 [unknown] [libpython3.9.so.1.0]
  0x0000ffffba48fc44 _PyEval_EvalCodeWithName+0x64 [libpython3.9.so.1.0]
  0x0000ffffba48fc90 PyEval_EvalCodeEx+0x40 [libpython3.9.so.1.0]
  0x0000ffffba48fccc PyEval_EvalCode+0x2c [libpython3.9.so.1.0]
  0x0000ffffba4cafbc [unknown] [libpython3.9.so.1.0]
  0x0000ffffba4cb1d4 [unknown] [libpython3.9.so.1.0]
  0x0000ffffba4cdcd0 [unknown] [libpython3.9.so.1.0]
  0x0000ffffba4cde84 PyRun_SimpleFileExFlags+0x124 [libpython3.9.so.1.0]
  0x0000ffffba4e8a20 Py_RunMain+0x5f0 [libpython3.9.so.1.0]
  0x0000ffffba4e8f10 Py_BytesMain+0x5c [libpython3.9.so.1.0]
  0x0000ffffba08b000 [unknown] [libc.so.6]
  0x0000ffffba08b0d8 __libc_start_main+0x94 [libc.so.6]
  0x0000aaaada4b08b0 _start+0x30 [python3.9]
[20:22:40] Top 10 stacks with outstanding allocations:
 addr = ffffad49e010 size = 1048577
 addr = ffffaeab4010 size = 1048577
 addr = ffffad29c010 size = 1048577
 addr = ffffadca6010 size = 1048577
 addr = ffffad59f010 size = 1048577
 addr = ffffad6a0010 size = 1048577
 addr = ffffad19b010 size = 1048577
 addr = ffffae2ac000 size = 1052672
 addr = ffffadda7000 size = 1052672
 addr = ffffaeab4000 size = 1052672
 addr = ffffae4ae000 size = 1052672
 addr = ffffacd97000 size = 1052672
 addr = ffffad39d000 size = 1052672
 addr = ffffad59f000 size = 1052672
 addr = ffffad09a000 size = 1052672
 addr = ffffad7a1000 size = 1052672
 addr = ffffae8b2000 size = 1052672
 addr = ffffacf99000 size = 1052672
 addr = ffffadea8000 size = 1052672
 addr = ffffadca6000 size = 1052672
 addr = ffffad49e000 size = 1052672
 7340039 bytes in 7 allocations from stack
  0x0000ffffba410000 [unknown] [libpython3.9.so.1.0]
  0x0000ffffba3bc8ec PyByteArray_Resize+0xcc [libpython3.9.so.1.0]
  0x0000ffffba3bcbb8 [unknown] [libpython3.9.so.1.0]
  0x0000ffffba41fe00 [unknown] [libpython3.9.so.1.0]
  0x0000ffffba3c9278 _PyObject_MakeTpCall+0x98 [libpython3.9.so.1.0]
  0x0000ffffba38d9f8 _PyEval_EvalFrameDefault+0x5b98 [libpython3.9.so.1.0]
  0x0000ffffba387068 [unknown] [libpython3.9.so.1.0]
  0x0000ffffba38d428 _PyEval_EvalFrameDefault+0x55c8 [libpython3.9.so.1.0]
  0x0000ffffba48f7e4 [unknown] [libpython3.9.so.1.0]
  0x0000ffffba48fc44 _PyEval_EvalCodeWithName+0x64 [libpython3.9.so.1.0]
  0x0000ffffba48fc90 PyEval_EvalCodeEx+0x40 [libpython3.9.so.1.0]
  0x0000ffffba48fccc PyEval_EvalCode+0x2c [libpython3.9.so.1.0]
  0x0000ffffba4cafbc [unknown] [libpython3.9.so.1.0]
  0x0000ffffba4cb1d4 [unknown] [libpython3.9.so.1.0]
  0x0000ffffba4cdcd0 [unknown] [libpython3.9.so.1.0]
  0x0000ffffba4cde84 PyRun_SimpleFileExFlags+0x124 [libpython3.9.so.1.0]
  0x0000ffffba4e8a20 Py_RunMain+0x5f0 [libpython3.9.so.1.0]
  0x0000ffffba4e8f10 Py_BytesMain+0x5c [libpython3.9.so.1.0]
  0x0000ffffba08b000 [unknown] [libc.so.6]
  0x0000ffffba08b0d8 __libc_start_main+0x94 [libc.so.6]
  0x0000aaaada4b08b0 _start+0x30 [python3.9]
 14737408 bytes in 14 allocations from stack
  0x0000ffffba0ef730 [unknown] [libc.so.6]
  0x0000ffffba0f03b4 [unknown] [libc.so.6]
  0x0000ffffba0f1598 [unknown] [libc.so.6]
  0x0000fffffffff000 [unknown] [[uprobes]]
  0x0000ffffba3bc8ec PyByteArray_Resize+0xcc [libpython3.9.so.1.0]
  0x0000ffffba3bcbb8 [unknown] [libpython3.9.so.1.0]
  0x0000ffffba41fe00 [unknown] [libpython3.9.so.1.0]
  0x0000ffffba3c9278 _PyObject_MakeTpCall+0x98 [libpython3.9.so.1.0]
  0x0000ffffba38d9f8 _PyEval_EvalFrameDefault+0x5b98 [libpython3.9.so.1.0]
  0x0000ffffba387068 [unknown] [libpython3.9.so.1.0]
  0x0000ffffba38d428 _PyEval_EvalFrameDefault+0x55c8 [libpython3.9.so.1.0]
  0x0000ffffba48f7e4 [unknown] [libpython3.9.so.1.0]
  0x0000ffffba48fc44 _PyEval_EvalCodeWithName+0x64 [libpython3.9.so.1.0]
  0x0000ffffba48fc90 PyEval_EvalCodeEx+0x40 [libpython3.9.so.1.0]
  0x0000ffffba48fccc PyEval_EvalCode+0x2c [libpython3.9.so.1.0]
  0x0000ffffba4cafbc [unknown] [libpython3.9.so.1.0]
  0x0000ffffba4cb1d4 [unknown] [libpython3.9.so.1.0]
  0x0000ffffba4cdcd0 [unknown] [libpython3.9.so.1.0]
  0x0000ffffba4cde84 PyRun_SimpleFileExFlags+0x124 [libpython3.9.so.1.0]
  0x0000ffffba4e8a20 Py_RunMain+0x5f0 [libpython3.9.so.1.0]
  0x0000ffffba4e8f10 Py_BytesMain+0x5c [libpython3.9.so.1.0]
  0x0000ffffba08b000 [unknown] [libc.so.6]
  0x0000ffffba08b0d8 __libc_start_main+0x94 [libc.so.6]
  0x0000aaaada4b08b0 _start+0x30 [python3.9]
............................................................
^C[root@liruilongs.github.io tools]# 

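From the block sizes and the key frames of the call stack, we can conclude that the leak comes from the bytearray allocations (PyByteArray_Resize), which are kept alive by the global list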

PyByteArray_Resize+0xcc 
→ _PyObject_MakeTpCall+0x98 
→ _PyEval_EvalFrameDefault+0x5b98 
→ PyRun_SimpleFileExFlags+0x124 
→ Py_RunMain+0x5f0 
→ Py_BytesMain+0x5c

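So for Python, too, memleak alone cannot trace the leak back to source code. In practice you may need a Python memory tool such as tracemalloc, a memory-tracing debug tool in the Python standard library that monitors and analyzes the memory allocation behavior of a Python program.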
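A minimal sketch of that approach follows. It reuses the idea of the demo (a global list holding bytearray objects) but runs only a few rounds without sleeping; the function names are hypothetical, while the tracemalloc calls are from the standard library.

```python
import tracemalloc

leaked_objects = []  # global list that keeps references alive


def leak_once(step: int = 1024 * 1024) -> None:
    """Allocate one bytearray and keep a reference to it."""
    leaked_objects.append(bytearray(step))


def top_growth(rounds: int = 5, limit: int = 3) -> list:
    """Run the leaking function a few times and return the source lines
    whose allocated size changed the most between two snapshots."""
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    for _ in range(rounds):
        leak_once()
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    # Each StatisticDiff carries size_diff, count_diff and a traceback.
    return after.compare_to(before, "lineno")[:limit]


if __name__ == "__main__":
    for stat in top_growth():
        print(stat)
```

Unlike the memleak stacks, each entry here names a Python file and line number, which for this sketch should be the line that creates the bytearray.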

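C memory leak analysis

Earlier we briefly walked through this BCC script. It places probes directly on memory allocation functions, both the user-space ones in libc and the kernel ones, so among user-space programs, those written in C are the best fit. Here is a demo.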
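As a rough mental model of that probe-based bookkeeping, the following toy simulation records every allocation until a matching free arrives and then groups what is left by call stack. It is plain Python written for illustration only, not the BPF code that memleak actually runs, and the class and method names are hypothetical.

```python
from collections import defaultdict


class OutstandingTracker:
    """Toy model: remember every allocation until it is freed."""

    def __init__(self):
        self.live = {}  # address -> (size, stack, timestamp_ms)

    def on_alloc(self, addr: int, size: int, stack: tuple, now_ms: int) -> None:
        self.live[addr] = (size, stack, now_ms)

    def on_free(self, addr: int) -> None:
        # Frees of addresses that were never recorded are ignored.
        self.live.pop(addr, None)

    def report(self, now_ms: int, min_age_ms: int = 0) -> list:
        """Group allocations at least min_age_ms old by stack, largest first."""
        totals = defaultdict(lambda: [0, 0])  # stack -> [bytes, count]
        for size, stack, born in self.live.values():
            if now_ms - born >= min_age_ms:
                totals[stack][0] += size
                totals[stack][1] += 1
        rows = [(stack, b, c) for stack, (b, c) in totals.items()]
        return sorted(rows, key=lambda row: row[1], reverse=True)
```

The min_age_ms parameter plays the role of the -o option used in the commands in this post: blocks younger than the threshold are left out of the report because they may simply not have been freed yet.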

┌──[root@liruilongs.github.io]-[~]
└─$vim memory_leak_demo.c

┌──[root@liruilongs.github.io]-[~]
└─$gcc -g memory_leak_demo.c -o leak_demo
┌──[root@liruilongs.github.io]-[~]
└─$cat memory_leak_demo.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <time.h>

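// Allocation counter
static int allocation_count = 0;

// Wrapper that allocates memory and logs the allocation
void* allocate_memory(size_t size) {
    void *ptr = malloc(size);
    if (ptr) {
        allocation_count++;
        // Get the current time
        time_t now = time(NULL);
        struct tm *tm_info = localtime(&now);
        char time_buf[20];
        strftime(time_buf, sizeof(time_buf), "%Y-%m-%d %H:%M:%S", tm_info);

        // Log the allocation
        printf("[%s] allocation #%d: %zu bytes at address %p\n",
               time_buf, allocation_count, size, ptr);
    } else {
        perror("memory allocation failed");
    }
    return ptr;
}

// Function that simulates a memory leak
void memory_leak_demo() {
    int *data_buffer = NULL;

    for (int i = 0; i < 1000; i++) {
        // Allocate 1 MB on every iteration
        data_buffer = (int*)allocate_memory(1024 * 1024);

        if (data_buffer) {
            // Touch the memory (stand-in for real business logic)
            data_buffer[0] = i;
            printf("wrote value: %d\n", data_buffer[0]);
        }

        sleep(1);  // pause for 1 second so the effect can be observed

        /* Key point: the memory is never freed here!
           On the next iteration the pointer is overwritten,
           so the previously allocated block becomes unreachable */
    }

    // The last allocated block is not freed either!
}

int main() {
    printf("===== memory leak demo start =====\n");
    memory_leak_demo();
    printf("===== demo finished (%d blocks leaked) =====\n", allocation_count);

    // Nothing is freed before the program exits
    return 0;
}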
}

Observing the memory problem with memleak produces the following output

┌──[root@liruilongs.github.io]-[/usr/share/bcc/tools]
└─$./memleak -p $(pgrep leak_demo) --top 3   -s 3 -a 10  -o 20000
Attaching to pid 16369, Ctrl+C to quit.
[10:43:05] Top 3 stacks with outstanding allocations:
        addr = 7fb86c2b7010 size = 15
        addr = 7fb86ada2000 size = 1048576
        addr = 7fb86b3a8000 size = 1048576
        addr = 7fb86b7ac000 size = 1048576
        addr = 7fb86b6ab000 size = 1048576
        addr = 7fb86bbb0010 size = 1048576
        addr = 7fb86b8ad000 size = 1052672
        addr = 7fb86b1a6000 size = 1052672
        addr = 7fb86b5aa000 size = 1052672
        addr = 7fb86afa4000 size = 1052672
        addr = 7fb86b9ae010 size = 1052672
        addr = 7fb86baaf010 size = 1052672
        addr = 7fb86b0a5000 size = 1052672
        3153935 bytes in 4 allocations from stack
                0x00000000004011ae      allocate_memory+0x18 [leak_demo]
                0x000000000040125f      memory_leak_demo+0x23 [leak_demo]
                0x00000000004012bd      main+0x18 [leak_demo]
                0x00007fb870c29590      __libc_start_call_main+0x80 [libc.so.6]
        9457664 bytes in 9 allocations from stack
                0x00007fb870c980dd      sysmalloc+0x7ed [libc.so.6]
[10:43:18] Top 3 stacks with outstanding allocations:
        addr = 7fb86c2b7010 size = 15
       ....................................
                0x00007fb870c980dd      sysmalloc+0x7ed [libc.so.6]
[10:43:28] Top 3 stacks with outstanding allocations:
        addr = 7fb86c2b7010 size = 15
        addr = 7fb86ada2000 size = 1048576
        addr = 7fb86b3a8000 size = 1048576
        addr = 7fb86b7ac000 size = 1048576
        addr = 7fb86a095000 size = 1048576
        addr = 7fb86a99e000 size = 1048576
        addr = 7fb869f94000 size = 1048576
        addr = 7fb86b6ab000 size = 1048576
        addr = 7fb86bbb0010 size = 1048576
        addr = 7fb86b8ad000 size = 1052672
        addr = 7fb86a79c000 size = 1052672
        addr = 7fb86a89d000 size = 1052672
        addr = 7fb86b1a6000 size = 1052672
        addr = 7fb86a398000 size = 1052672
        addr = 7fb86b5aa000 size = 1052672
        addr = 7fb86afa4000 size = 1052672
        addr = 7fb869e93000 size = 1052672
        addr = 7fb869c91000 size = 1052672
        addr = 7fb86a196000 size = 1052672
        addr = 7fb86b9ae010 size = 1052672
        addr = 7fb86baaf010 size = 1052672
        addr = 7fb86b0a5000 size = 1052672
        addr = 7fb869a8f000 size = 1052672
        addr = 7fb86a297000 size = 1052672
        3153935 bytes in 4 allocations from stack
                0x00000000004011ae      allocate_memory+0x18 [leak_demo]
                0x000000000040125f      memory_leak_demo+0x23 [leak_demo]
                0x00000000004012bd      main+0x18 [leak_demo]
                0x00007fb870c29590      __libc_start_call_main+0x80 [libc.so.6]
        21024768 bytes in 20 allocations from stack
                0x00007fb870c980dd      sysmalloc+0x7ed [libc.so.6]
[10:43:38] Top 3 stacks with outstanding allocations:
        addr = 7fb86c2b7010 size = 15
        addr = 7fb86ada2000 size = 1048576
        addr = 7fb86b3a8000 size = 1048576
        addr = 7fb86b7ac000 size = 1048576
        addr = 7fb86a095000 size = 1048576
        addr = 7fb86a99e000 size = 1048576
        addr = 7fb869f94000 size = 1048576
        addr = 7fb86b6ab000 size = 1048576
        addr = 7fb86bbb0010 size = 1048576
        addr = 7fb86b8ad000 size = 1052672
        addr = 7fb86a79c000 size = 1052672
        addr = 7fb86a89d000 size = 1052672
        addr = 7fb86958a000 size = 1052672
        addr = 7fb86b1a6000 size = 1052672
        addr = 7fb86a398000 size = 1052672
        addr = 7fb86b5aa000 size = 1052672
        addr = 7fb86afa4000 size = 1052672
        addr = 7fb869e93000 size = 1052672
        addr = 7fb869c91000 size = 1052672
        addr = 7fb86a196000 size = 1052672
        addr = 7fb869085000 size = 1052672
        addr = 7fb86b9ae010 size = 1052672
        addr = 7fb86baaf010 size = 1052672
        addr = 7fb86b0a5000 size = 1052672
        addr = 7fb869186000 size = 1052672
        addr = 7fb869a8f000 size = 1052672
        addr = 7fb86a297000 size = 1052672
        3153935 bytes in 4 allocations from stack
                0x00000000004011ae      allocate_memory+0x18 [leak_demo]
                0x000000000040125f      memory_leak_demo+0x23 [leak_demo]
                0x00000000004012bd      main+0x18 [leak_demo]
                0x00007fb870c29590      __libc_start_call_main+0x80 [libc.so.6]
        24182784 bytes in 23 allocations from stack
                0x00007fb870c980dd      sysmalloc+0x7ed [libc.so.6]
^C┌──[root@liruilongs.github.io]-[/usr/share/bcc/tools]
└─$

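The number of outstanding blocks keeps growing (for example, repeated allocations of 1052672 bytes, about 1 MB each), and the stack trace printed by memleak points to allocate_memory+0x18 and memory_leak_demo+0x23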

3153935 bytes in 4 allocations from stack
                0x00000000004011ae      allocate_memory+0x18 [leak_demo] #void* allocate_memory(size_t size)
                0x000000000040125f      memory_leak_demo+0x23 [leak_demo] #void memory_leak_demo()
                0x00000000004012bd      main+0x18 [leak_demo]
                0x00007fb870c29590      __libc_start_call_main+0x80 [libc.so.6]

These are exactly the calling function memory_leak_demo() and the allocation function allocate_memory() in the demo above.

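That is all for this introduction to memory leak analysis with the BCC tool memleak. Everything above is a demo meant only to show how the tool is used; real-world analysis involves far more complex call stacks.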

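References

© Copyright of the referenced content belongs to the original authors. Please let me know of any infringement :)

《BPF Performance Tools》

© 2018-present, licensed under Attribution-NonCommercial-ShareAlike (CC BY-NC-SA 4.0)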

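[Notice] This content comes from a blogger in the Huawei Cloud developer community and does not represent the views or position of Huawei Cloud or the Huawei Cloud developer community. Reposts must state the source (Huawei Cloud community), the article link, the author, and other basic information; otherwise the author and this community reserve the right to pursue liability. If you find content in this community that appears to be plagiarized, please report it by email with supporting evidence; once verified, the community will remove the infringing content immediately. Report email: cloudbbs@huaweicloud.com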