34 replies
1 year ago

[Bounty: 20 yuan] Suspected off-heap memory leak, looking for ideas and suggestions


Environment
  • Operating system: Linux
  • JDK version: JDK 8
  • Memory: 8 GB
  • CPU cores: not specified
  • OS architecture: 64-bit
Attachment: heap0519.hprof (255.18 MB)

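Problem description

A production service's RES keeps growing after it has been running for a while, reaching 3 GB to over 4 GB, until the process is killed by the operating system.

Environment details

Startup parameters: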

-Xms1024m -Xmx1024m -XX:NativeMemoryTracking=summary -XX:MetaspaceSize=64m -XX:MaxMetaspaceSize=256m 

Memory records

May 19 (5.19)

RES
2736.08 MB

Thread Nums
605

Native Memory:
jcmd $pid VM.native_memory summary scale=MB

Total: reserved=3298MB, committed=2140MB
-                 Java Heap (reserved=1024MB, committed=1024MB)
                            (mmap: reserved=1024MB, committed=1024MB) 
 
-                     Class (reserved=1149MB, committed=139MB)
                            (classes #19913)
                            (malloc=9MB #53910) 
                            (mmap: reserved=1140MB, committed=130MB) 
 
-                    Thread (reserved=610MB, committed=610MB)
                            (thread #605)
                            (stack: reserved=606MB, committed=606MB)
                            (malloc=2MB #3026) 
                            (arena=2MB #1208)
 
-                      Code (reserved=264MB, committed=116MB)
                            (malloc=21MB #27562) 
                            (mmap: reserved=244MB, committed=96MB) 
 
-                        GC (reserved=43MB, committed=43MB)
                            (malloc=6MB #972) 
                            (mmap: reserved=37MB, committed=37MB) 
 
-                  Compiler (reserved=1MB, committed=1MB)
 
-                  Internal (reserved=177MB, committed=177MB)
                            (malloc=177MB #36227) 
 
-                    Symbol (reserved=24MB, committed=24MB)
                            (malloc=21MB #222179) 
                            (arena=4MB #1)
 
-    Native Memory Tracking (reserved=5MB, committed=5MB)
                            (tracking overhead=5MB)
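
To make the comparison between RES and the NMT total concrete, here is a minimal sketch of the arithmetic, using only the figures recorded above (RES 2736.08 MB, NMT committed 2140 MB, 605 threads with 606 MB of stack). The class and method names are made up for illustration. Because committed memory is not necessarily all resident, the difference is best read as a rough lower bound on what NMT does not account for.

```java
public class NmtGap {
    /** Resident memory not explained by the NMT committed total, in MB. */
    static double untrackedMb(double resMb, double nmtCommittedMb) {
        return resMb - nmtCommittedMb;
    }

    /** Average committed stack per thread, in MB. */
    static double stackPerThreadMb(double stackCommittedMb, int threads) {
        return stackCommittedMb / threads;
    }

    public static void main(String[] args) {
        // Figures copied from the records above.
        double gap = untrackedMb(2736.08, 2140.0);
        double perThread = stackPerThreadMb(606.0, 605);
        System.out.println("RES minus NMT committed (MB): " + gap);
        System.out.println("stack per thread (MB): " + perThread);
    }
}
```

On these numbers the gap is a little under 600 MB, and each thread accounts for about 1 MB of stack.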

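Suspicions

Suspicion: DirectBuffer memory is not being released

On JDK 8, when -XX:MaxDirectMemorySize is not specified, the direct memory limit defaults to roughly the same value as -Xmx. So if unreleased DirectBuffer memory were the cause, direct memory should not be able to grow past the -Xmx set above (1024 MB). RES has grown well past that, in which case the JVM should have thrown java.lang.OutOfMemoryError: Direct buffer memory, but no such error appears in the logs.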
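
One way to check the DirectBuffer suspicion directly, rather than inferring it from the absence of an error, is to read the JVM's buffer pool counters. The sketch below uses the standard java.lang.management.BufferPoolMXBean API; the class name and the 1 MiB demo allocation are illustrative only, and in practice the same beans would be read from inside the service or over JMX. Note that these counters only cover buffers created through ByteBuffer.allocateDirect and memory-mapped files, so a low number here does not rule out other native allocations.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;
import java.util.List;

public class DirectBufferUsage {
    /** Bytes used by the named buffer pool ("direct" or "mapped"), or -1 if no such pool exists. */
    static long memoryUsed(String poolName) {
        List<BufferPoolMXBean> pools =
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
        for (BufferPoolMXBean pool : pools) {
            if (pool.getName().equals(poolName)) {
                return pool.getMemoryUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Demo allocation so the "direct" pool is non-empty; not part of the real service.
        ByteBuffer demo = ByteBuffer.allocateDirect(1 << 20);
        for (BufferPoolMXBean pool :
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            System.out.println(pool.getName()
                    + ": count=" + pool.getCount()
                    + " used=" + pool.getMemoryUsed()
                    + " capacity=" + pool.getTotalCapacity());
        }
        System.out.println("demo buffer capacity: " + demo.capacity());
    }
}
```

If the "direct" pool stays far below 1 GB while RES keeps climbing, the growth is probably coming from somewhere else.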

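Suspicion: off-heap memory allocated by native code (C code)

With NativeMemoryTracking enabled, the reported committed total (2140 MB) is smaller than the physical memory actually in use (RES 2736.08 MB). The jcmd output covers the Java heap, the Code area, and memory allocated through Unsafe.allocateMemory and DirectByteBuffer, but it does not include off-heap memory allocated by other native code (C code).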

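What I have tried

Heap dump analysis

A dump of live objects comes to only a little over 300 MB. The heap was initially allocated at 1024 MB, and heap usage stayed normal right up until the service died.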

using thread-local object allocation.
Parallel GC with 4 thread(s)

Heap Configuration:
   MinHeapFreeRatio         = 0
   MaxHeapFreeRatio         = 100
   MaxHeapSize              = 1073741824 (1024.0MB)
   NewSize                  = 357564416 (341.0MB)
   MaxNewSize               = 357564416 (341.0MB)
   OldSize                  = 716177408 (683.0MB)
   NewRatio                 = 2
   SurvivorRatio            = 8
   MetaspaceSize            = 67108864 (64.0MB)
   CompressedClassSpaceSize = 1073741824 (1024.0MB)
   MaxMetaspaceSize         = 268435456 (256.0MB)
   G1HeapRegionSize         = 0 (0.0MB)

Heap Usage:
PS Young Generation
Eden Space:
   capacity = 353370112 (337.0MB)
   used     = 137123824 (130.77146911621094MB)
   free     = 216246288 (206.22853088378906MB)
   38.80459024219909% used
From Space:
   capacity = 2097152 (2.0MB)
   used     = 1261680 (1.2032318115234375MB)
   free     = 835472 (0.7967681884765625MB)
   60.161590576171875% used
To Space:
   capacity = 2097152 (2.0MB)
   used     = 0 (0.0MB)
   free     = 2097152 (2.0MB)
   0.0% used
PS Old Generation
   capacity = 716177408 (683.0MB)
   used     = 338325072 (322.6519317626953MB)
   free     = 377852336 (360.3480682373047MB)
   47.240399965255534% used

51997 interned Strings occupying 5600608 bytes.

GC analysis

The GC output also looks normal; nothing stands out.

jstat -gc -h5 14837 5000
 S0C    S1C    S0U    S1U      EC       EU        OC         OU       MC     MU    CCSC   CCSU   YGC     YGCT    FGC    FGCT     GCT   
2048.0 2048.0 1376.1  0.0   345088.0 216691.7  699392.0   330307.6  133248.0 124961.8 14720.0 13360.8  53010  497.625  92     24.405  522.031
2048.0 2048.0 1376.1  0.0   345088.0 326280.2  699392.0   330307.6  133248.0 124961.8 14720.0 13360.8  53010  497.625  92     24.405  522.031
2048.0 2048.0  0.0   1376.1 345088.0 71113.9   699392.0   330331.6  133248.0 124961.8 14720.0 13360.8  53011  497.634  92     24.405  522.039
2048.0 2048.0  0.0   1376.1 345088.0 169572.3  699392.0   330331.6  133248.0 124961.8 14720.0 13360.8  53011  497.634  92     24.405  522.039
2048.0 2048.0  0.0   1376.1 345088.0 268892.8  699392.0   330331.6  133248.0 124961.8 14720.0 13360.8  53011  497.634  92     24.405  522.039
 S0C    S1C    S0U    S1U      EC       EU        OC         OU       MC     MU    CCSC   CCSU   YGC     YGCT    FGC    FGCT     GCT   
2048.0 2048.0 1296.1  0.0   345088.0 40181.7   699392.0   330363.6  133248.0 124961.8 14720.0 13360.8  53012  497.644  92     24.405  522.050
2048.0 2048.0 1296.1  0.0   345088.0 131806.8  699392.0   330363.6  133248.0 124961.8 14720.0 13360.8  53012  497.644  92     24.405  522.050
2048.0 2048.0 1296.1  0.0   345088.0 220248.8  699392.0   330363.6  133248.0 124961.8 14720.0 13360.8  53012  497.644  92     24.405  522.050
2048.0 2048.0 1296.1  0.0   345088.0 299581.8  699392.0   330363.6  133248.0 124961.8 14720.0 13360.8  53012  497.644  92     24.405  522.050
2048.0 2048.0  0.0   1232.1 345088.0 53573.3   699392.0   330395.6  133248.0 124961.8 14720.0 13360.8  53013  497.653  92     24.405  522.058
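
As a rough sanity check on the counters above, the average pause per collection can be derived by dividing the accumulated time by the event count (YGCT/YGC and FGCT/FGC). This is a minimal sketch with the values copied from the last jstat row; the class and method names are hypothetical.

```java
public class GcAverages {
    /** Average pause in milliseconds, given accumulated seconds and a collection count. */
    static double avgPauseMs(double totalSeconds, long count) {
        return count == 0 ? 0.0 : totalSeconds * 1000.0 / count;
    }

    public static void main(String[] args) {
        // YGC=53013, YGCT=497.653 s, FGC=92, FGCT=24.405 s (from the jstat output above).
        System.out.println("avg young GC pause (ms): " + avgPauseMs(497.653, 53013));
        System.out.println("avg full GC pause (ms): " + avgPauseMs(24.405, 92));
    }
}
```

That works out to roughly 9 ms per young collection and roughly 265 ms per full collection, with the old generation (OU) sitting at about 330 MB of 699 MB, which is consistent with the heap not being the problem.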

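Dumping memory blocks

I used pmap to look for large contiguous memory blocks and did not find a group of blocks with the same size (64 MB). I dumped some of the larger blocks (a little over 64 MB, 80 MB, and 40 MB) and ran strings over them, but most of the output was unreadable and gave no clues.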
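
The manual scan of pmap output can be sketched as a small filter. The example below assumes the column layout printed by `pmap -x` (Address, Kbytes, RSS, Dirty, Mode, Mapping) and keeps the rows whose resident size is above a threshold. The class name, the threshold, and the sample rows are hypothetical and are not taken from the real process.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PmapFilter {
    // Address, Kbytes, RSS, Dirty, Mode, Mapping: the assumed `pmap -x` row layout.
    private static final Pattern ROW = Pattern.compile(
            "^([0-9a-fA-F]+)\\s+(\\d+)\\s+(\\d+)\\s+(\\d+)\\s+(\\S+)\\s+(.*)$");

    /** Returns a summary of every row whose RSS column (in KB) is at least minRssKb. */
    static List<String> largeResident(List<String> lines, long minRssKb) {
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            Matcher m = ROW.matcher(line.trim());
            // Header and total rows do not match the pattern and are skipped.
            if (m.matches() && Long.parseLong(m.group(3)) >= minRssKb) {
                hits.add(m.group(1) + " rss=" + m.group(3) + "KB size="
                        + m.group(2) + "KB " + m.group(6).trim());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical sample rows, for illustration only.
        List<String> sample = Arrays.asList(
                "Address           Kbytes     RSS   Dirty Mode  Mapping",
                "00007f1c34000000   65536   61232   61232 rw---   [ anon ]",
                "00007f1c40000000     132      12      12 rw---   [ anon ]",
                "00007f1c50000000   83968   40100   40100 rw---   [ anon ]");
        for (String hit : largeResident(sample, 32 * 1024)) {
            System.out.println(hit);
        }
    }
}
```

Running the same filter on snapshots taken some time apart would show which anonymous regions are actually growing, which narrows down the blocks worth dumping.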

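Question

This service performs some compression and decompression, and a fair amount of its business logic involves file I/O. At the moment we also have no way to reproduce the problem in the test environment.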
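
Since the service compresses and decompresses data, one candidate that may be worth ruling out is java.util.zip objects that are never released. This is an assumption, not something the data above confirms. Inflater and Deflater hold native zlib state outside the Java heap until end() is called or the object is finalized, and that memory is generally not visible in a heap dump. The sketch below shows the release pattern with hypothetical helper names; the same concern applies to streams such as GZIPInputStream that are not closed.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class ZipNativeRelease {
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            while (!deflater.finished()) {
                out.write(chunk, 0, deflater.deflate(chunk));
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release the native zlib state without waiting for GC
        }
    }

    static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(chunk);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or dictionary-dependent input");
                }
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException(e);
        } finally {
            inflater.end(); // release the native zlib state without waiting for GC
        }
    }

    public static void main(String[] args) {
        byte[] data = "hello hello hello hello".getBytes(StandardCharsets.UTF_8);
        byte[] roundTrip = decompress(compress(data));
        System.out.println("round trip ok: " + Arrays.equals(data, roundTrip));
    }
}
```

Auditing the compression paths for Inflater, Deflater, or zip/gzip streams that are created per request and never ended or closed would either confirm or eliminate this candidate.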

Could everyone please take a look? I am out of ideas at this point.

1249 views