
A Bizarre Kafka Message OOM, and a Closer Look at StringBuilder


1. Preface

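It started with an OOM in a customer's production environment; reportedly this was already the second time this strange OOM had occurred. First, the facts about the OOM: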

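Heap size: 13G;
G1 garbage collector in use;
The dump file generated automatically after the OOM is 7G;
The application roughly consumes data from Kafka, does some business processing, and sends the results back to Kafka [a single message is reportedly around 32M].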

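What puzzled the customer:

With a 13G heap and a 7G dump file, there was clearly enough memory, so why the OOM?
Because this is a production environment, the problem cannot be freely reproduced or debugged there; only a single heap dump file is available for analysis.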

2. Analyzing the production dump file

2.1 Issue 1 - Kafka's RecordAccumulator occupies a lot of memory

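No screenshot here (I forgot to take one), so use your imagination 😅. For a while I suspected that Kafka's local buffer was causing the OOM. Normally RecordAccumulator and KafkaProducer map one to one [each KafkaProducer creates a RecordAccumulator instance in its constructor]. However, the customer said the KafkaProducer is stored in a ThreadLocal and is not created without bound, so RecordAccumulator should not be created without bound either.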

I kept this as a possible cause for now (after all, there were around 140 RecordAccumulator instances, which is not quite normal).

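Later I skimmed DefaultKafkaProducerFactory to see whether that factory class was hiding anything. It was: roughly, if sending a message throws an exception ❌ in the send callback, the corresponding KafkaProducer instance is closed (physically closed), and a new one is created the next time it is needed.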

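This explains why the number of KafkaProducer instances does not match the number of threads. The excessive number of RecordAccumulator instances is still unexplained, though; that may require grinding through the source code 😤.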

For more on DefaultKafkaProducerFactory, see this blog: https://blog.csdn.net/Pacoson

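2.2 Issue 2 - OOM information in the thread stack

The stack of the Kafka producer's business thread contains an OOM:
image1.png

2.3 Issue 3 - Kafka send-failure information in the thread stack

The Kafka producer callback reports a RecordTooLargeException
image2.png
The Kafka producer starts writing an error log
image3.png

2.4 Issue 4 - RecordTooLargeException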

image4.png
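Having gone through the heap dump, I made a bold guess: perhaps the heap actually was large enough, and some other condition was triggered that caused an OutOfMemoryError to be thrown explicitly.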

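With that question in mind I went back to the thread stack, looked more closely, and something clicked. The OOM in the thread stack was thrown while an error log was being written: building the log message involved string concatenation, which triggered a StringBuilder expansion and the accompanying array copy, and that is where the OOM occurred. At this point I turned my attention to StringBuilder.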

3. Digging into StringBuilder expansion

With the guesses above in mind, I read through the source of StringBuilder's append() and its expansion mechanism.

StringBuilder has a character array called value [declared in its parent class AbstractStringBuilder]; this is what actually stores the string content (String has one too).
image5.png

StringBuilder.append() eventually calls the parent's java.lang.AbstractStringBuilder#append(java.lang.String) method, whose source is:

public AbstractStringBuilder append(String str) {
    if (str == null)
        return appendNull();
    // Length of the string to append
    int len = str.length();
    // Check the capacity of the internal character array so the append can succeed
    ensureCapacityInternal(count + len);
    str.getChars(0, len, value, count);
    count += len;
    return this;
}


Next, the ensureCapacityInternal() method:
image6.png

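As you can see, if the required minimum capacity exceeds the current array length, the array is copied once. Also note the comment above: if the value overflows, an OutOfMemoryError is thrown. It felt like my guess was about to be confirmed, which was exciting.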

Now look closely at the newCapacity() method:

    /**
     * The maximum size of array to allocate (unless necessary).
     * Some VMs reserve some header words in an array.
     * Attempts to allocate larger arrays may result in
     * OutOfMemoryError: Requested array size exceeds VM limit
     */
    private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

    /**
     * Returns a capacity at least as large as the given minimum capacity.
     * Returns the current capacity increased by the same amount + 2 if
     * that suffices.
     * Will not return a capacity greater than {@code MAX_ARRAY_SIZE}
     * unless the given minimum capacity is greater than that.
     *
     * @param  minCapacity the desired minimum capacity
     * @throws OutOfMemoryError if minCapacity is less than zero or
     *         greater than Integer.MAX_VALUE
     */
    private int newCapacity(int minCapacity) {
        // overflow-conscious code
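        // The candidate capacity is twice the current array length plus 2
        int newCapacity = (value.length << 1) + 2;
        // If the minimum length required by the append is still larger than newCapacity, use minCapacity instead.
        // This only happens when count + len > 2 * value.length + 2, i.e. the appended string is too long for doubling to absorb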
        if (newCapacity - minCapacity < 0) {
            newCapacity = minCapacity;
        }
        // If newCapacity overflowed (<= 0) or is greater than Integer.MAX_VALUE - 8, go through hugeCapacity();
        // otherwise return newCapacity directly
        return (newCapacity <= 0 || MAX_ARRAY_SIZE - newCapacity < 0)
            ? hugeCapacity(minCapacity)
            : newCapacity;
    }
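
The growth rule can be observed from the outside through StringBuilder.capacity(). The following is only a minimal sketch: the class and method names are hypothetical, and the numbers assume the default initial capacity of 16 together with the "double plus 2, or the minimum needed" rule quoted above.

```java
import java.util.Arrays;

// Hypothetical demo class; not part of the JDK or of the customer's application.
class CapacityGrowthDemo {

    // Appends n characters to a fresh StringBuilder (default capacity 16)
    // in a single append() call and returns the resulting capacity.
    static int capacityAfterAppend(int n) {
        StringBuilder sb = new StringBuilder();
        char[] chars = new char[n];
        Arrays.fill(chars, 'x');
        sb.append(new String(chars));
        return sb.capacity();
    }

    public static void main(String[] args) {
        System.out.println(capacityAfterAppend(16));  // 16: still fits, no expansion
        System.out.println(capacityAfterAppend(17));  // 34: 16 * 2 + 2
        System.out.println(capacityAfterAppend(100)); // 100: doubling is not enough, minCapacity wins
    }
}
```

Each expansion copies the old array into the new one, so for a moment both arrays are alive on the heap at the same time.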


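MAX_ARRAY_SIZE is Integer.MAX_VALUE - 8. The comment also gives a hint: this is the largest array length to allocate (unless necessary), and attempting to allocate larger arrays may result in an OOM.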
image7.png

Next, the hugeCapacity() method:

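private int hugeCapacity(int minCapacity) {
    // If the minimum length required by the append has overflowed past Integer.MAX_VALUE
    // (so minCapacity wrapped around to a negative int), throw OOM directly
    if (Integer.MAX_VALUE - minCapacity < 0) { // overflow
        throw new OutOfMemoryError();
    }
    /*
     * If the minimum length required by the append is greater than Integer.MAX_VALUE - 8,
     * the new array length is minCapacity; otherwise it is MAX_ARRAY_SIZE.
     * In practice, once doubling the current capacity overflows or exceeds MAX_ARRAY_SIZE
     * (current capacity above roughly half of Integer.MAX_VALUE), the next expansion
     * uses MAX_ARRAY_SIZE directly.
     */
    return (minCapacity > MAX_ARRAY_SIZE)
        ? minCapacity : MAX_ARRAY_SIZE;
}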
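
To explore which branch a given expansion takes without allocating any large arrays, the arithmetic can be copied into a stand-alone class. This is a sketch only: CapacityMath is a hypothetical name, the current array length is passed in as a parameter instead of being read from value.length, and the logic mirrors the JDK 8 source quoted above rather than any other JDK version. The length 1207959551 is the array length reported for the 8g experiment later in this article.

```java
// Hypothetical stand-alone replica of the JDK 8 capacity arithmetic quoted above.
class CapacityMath {
    static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

    // Same logic as AbstractStringBuilder#newCapacity, with value.length as a parameter.
    static int newCapacity(int currentLength, int minCapacity) {
        int newCapacity = (currentLength << 1) + 2;   // may overflow to a negative int
        if (newCapacity - minCapacity < 0) {
            newCapacity = minCapacity;
        }
        return (newCapacity <= 0 || MAX_ARRAY_SIZE - newCapacity < 0)
            ? hugeCapacity(minCapacity)
            : newCapacity;
    }

    // Same logic as AbstractStringBuilder#hugeCapacity.
    static int hugeCapacity(int minCapacity) {
        if (Integer.MAX_VALUE - minCapacity < 0) {    // minCapacity wrapped negative
            throw new OutOfMemoryError();
        }
        return (minCapacity > MAX_ARRAY_SIZE) ? minCapacity : MAX_ARRAY_SIZE;
    }

    public static void main(String[] args) {
        // Small buffer: plain doubling plus 2.
        System.out.println(newCapacity(16, 17));                  // 34
        // Doubling 1207959551 overflows, so the result jumps to MAX_ARRAY_SIZE.
        System.out.println(newCapacity(1207959551, 1207959552));  // 2147483639
        // The required length itself overflows int: explicit OutOfMemoryError.
        try {
            newCapacity(MAX_ARRAY_SIZE, MAX_ARRAY_SIZE + 100);
        } catch (OutOfMemoryError e) {
            System.out.println("explicit OOM, message = " + e.getMessage());
        }
    }
}
```

Note that the OutOfMemoryError thrown by hugeCapacity() is created by library code and has nothing to do with how full the heap is; the later array allocation of MAX_ARRAY_SIZE elements is a separate step that can fail on its own.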

The key point in one sentence: if the required string length exceeds Integer.MAX_VALUE, an OOM is thrown directly.
Which raises another question:

image8.jpeg
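With 32M messages, no amount of expansion should get past Integer.MAX_VALUE, right?
And look at the demo below: it fails a long way short of Integer.MAX_VALUE.
image9.png
Don't panic, let's keep going (rolling out in the wheelchair) 🦽.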

The conclusion first: this is actually related to the heap memory currently available.

4. Array object allocation

One figure up front: a character array of length Integer.MAX_VALUE takes about 4G of heap (2147483647*2/1024/1024/1024, at 2 bytes per char).
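
The 4G figure can be checked with a few lines of arithmetic. This is a rough sketch: ArrayFootprint is a hypothetical helper, and the 16-byte array header is an assumption (a typical value on 64-bit HotSpot with compressed oops); the header is negligible at this scale either way.

```java
// Hypothetical helper for estimating the heap footprint of a char[].
class ArrayFootprint {
    // Assumption: 16-byte array header; the real value depends on the JVM and its flags.
    static final long ASSUMED_HEADER_BYTES = 16;

    // 2 bytes per char plus the assumed header.
    static long charArrayBytes(long length) {
        return ASSUMED_HEADER_BYTES + 2L * length;
    }

    static double toGiB(long bytes) {
        return bytes / (1024.0 * 1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        // Both print a value within a rounding error of 4 GiB.
        System.out.printf("char[Integer.MAX_VALUE]     ~ %.3f GiB%n",
                toGiB(charArrayBytes(Integer.MAX_VALUE)));
        System.out.printf("char[Integer.MAX_VALUE - 8] ~ %.3f GiB%n",
                toGiB(charArrayBytes(Integer.MAX_VALUE - 8)));
    }
}
```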

Here are two experiments I ran locally, which also serve as a refresher on GC logs.

Startup parameters (no GC configured; JDK 8 defaults to the PS collector): -Xmx8g -Xms8g -XX:+PrintGCDetails -XX:+PrintHeapAtGC -Xloggc:gc.log
image10.png

✨ The GC log:

Java HotSpot(TM) 64-Bit Server VM (25.281-b09) for bsd-amd64 JRE (1.8.0_281-b09), built on Dec  9 2020 12:44:49 by "java_re" with gcc 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.11.45.5)
Memory: 4k page, physical 33554432k(1613060k free)

/proc/meminfo:

CommandLine flags: -XX:InitialHeapSize=8589934592 -XX:MaxHeapSize=8589934592 -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseParallelGC 
{Heap before GC invocations=1 (full 0):
 PSYoungGen      total 2446848K, used 1436457K [0x0000000715580000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 2097664K, 68% used [0x0000000715580000,0x000000076d04a5f8,0x0000000795600000)
  from space 349184K, 0% used [0x00000007aab00000,0x00000007aab00000,0x00000007c0000000)
  to   space 349184K, 0% used [0x0000000795600000,0x0000000795600000,0x00000007aab00000)
 ParOldGen       total 5592576K, used 3538944K [0x00000005c0000000, 0x0000000715580000, 0x0000000715580000)
  object space 5592576K, 63% used [0x00000005c0000000,0x0000000698000020,0x0000000715580000)
 Metaspace       used 3918K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
7.302: [GC (Allocation Failure) [PSYoungGen: 1436457K->1444K(2446848K)] 4975401K->3540396K(8039424K), 0.0133225 secs] [Times: user=0.16 sys=0.00, real=0.01 secs] 
Heap after GC invocations=1 (full 0):
 PSYoungGen      total 2446848K, used 1444K [0x0000000715580000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 2097664K, 0% used [0x0000000715580000,0x0000000715580000,0x0000000795600000)
  from space 349184K, 0% used [0x0000000795600000,0x00000007957691e0,0x00000007aab00000)
  to   space 349184K, 0% used [0x00000007aab00000,0x00000007aab00000,0x00000007c0000000)
 ParOldGen       total 5592576K, used 3538952K [0x00000005c0000000, 0x0000000715580000, 0x0000000715580000)
  object space 5592576K, 63% used [0x00000005c0000000,0x0000000698002020,0x0000000715580000)
 Metaspace       used 3918K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
}

{Heap before GC invocations=2 (full 0):
 PSYoungGen      total 2446848K, used 1444K [0x0000000715580000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 2097664K, 0% used [0x0000000715580000,0x0000000715580000,0x0000000795600000)
  from space 349184K, 0% used [0x0000000795600000,0x00000007957691e0,0x00000007aab00000)
  to   space 349184K, 0% used [0x00000007aab00000,0x00000007aab00000,0x00000007c0000000)
 ParOldGen       total 5592576K, used 3538952K [0x00000005c0000000, 0x0000000715580000, 0x0000000715580000)
  object space 5592576K, 63% used [0x00000005c0000000,0x0000000698002020,0x0000000715580000)
 Metaspace       used 3918K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
7.316: [GC (Allocation Failure) [PSYoungGen: 1444K->1491K(2446848K)] 3540396K->3540451K(8039424K), 0.0185590 secs] [Times: user=0.23 sys=0.00, real=0.02 secs] 
Heap after GC invocations=2 (full 0):
 PSYoungGen      total 2446848K, used 1491K [0x0000000715580000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 2097664K, 0% used [0x0000000715580000,0x0000000715580000,0x0000000795600000)
  from space 349184K, 0% used [0x00000007aab00000,0x00000007aac74db8,0x00000007c0000000)
  to   space 349184K, 0% used [0x0000000795600000,0x0000000795600000,0x00000007aab00000)
 ParOldGen       total 5592576K, used 3538960K [0x00000005c0000000, 0x0000000715580000, 0x0000000715580000)
  object space 5592576K, 63% used [0x00000005c0000000,0x0000000698004020,0x0000000715580000)
 Metaspace       used 3918K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
}

{Heap before GC invocations=3 (full 1):
 PSYoungGen      total 2446848K, used 1491K [0x0000000715580000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 2097664K, 0% used [0x0000000715580000,0x0000000715580000,0x0000000795600000)
  from space 349184K, 0% used [0x00000007aab00000,0x00000007aac74db8,0x00000007c0000000)
  to   space 349184K, 0% used [0x0000000795600000,0x0000000795600000,0x00000007aab00000)
 ParOldGen       total 5592576K, used 3538960K [0x00000005c0000000, 0x0000000715580000, 0x0000000715580000)
  object space 5592576K, 63% used [0x00000005c0000000,0x0000000698004020,0x0000000715580000)
 Metaspace       used 3918K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
7.334: [Full GC (Allocation Failure) [PSYoungGen: 1491K->0K(2446848K)] [ParOldGen: 3538960K->2360557K(5592576K)] 3540451K->2360557K(8039424K), [Metaspace: 3918K->3918K(1056768K)], 0.2781291 secs] [Times: user=2.89 sys=0.16, real=0.28 secs] 
Heap after GC invocations=3 (full 1):
 PSYoungGen      total 2446848K, used 0K [0x0000000715580000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 2097664K, 0% used [0x0000000715580000,0x0000000715580000,0x0000000795600000)
  from space 349184K, 0% used [0x00000007aab00000,0x00000007aab00000,0x00000007c0000000)
  to   space 349184K, 0% used [0x0000000795600000,0x0000000795600000,0x00000007aab00000)
 ParOldGen       total 5592576K, used 2360557K [0x00000005c0000000, 0x0000000715580000, 0x0000000715580000)
  object space 5592576K, 42% used [0x00000005c0000000,0x000000065013b760,0x0000000715580000)
 Metaspace       used 3918K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
}

{Heap before GC invocations=4 (full 1):
 PSYoungGen      total 2446848K, used 0K [0x0000000715580000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 2097664K, 0% used [0x0000000715580000,0x0000000715580000,0x0000000795600000)
  from space 349184K, 0% used [0x00000007aab00000,0x00000007aab00000,0x00000007c0000000)
  to   space 349184K, 0% used [0x0000000795600000,0x0000000795600000,0x00000007aab00000)
 ParOldGen       total 5592576K, used 2360557K [0x00000005c0000000, 0x0000000715580000, 0x0000000715580000)
  object space 5592576K, 42% used [0x00000005c0000000,0x000000065013b760,0x0000000715580000)
 Metaspace       used 3918K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
7.613: [GC (Allocation Failure) [PSYoungGen: 0K->0K(2446848K)] 2360557K->2360557K(8039424K), 0.0177683 secs] [Times: user=0.22 sys=0.00, real=0.01 secs] 
Heap after GC invocations=4 (full 1):
 PSYoungGen      total 2446848K, used 0K [0x0000000715580000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 2097664K, 0% used [0x0000000715580000,0x0000000715580000,0x0000000795600000)
  from space 349184K, 0% used [0x0000000795600000,0x0000000795600000,0x00000007aab00000)
  to   space 349184K, 0% used [0x00000007aab00000,0x00000007aab00000,0x00000007c0000000)
 ParOldGen       total 5592576K, used 2360557K [0x00000005c0000000, 0x0000000715580000, 0x0000000715580000)
  object space 5592576K, 42% used [0x00000005c0000000,0x000000065013b760,0x0000000715580000)
 Metaspace       used 3918K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
}

{Heap before GC invocations=5 (full 2):
 PSYoungGen      total 2446848K, used 0K [0x0000000715580000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 2097664K, 0% used [0x0000000715580000,0x0000000715580000,0x0000000795600000)
  from space 349184K, 0% used [0x0000000795600000,0x0000000795600000,0x00000007aab00000)
  to   space 349184K, 0% used [0x00000007aab00000,0x00000007aab00000,0x00000007c0000000)
 ParOldGen       total 5592576K, used 2360557K [0x00000005c0000000, 0x0000000715580000, 0x0000000715580000)
  object space 5592576K, 42% used [0x00000005c0000000,0x000000065013b760,0x0000000715580000)
 Metaspace       used 3918K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
7.630: [Full GC (Allocation Failure) [PSYoungGen: 0K->0K(2446848K)] [ParOldGen: 2360557K->2360504K(5592576K)] 2360557K->2360504K(8039424K), [Metaspace: 3918K->3918K(1056768K)], 0.0095262 secs] [Times: user=0.06 sys=0.00, real=0.01 secs] 
Heap after GC invocations=5 (full 2):
 PSYoungGen      total 2446848K, used 0K [0x0000000715580000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 2097664K, 0% used [0x0000000715580000,0x0000000715580000,0x0000000795600000)
  from space 349184K, 0% used [0x0000000795600000,0x0000000795600000,0x00000007aab00000)
  to   space 349184K, 0% used [0x00000007aab00000,0x00000007aab00000,0x00000007c0000000)
 ParOldGen       total 5592576K, used 2360504K [0x00000005c0000000, 0x0000000715580000, 0x0000000715580000)
  object space 5592576K, 42% used [0x00000005c0000000,0x000000065012e1d8,0x0000000715580000)
 Metaspace       used 3918K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
}

Heap
 PSYoungGen      total 2446848K, used 52441K [0x0000000715580000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 2097664K, 2% used [0x0000000715580000,0x00000007188b67e8,0x0000000795600000)
  from space 349184K, 0% used [0x0000000795600000,0x0000000795600000,0x00000007aab00000)
  to   space 349184K, 0% used [0x00000007aab00000,0x00000007aab00000,0x00000007c0000000)
 ParOldGen       total 5592576K, used 2360504K [0x00000005c0000000, 0x0000000715580000, 0x0000000715580000)
  object space 5592576K, 42% used [0x00000005c0000000,0x000000065012e1d8,0x0000000715580000)
 Metaspace       used 3926K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 430K, capacity 460K, committed 512K, reserved 1048576K


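As the log shows, after the final Full GC the character array has been moved to the old generation, which now holds 2360504K. That is almost exactly the size of a character array of length 1207959551 (about 2359296K at 2 bytes per char, plus a little for other objects). Expanding once more gives a new array length of Integer.MAX_VALUE - 8, which means allocating another array of nearly 4G on the heap. Neither the young generation nor the old generation has room for 4G, so the result is an OOM.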
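
The reasoning can be double-checked against the numbers in the log above. The sketch below is illustrative only: ExpansionFitCheck is a hypothetical name, the sizes are copied from the last "Heap after GC" entry (eden 2097664K, ParOldGen 5592576K total with 2360504K used), and it simply compares a contiguous request against the free space of each region, ignoring array headers and any finer allocator details.

```java
// Hypothetical check of the 8g experiment, using figures copied from the GC log.
class ExpansionFitCheck {
    static final long EDEN_TOTAL_K = 2097664L;   // eden is empty after the last Full GC
    static final long OLD_TOTAL_K  = 5592576L;
    static final long OLD_USED_K   = 2360504L;

    // Size in K of a char[] with the given length (2 bytes per char, rounded up).
    static long kib(long chars) {
        return (2L * chars + 1023) / 1024;
    }

    static boolean fitsInOld(long neededK) {
        return neededK <= OLD_TOTAL_K - OLD_USED_K;
    }

    static boolean fitsInEden(long neededK) {
        return neededK <= EDEN_TOTAL_K;
    }

    public static void main(String[] args) {
        long currentK = kib(1207959551L);            // the array already in the old generation
        long nextK = kib(Integer.MAX_VALUE - 8);     // the array the next expansion asks for

        System.out.println("current array: " + currentK + "K");      // 2359296K
        System.out.println("next array:    " + nextK + "K");         // 4194304K
        System.out.println("fits in old:   " + fitsInOld(nextK));    // false
        System.out.println("fits in eden:  " + fitsInEden(nextK));   // false
    }
}
```

The old array cannot be freed before the new one exists, because the expansion has to copy from it, so the total free heap is not the number that matters; a single region must hold the new array.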

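OOM message: OutOfMemoryError: Java heap space. Unlike the OOM in the 16g-heap run below, this one means the heap ran short at the moment memory was actually being allocated for the object.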
image11.png
Startup parameters (no GC configured; JDK 8 defaults to the PS collector): -Xmx16g -Xms16g -XX:+PrintGCDetails -XX:+PrintHeapAtGC -Xloggc:gc.log
[eden: 5588992K, ParOldGen: 11185152K]
image12.png

✨ The GC log:

Java HotSpot(TM) 64-Bit Server VM (25.281-b09) for bsd-amd64 JRE (1.8.0_281-b09), built on Dec  9 2020 12:44:49 by "java_re" with gcc 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.11.45.5)
Memory: 4k page, physical 33554432k(942232k free)

/proc/meminfo:

CommandLine flags: -XX:InitialHeapSize=17179869184 -XX:MaxHeapSize=17179869184 -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseParallelGC 
{Heap before GC invocations=1 (full 0):
 PSYoungGen      total 4893184K, used 2788885K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 4194304K, 66% used [0x000000066ab00000,0x0000000714e85418,0x000000076ab00000)
  from space 698880K, 0% used [0x0000000795580000,0x0000000795580000,0x00000007c0000000)
  to   space 698880K, 0% used [0x000000076ab00000,0x000000076ab00000,0x0000000795580000)
 ParOldGen       total 11185152K, used 10747904K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
  object space 11185152K, 96% used [0x00000003c0000000,0x0000000650000010,0x000000066ab00000)
 Metaspace       used 3914K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
18.887: [GC (Allocation Failure) [PSYoungGen: 2788885K->1488K(4893184K)] 13536789K->10749400K(16078336K), 0.0556800 secs] [Times: user=0.39 sys=0.21, real=0.05 secs] 
Heap after GC invocations=1 (full 0):
 PSYoungGen      total 4893184K, used 1488K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 4194304K, 0% used [0x000000066ab00000,0x000000066ab00000,0x000000076ab00000)
  from space 698880K, 0% used [0x000000076ab00000,0x000000076ac74010,0x0000000795580000)
  to   space 698880K, 0% used [0x0000000795580000,0x0000000795580000,0x00000007c0000000)
 ParOldGen       total 11185152K, used 10747912K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
  object space 11185152K, 96% used [0x00000003c0000000,0x0000000650002010,0x000000066ab00000)
 Metaspace       used 3914K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
}

{Heap before GC invocations=2 (full 1):
 PSYoungGen      total 4893184K, used 1488K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 4194304K, 0% used [0x000000066ab00000,0x000000066ab00000,0x000000076ab00000)
  from space 698880K, 0% used [0x000000076ab00000,0x000000076ac74010,0x0000000795580000)
  to   space 698880K, 0% used [0x0000000795580000,0x0000000795580000,0x00000007c0000000)
 ParOldGen       total 11185152K, used 10747912K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
  object space 11185152K, 96% used [0x00000003c0000000,0x0000000650002010,0x000000066ab00000)
 Metaspace       used 3914K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
18.943: [Full GC (Ergonomics) [PSYoungGen: 1488K->0K(4893184K)] [ParOldGen: 10747912K->4195565K(11185152K)] 10749400K->4195565K(16078336K), [Metaspace: 3914K->3914K(1056768K)], 0.9751963 secs] [Times: user=2.33 sys=4.18, real=0.98 secs] 
Heap after GC invocations=2 (full 1):
 PSYoungGen      total 4893184K, used 0K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 4194304K, 0% used [0x000000066ab00000,0x000000066ab00000,0x000000076ab00000)
  from space 698880K, 0% used [0x000000076ab00000,0x000000076ab00000,0x0000000795580000)
  to   space 698880K, 0% used [0x0000000795580000,0x0000000795580000,0x00000007c0000000)
 ParOldGen       total 11185152K, used 4195565K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
  object space 11185152K, 37% used [0x00000003c0000000,0x00000004c013b720,0x000000066ab00000)
 Metaspace       used 3914K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
}

{Heap before GC invocations=3 (full 1):
 PSYoungGen      total 4893184K, used 0K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 4194304K, 0% used [0x000000066ab00000,0x000000066ab00000,0x000000076ab00000)
  from space 698880K, 0% used [0x000000076ab00000,0x000000076ab00000,0x0000000795580000)
  to   space 698880K, 0% used [0x0000000795580000,0x0000000795580000,0x00000007c0000000)
 ParOldGen       total 11185152K, used 8389869K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
  object space 11185152K, 75% used [0x00000003c0000000,0x00000005c013b728,0x000000066ab00000)
 Metaspace       used 3916K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
20.945: [GC (Allocation Failure) [PSYoungGen: 0K->0K(4893184K)] 8389869K->8389869K(16078336K), 0.0296937 secs] [Times: user=0.38 sys=0.00, real=0.03 secs] 
Heap after GC invocations=3 (full 1):
 PSYoungGen      total 4893184K, used 0K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 4194304K, 0% used [0x000000066ab00000,0x000000066ab00000,0x000000076ab00000)
  from space 698880K, 0% used [0x0000000795580000,0x0000000795580000,0x00000007c0000000)
  to   space 698880K, 0% used [0x000000076ab00000,0x000000076ab00000,0x0000000795580000)
 ParOldGen       total 11185152K, used 8389869K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
  object space 11185152K, 75% used [0x00000003c0000000,0x00000005c013b728,0x000000066ab00000)
 Metaspace       used 3916K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
}

{Heap before GC invocations=4 (full 1):
 PSYoungGen      total 4893184K, used 0K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 4194304K, 0% used [0x000000066ab00000,0x000000066ab00000,0x000000076ab00000)
  from space 698880K, 0% used [0x0000000795580000,0x0000000795580000,0x00000007c0000000)
  to   space 698880K, 0% used [0x000000076ab00000,0x000000076ab00000,0x0000000795580000)
 ParOldGen       total 11185152K, used 8389869K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
  object space 11185152K, 75% used [0x00000003c0000000,0x00000005c013b728,0x000000066ab00000)
 Metaspace       used 3916K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
20.975: [GC (Allocation Failure) [PSYoungGen: 0K->0K(4893184K)] 8389869K->8389869K(16078336K), 0.0293211 secs] [Times: user=0.37 sys=0.00, real=0.04 secs] 
Heap after GC invocations=4 (full 1):
 PSYoungGen      total 4893184K, used 0K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 4194304K, 0% used [0x000000066ab00000,0x000000066ab00000,0x000000076ab00000)
  from space 698880K, 0% used [0x000000076ab00000,0x000000076ab00000,0x0000000795580000)
  to   space 698880K, 0% used [0x0000000795580000,0x0000000795580000,0x00000007c0000000)
 ParOldGen       total 11185152K, used 8389869K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
  object space 11185152K, 75% used [0x00000003c0000000,0x00000005c013b728,0x000000066ab00000)
 Metaspace       used 3916K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
}

{Heap before GC invocations=5 (full 2):
 PSYoungGen      total 4893184K, used 0K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 4194304K, 0% used [0x000000066ab00000,0x000000066ab00000,0x000000076ab00000)
  from space 698880K, 0% used [0x000000076ab00000,0x000000076ab00000,0x0000000795580000)
  to   space 698880K, 0% used [0x0000000795580000,0x0000000795580000,0x00000007c0000000)
 ParOldGen       total 11185152K, used 8389869K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
  object space 11185152K, 75% used [0x00000003c0000000,0x00000005c013b728,0x000000066ab00000)
 Metaspace       used 3916K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
21.019: [Full GC (Allocation Failure) [PSYoungGen: 0K->0K(4893184K)] [ParOldGen: 8389869K->4195332K(11185152K)] 8389869K->4195332K(16078336K), [Metaspace: 3916K->3916K(1056768K)], 0.4611617 secs] [Times: user=5.40 sys=0.04, real=0.46 secs] 
Heap after GC invocations=5 (full 2):
 PSYoungGen      total 4893184K, used 0K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 4194304K, 0% used [0x000000066ab00000,0x000000066ab00000,0x000000076ab00000)
  from space 698880K, 0% used [0x000000076ab00000,0x000000076ab00000,0x0000000795580000)
  to   space 698880K, 0% used [0x0000000795580000,0x0000000795580000,0x00000007c0000000)
 ParOldGen       total 11185152K, used 4195332K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
  object space 11185152K, 37% used [0x00000003c0000000,0x00000004c0101090,0x000000066ab00000)
 Metaspace       used 3916K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
}

{Heap before GC invocations=6 (full 2):
 PSYoungGen      total 4893184K, used 83886K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 4194304K, 2% used [0x000000066ab00000,0x000000066fceb868,0x000000076ab00000)
  from space 698880K, 0% used [0x000000076ab00000,0x000000076ab00000,0x0000000795580000)
  to   space 698880K, 0% used [0x0000000795580000,0x0000000795580000,0x00000007c0000000)
 ParOldGen       total 11185152K, used 8389636K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
  object space 11185152K, 75% used [0x00000003c0000000,0x00000005c0101098,0x000000066ab00000)
 Metaspace       used 3916K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
21.963: [GC (Allocation Failure) [PSYoungGen: 83886K->0K(4893184K)] 8473522K->8389636K(16078336K), 0.0356100 secs] [Times: user=0.46 sys=0.00, real=0.04 secs] 
Heap after GC invocations=6 (full 2):
 PSYoungGen      total 4893184K, used 0K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 4194304K, 0% used [0x000000066ab00000,0x000000066ab00000,0x000000076ab00000)
  from space 698880K, 0% used [0x0000000795580000,0x0000000795580000,0x00000007c0000000)
  to   space 698880K, 0% used [0x000000076ab00000,0x000000076ab00000,0x0000000795580000)
 ParOldGen       total 11185152K, used 8389636K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
  object space 11185152K, 75% used [0x00000003c0000000,0x00000005c0101098,0x000000066ab00000)
 Metaspace       used 3916K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
}

{Heap before GC invocations=7 (full 2):
 PSYoungGen      total 4893184K, used 0K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 4194304K, 0% used [0x000000066ab00000,0x000000066ab00000,0x000000076ab00000)
  from space 698880K, 0% used [0x0000000795580000,0x0000000795580000,0x00000007c0000000)
  to   space 698880K, 0% used [0x000000076ab00000,0x000000076ab00000,0x0000000795580000)
 ParOldGen       total 11185152K, used 8389636K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
  object space 11185152K, 75% used [0x00000003c0000000,0x00000005c0101098,0x000000066ab00000)
 Metaspace       used 3916K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
21.999: [GC (Allocation Failure) [PSYoungGen: 0K->0K(4893184K)] 8389636K->8389636K(16078336K), 0.0234444 secs] [Times: user=0.30 sys=0.00, real=0.02 secs] 
Heap after GC invocations=7 (full 2):
 PSYoungGen      total 4893184K, used 0K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 4194304K, 0% used [0x000000066ab00000,0x000000066ab00000,0x000000076ab00000)
  from space 698880K, 0% used [0x000000076ab00000,0x000000076ab00000,0x0000000795580000)
  to   space 1536K, 0% used [0x00000007bfe80000,0x00000007bfe80000,0x00000007c0000000)
 ParOldGen       total 11185152K, used 8389636K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
  object space 11185152K, 75% used [0x00000003c0000000,0x00000005c0101098,0x000000066ab00000)
 Metaspace       used 3916K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
}

{Heap before GC invocations=8 (full 3):
 PSYoungGen      total 4893184K, used 0K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 4194304K, 0% used [0x000000066ab00000,0x000000066ab00000,0x000000076ab00000)
  from space 698880K, 0% used [0x000000076ab00000,0x000000076ab00000,0x0000000795580000)
  to   space 1536K, 0% used [0x00000007bfe80000,0x00000007bfe80000,0x00000007c0000000)
 ParOldGen       total 11185152K, used 8389636K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
  object space 11185152K, 75% used [0x00000003c0000000,0x00000005c0101098,0x000000066ab00000)
 Metaspace       used 3916K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
22.023: [Full GC (Allocation Failure) [PSYoungGen: 0K->0K(4893184K)] [ParOldGen: 8389636K->4195323K(11185152K)] 8389636K->4195323K(16078336K), [Metaspace: 3916K->3916K(1056768K)], 0.4557171 secs] [Times: user=5.56 sys=0.04, real=0.46 secs] 
Heap after GC invocations=8 (full 3):
 PSYoungGen      total 4893184K, used 0K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 4194304K, 0% used [0x000000066ab00000,0x000000066ab00000,0x000000076ab00000)
  from space 698880K, 0% used [0x000000076ab00000,0x000000076ab00000,0x0000000795580000)
  to   space 1536K, 0% used [0x00000007bfe80000,0x00000007bfe80000,0x00000007c0000000)
 ParOldGen       total 11185152K, used 4195323K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
  object space 11185152K, 37% used [0x00000003c0000000,0x00000004c00fedf8,0x000000066ab00000)
 Metaspace       used 3916K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
}

{Heap before GC invocations=9 (full 3):
 PSYoungGen      total 4893184K, used 0K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 4194304K, 0% used [0x000000066ab00000,0x000000066ab00000,0x000000076ab00000)
  from space 698880K, 0% used [0x000000076ab00000,0x000000076ab00000,0x0000000795580000)
  to   space 1536K, 0% used [0x00000007bfe80000,0x00000007bfe80000,0x00000007c0000000)
 ParOldGen       total 11185152K, used 8389627K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
  object space 11185152K, 75% used [0x00000003c0000000,0x00000005c00fee00,0x000000066ab00000)
 Metaspace       used 3916K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
22.976: [GC (Allocation Failure) [PSYoungGen: 0K->0K(5590528K)] 8389627K->8389627K(16775680K), 0.0401370 secs] [Times: user=0.52 sys=0.00, real=0.04 secs] 
Heap after GC invocations=9 (full 3):
 PSYoungGen      total 5590528K, used 0K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 5588992K, 0% used [0x000000066ab00000,0x000000066ab00000,0x00000007bfd00000)
  from space 1536K, 0% used [0x00000007bfe80000,0x00000007bfe80000,0x00000007c0000000)
  to   space 1536K, 0% used [0x00000007bfd00000,0x00000007bfd00000,0x00000007bfe80000)
 ParOldGen       total 11185152K, used 8389627K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
  object space 11185152K, 75% used [0x00000003c0000000,0x00000005c00fee00,0x000000066ab00000)
 Metaspace       used 3916K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
}

{Heap before GC invocations=10 (full 3):
 PSYoungGen      total 5590528K, used 4194304K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 5588992K, 75% used [0x000000066ab00000,0x000000076ab00008,0x00000007bfd00000)
  from space 1536K, 0% used [0x00000007bfe80000,0x00000007bfe80000,0x00000007c0000000)
  to   space 1536K, 0% used [0x00000007bfd00000,0x00000007bfd00000,0x00000007bfe80000)
 ParOldGen       total 11185152K, used 8389627K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
  object space 11185152K, 75% used [0x00000003c0000000,0x00000005c00fee00,0x000000066ab00000)
 Metaspace       used 3916K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
25.776: [GC (Allocation Failure) --[PSYoungGen: 4194304K->4194304K(5590528K)] 12583931K->12583931K(16775680K), 0.0263122 secs] [Times: user=0.32 sys=0.00, real=0.02 secs] 
Heap after GC invocations=10 (full 3):
 PSYoungGen      total 5590528K, used 4194304K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 5588992K, 75% used [0x000000066ab00000,0x000000076ab00008,0x00000007bfd00000)
  from space 1536K, 0% used [0x00000007bfe80000,0x00000007bfe80000,0x00000007c0000000)
  to   space 1536K, 0% used [0x00000007bfd00000,0x00000007bfd00000,0x00000007bfe80000)
 ParOldGen       total 11185152K, used 8389627K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
  object space 11185152K, 75% used [0x00000003c0000000,0x00000005c00fee00,0x000000066ab00000)
 Metaspace       used 3916K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
}

{Heap before GC invocations=11 (full 4):
 PSYoungGen      total 5590528K, used 4194304K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 5588992K, 75% used [0x000000066ab00000,0x000000076ab00008,0x00000007bfd00000)
  from space 1536K, 0% used [0x00000007bfe80000,0x00000007bfe80000,0x00000007c0000000)
  to   space 1536K, 0% used [0x00000007bfd00000,0x00000007bfd00000,0x00000007bfe80000)
 ParOldGen       total 11185152K, used 8389627K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
  object space 11185152K, 75% used [0x00000003c0000000,0x00000005c00fee00,0x000000066ab00000)
 Metaspace       used 3916K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
25.802: [Full GC (Ergonomics) [PSYoungGen: 4194304K->0K(5590528K)] [ParOldGen: 8389627K->4195328K(11185152K)] 12583931K->4195328K(16775680K), [Metaspace: 3916K->3916K(1056768K)], 0.8243356 secs] [Times: user=1.74 sys=1.38, real=0.83 secs] 
Heap after GC invocations=11 (full 4):
 PSYoungGen      total 5590528K, used 0K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 5588992K, 0% used [0x000000066ab00000,0x000000066ab00000,0x00000007bfd00000)
  from space 1536K, 0% used [0x00000007bfe80000,0x00000007bfe80000,0x00000007c0000000)
  to   space 1536K, 0% used [0x00000007bfd00000,0x00000007bfd00000,0x00000007bfe80000)
 ParOldGen       total 11185152K, used 4195328K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
  object space 11185152K, 37% used [0x00000003c0000000,0x00000004c0100008,0x000000066ab00000)
 Metaspace       used 3916K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 429K, capacity 460K, committed 512K, reserved 1048576K
}

Heap
 PSYoungGen      total 5590528K, used 4343344K [0x000000066ab00000, 0x00000007c0000000, 0x00000007c0000000)
  eden space 5588992K, 77% used [0x000000066ab00000,0x0000000773c8c0c0,0x00000007bfd00000)
  from space 1536K, 0% used [0x00000007bfe80000,0x00000007bfe80000,0x00000007c0000000)
  to   space 1536K, 0% used [0x00000007bfd00000,0x00000007bfd00000,0x00000007bfe80000)
 ParOldGen       total 11185152K, used 4195328K [0x00000003c0000000, 0x000000066ab00000, 0x000000066ab00000)
  object space 11185152K, 37% used [0x00000003c0000000,0x00000004c0100008,0x000000066ab00000)
 Metaspace       used 3924K, capacity 4568K, committed 4864K, reserved 1056768K
  class space    used 430K, capacity 460K, committed 512K, reserved 1048576K


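As the log shows, the old generation holds 4195328K, which is the size of the char array before the expansion, while the 4194304K in the young generation's eden space is the size of the new array of length 2147483646 created by the expansion. So on a 16G heap, after several rounds of Full GC, the StringBuilder can finally be expanded all the way to its upper limit.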
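As a rough cross-check of the eden figure, the footprint of a char[] can be estimated from its length. This is a minimal sketch that assumes a 16-byte array header and 8-byte object alignment (typical for 64-bit HotSpot with compressed class pointers, but an assumption here, not something read from the log). `CharArraySize` and `charArrayKiB` are names made up for illustration.

```java
// Rough estimate of a char[] footprint on the heap.
// Assumptions (not taken from the GC log): 16-byte array header, 8-byte object alignment.
class CharArraySize {
    static final long HEADER_BYTES = 16;   // assumed array header size
    static final long ALIGNMENT = 8;       // assumed object alignment

    /** Estimated size of a char[] of the given length, in K (1024 bytes), rounded down. */
    static long charArrayKiB(long length) {
        long bytes = HEADER_BYTES + 2L * length;   // 2 bytes per char
        long aligned = (bytes + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
        return aligned / 1024;
    }

    public static void main(String[] args) {
        // Length quoted in the text for the new array
        System.out.println(charArrayKiB(2147483646L) + "K");
    }
}
```

Under these assumptions the estimate comes out at 4194304K, which lines up with the eden usage reported in the log.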

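Note ⚠️: the demo contains nothing but a single StringBuilder, whereas a real environment inevitably holds many other objects, so a StringBuilder growing all the way to the limit basically never happens there. What happens far more often is that the expansion coincides with other allocations, and Arrays#copyOf throws the OOM because the new array cannot be allocated.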

The OOM message "Requested array size exceeds VM limit" is actually thrown before any memory is allocated:
image13.png

In other words, a check performed before the actual allocation found that the requested array length was out of range.

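🚗 On the array length limit: taking a shortcut, I simply debugged the HotSpot source. On 64-bit JDK 8 the maximum array length is Integer.MAX_VALUE - 2, i.e. 2147483645 (the bitness of the operating system, the array element type, and whether compressed pointers are enabled all change the maximum usable array length, max_array_length).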
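The length check can be observed without a large heap by requesting an array that is longer than the limit. The sketch below is only a probe, with the hypothetical names `ArrayLimitProbe` and `tryAllocate`. On HotSpot it should report "Requested array size exceeds VM limit" regardless of heap size, while other JVMs, or a length just under the limit, may report "Java heap space" instead.

```java
// Probe: ask the JVM for a char[] longer than the maximum array length.
class ArrayLimitProbe {
    /** Returns the OutOfMemoryError message, or null if the allocation succeeded. */
    static String tryAllocate(int length) {
        try {
            char[] chars = new char[length];
            return chars.length == length ? null : "unexpected length";
        } catch (OutOfMemoryError e) {
            // The error is raised before (or instead of) the actual allocation
            return String.valueOf(e.getMessage());
        }
    }

    public static void main(String[] args) {
        System.out.println(tryAllocate(Integer.MAX_VALUE));
    }
}
```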

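Conclusions:

  1. When a StringBuilder expands, it copies its char array. While the array length is below half of Integer.MAX_VALUE, a new array is created for the copy, with a length of 2x the original + 2 (memory in use at that moment: the original array + 2 * the original array).
  2. Once the array is already longer than half of Integer.MAX_VALUE, the next expansion uses MAX_ARRAY_SIZE (Integer.MAX_VALUE - 8) directly as the new length.
  3. Only with a 16G heap split by the default generation ratio could the char array be expanded to a length on the order of Integer.MAX_VALUE (⚠️ note that the demo contains nothing but the StringBuilder; a real environment holds many other objects, so a StringBuilder rarely gets to grow to the limit there).
  4. During the expansion, if the new array does not fit in eden, a Young GC is triggered. If it still cannot be allocated, it is allocated directly in the old generation. If the old generation cannot hold it either, a Full GC is triggered (possibly several rounds), and if allocation still fails an OOM is thrown, at the point where java.util.Arrays#copyOf(char[], int) creates the new array with new char[newLength].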
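The growth rule in points 1 and 2 can be sketched as follows. This is a simplified model of the behaviour as summarized above, not the JDK source; `GrowthSketch` and `nextCapacity` are illustrative names, and the exact handling near the upper bound differs between JDK versions.

```java
// Simplified model of the capacity growth described in the conclusions above.
// NOT the JDK implementation; names are illustrative.
class GrowthSketch {
    static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

    /** Next capacity for a buffer of `current` chars that must hold `minCapacity` chars. */
    static int nextCapacity(int current, long minCapacity) {
        if (minCapacity > Integer.MAX_VALUE) {
            throw new OutOfMemoryError("required length exceeds int range");
        }
        long doubled = 2L * current + 2;               // point 1: double, plus 2
        long wanted = Math.max(doubled, minCapacity);
        if (wanted > MAX_ARRAY_SIZE) {                 // point 2: clamp near the top
            wanted = Math.max(minCapacity, MAX_ARRAY_SIZE);
        }
        return (int) wanted;
    }

    public static void main(String[] args) {
        int capacity = 16;                             // default StringBuilder capacity
        while (capacity < MAX_ARRAY_SIZE) {
            int next = nextCapacity(capacity, capacity + 1L);
            // While the copy runs, the old and the new array are both live
            System.out.printf("%d -> %d (chars live during copy: %d)%n",
                    capacity, next, (long) capacity + next);
            capacity = next;
        }
    }
}
```

The last column is the reason the heap has to hold roughly three times the old array at each step: the old array stays reachable until the copy into the new one has finished.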

5. Final analysis

Based on the experiments and analysis above, the following conclusions can be drawn about the customer's production OOM:

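  1. The OOM was thrown while a StringBuilder was being built to print an error log.
  2. The StringBuilder expansion was nowhere near Integer.MAX_VALUE: when java.util.Arrays#copyOf(char[], int) threw the OOM, the old array occupied only 73728K (so the new array for this expansion was 147456K).
  3. An oversized message makes the Kafka send fail, which in turn causes the KafkaProducer to be closed and recreated (this is the DefaultKafkaProducerFactory logic). The suspicion about the excessive number of RecordAccumulator instances still stands and needs a deeper dive into the source code.
  4. The OOM happened because, with other objects also occupying the heap, the new array could not be allocated when the StringBuilder expanded and copied its array (this needs further analysis together with the GC log).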
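The figures in point 2 are consistent with a plain doubling, which can be checked with a few lines of arithmetic. This is a back-of-the-envelope sketch: it works in K, ignores the array header and the extra 2 chars, and uses the hypothetical names `CopyFootprint`, `newArrayK` and `transientK`.

```java
// Back-of-the-envelope check of the sizes quoted above (all values in K).
class CopyFootprint {
    /** Approximate size of the doubled array; the "+2" chars are negligible at this scale. */
    static long newArrayK(long oldArrayK) {
        return 2 * oldArrayK;
    }

    /** Old and new arrays are both live while Arrays.copyOf runs. */
    static long transientK(long oldArrayK) {
        return oldArrayK + newArrayK(oldArrayK);
    }

    public static void main(String[] args) {
        long oldK = 73728;   // old array size quoted in point 2
        System.out.println("new array: " + newArrayK(oldK) + "K");
        System.out.println("live during copy: " + transientK(oldK) + "K");
    }
}
```

So the failing allocation asked for about 144M in a single array, with about 216M needed in total for the old and new arrays together, which is small next to a 13G heap and supports the view that other live objects were the deciding factor.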