Process vanishing on JDK 8: anyone interested, please join the discussion~
Startup options:
-Xms12288m -Xmx12288m -verbose:gc -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:G1HeapRegionSize=4m -XX:G1ReservePercent=15 -XX:InitiatingHeapOccupancyPercent=45 -XX:MaxTenuringThreshold=7 -XX:MetaspaceSize=128m -XX:MaxMetaspaceSize=368m -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UnlockExperimentalVMOptions -XX:G1LogLevel=finest -XX:+PrintHeapAtGC -XX:+HeapDumpOnOutOfMemoryError -XX:-OmitStackTraceInFastThrow -Xloggc:/xx/gc.log -XX:NewSize=1024m -XX:MaxNewSize=1024m
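After running for a while, the system crashes for no apparent reason. Within two weeks it crashed 3 times on different machines, and every time the process died in ObjectSynchronizer::FastHashCode.
Key lines from the crash log: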
A fatal error has been detected by the Java Runtime Environment:
SIGSEGV (0xb) at pc=0x00007f2a52d6e477, pid=1449, tid=139813876594432
JRE version: Java™ SE Runtime Environment (8.0_91-b14) (build 1.8.0_91-b14)
Java VM: Java HotSpot™ 64-Bit Server VM (25.91-b14 mixed mode linux-amd64 compressed oops)
Problematic frame:
V [libjvm.so+0xa2b477] ObjectSynchronizer::FastHashCode(Thread*, oopDesc*)+0x57
--------------- T H R E A D ---------------
Current thread (0x00007f2a4fe3b800): JavaThread "pool-8-thread-1" [_thread_in_vm, id=1890, stack(0x00007f28f462f000,0x00007f28f4730000)]
siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000
Registers:
RAX=0x0000000000000002, RBX=0x9b5148c3f8222761, RCX=0x0000000000000040, RDX=0x00007f2a532f5d30
RSP=0x00007f28f472e3d0, RBP=0x00007f28f472e400, RSI=0x000000078033a740, RDI=0x000000078033a740
R8 =0x000000078a5ec844, R9 =0x0000000000000000, R10=0x00007f2a3d38bc0f, R11=0x0000000000000003
R12=0x000000078033a740, R13=0x00007f2a4fe3b800, R14=0x00007f2a532fa2a0, R15=0x00007f2a532f5d40
RIP=0x00007f2a52d6e477, EFLAGS=0x0000000000010282, CSGSFS=0x0000000000000033, ERR=0x0000000000000000
TRAPNO=0x000000000000000d
Instructions: (pc=0x00007f2a52d6e477)
0x00007f2a52d6e457: 01 00 00 4c 89 e7 e8 fe dd ff ff 48 89 c3 83 e0
0x00007f2a52d6e467: 07 48 83 e8 01 74 7a f6 c3 02 74 55 48 83 f3 02
0x00007f2a52d6e477: 48 8b 03 48 c1 e8 08 48 89 c2 81 e2 ff ff ff 7f
0x00007f2a52d6e487: 75 27 4c 89 e6 4c 89 ef e8 cc e7 ff ff 48 8b 18
Stack: [0x00007f28f462f000,0x00007f28f4730000], sp=0x00007f28f472e3d0, free space=1020k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0xa2b477] ObjectSynchronizer::FastHashCode(Thread*, oopDesc*)+0x57
V [libjvm.so+0x7116d8] JVM_IHashCode+0xb8
J 1214 java.lang.System.identityHashCode(Ljava/lang/Object;)I (0 bytes) @ 0x00007f2a3d38bc7f [0x00007f2a3d38bbc0+0xbf]
J 17664 C2 net.sf.ehcache.pool.sizeof.ObjectGraphWalker.walk(IZ[Ljava/lang/Object;)J (585 bytes) @ 0x00007f2a40052118 [0x00007f2a400511e0+0xf38]
J 17748 C2 net.sf.ehcache.store.MemoryStore.getInMemorySizeInBytes()J (136 bytes) @ 0x00007f2a40093920 [0x00007f2a40093400+0x520]
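From the crash we can see that RBX holds an abnormal value (0x9b5148c3f8222761), and the illegal access happens through it. Working back from the Instructions dump to the assembly: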
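a2b462: 48 89 c3 mov %rax,%rbx
a2b465: 83 e0 07 and $0x7,%eax
a2b468: 48 83 e8 01 sub $0x1,%rax
a2b46c: 74 7a je a2b4e8 <_ZN18ObjectSynchronizer12FastHashCodeEP6ThreadP7oopDesc+0xc8>
a2b46e: f6 c3 02 test $0x2,%bl
a2b471: 74 55 je a2b4c8 <_ZN18ObjectSynchronizer12FastHashCodeEP6ThreadP7oopDesc+0xa8>
a2b473: 48 83 f3 02 xor $0x2,%rbx
a2b477: 48 8b 03 mov (%rbx),%rax   <- crashing pc (FastHashCode+0x57)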
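The crashing instruction is the load at a2b477, which is reached only when test $0x2,%bl finds bit 1 of the mark word set; otherwise the je at a2b471 would jump to the is_lock_owned(mark->locker()) branch. So the code involved is the has_monitor() branch of ObjectSynchronizer::FastHashCode in https://github.com/unofficial-openjdk/openjdk/blob/jdk8u/jdk8u/hotspot/src/share/vm/runtime/synchronizer.cpp:
} else if (mark->has_monitor()) {
monitor = mark->monitor();
temp = monitor->header();
hash = temp->hash();
markOopDesc::monitor() (markOop.hpp) decodes the pointer with an xor, which is the xor $0x2,%rbx above:
ObjectMonitor* monitor() const {
assert(has_monitor(), "check");
// Use xor instead of &~ to provide one extra tag-bit check.
return (ObjectMonitor*) (value() ^ monitor_value);
}
The mov (%rbx),%rax then reads monitor->header(), and the shr $0x8 / and $0x7fffffff that follow in the Instructions dump extract the hash from it.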
At this point I'm out of ideas. Why would the pointer encoded in the object's mark word be an invalid address?
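The Bisheng JDK team has an article, https://www.eet-china.com/mp/a70808.html, that describes a problem similar to this one. In their case, however, the cause was reordering due to a missing memory barrier while threads operated concurrently during GC. My crash file states explicitly that the VM was not at a safepoint when it crashed, so I ruled out GC.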
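So is this a G1 bug in the JDK? I searched the OpenJDK and Oracle bug trackers; https://bugs.openjdk.java.net/browse/JDK-8130042 looks somewhat similar, but it never got any follow-up.
Any pointers would be appreciated.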