JVM Coredump Analysis Series (4): Analysis of Common SIGBUS Cases
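1. Introduction
I have run into several SIGBUS crashes in the past. This article summarizes the approach I use to locate such problems and provides test cases that reproduce them, so that similar problems can be diagnosed more quickly. A memory access typically triggers SIGBUS in one of the following scenarios:
- Reads or writes to unaligned memory
- Physical memory failure on the machine
- Abnormal access to a file mapping
This article focuses on two of these scenarios, physical memory failure and abnormal access to a file mapping, and describes the symptoms, the troubleshooting method, and the reproducing test cases.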
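2. SIGBUS Triggered by Physical Memory Failure
Many processes on the machine crash, each crash shows a different stack, and some processes crash inside system libraries such as libc.so and libpthread.so.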
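2.1 Troubleshooting
- Analyze the hs_err_pid file
The hs_err_pid file shows that the access to address 0x000000054c0bc000 triggered SIGBUS and that si_code is 4 (BUS_MCEERR_AR). According to the official sigaction manual page [1], the si_code values BUS_MCEERR_AR (4) and BUS_MCEERR_AO (5) indicate a physical memory failure: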
BUS_ADRALN
Invalid address alignment.
BUS_ADRERR
Nonexistent physical address.
BUS_OBJERR
Object-specific hardware error.
BUS_MCEERR_AR (since Linux 2.6.32)
Hardware memory error consumed on a machine check;
action required.
BUS_MCEERR_AO (since Linux 2.6.32)
Hardware memory error detected in process but not
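consumed; action optional.
- Analyze the system logs
Check the system logs around the time of the crash. They contain many kernel error messages (Hardware Error, hardware memory error, and so on), which further confirms that the crash on memory access was caused by a physical memory failure.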
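The si_code check described above can be sketched as a small helper. This is a minimal sketch, not part of the original analysis: the class `SigBusCode` and its methods are hypothetical names, the numeric values are the Linux ones for SIGBUS (1 to 5 in the order listed in sigaction(2)), and the regular expression assumes the siginfo line contains `si_code=N` or `si_code: N`, a layout that varies between JDK versions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: classifies the si_code reported for a SIGBUS on Linux. */
final class SigBusCode {
    // Assumption: the siginfo line contains "si_code=N" or "si_code: N".
    private static final Pattern SI_CODE = Pattern.compile("si_code\\s*[=:]\\s*(\\d+)");

    /** Extracts the numeric si_code from a siginfo line, or returns -1 if none is found. */
    static int parseSiCode(String siginfoLine) {
        Matcher m = SI_CODE.matcher(siginfoLine);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    /** Returns the symbolic name of a Linux SIGBUS si_code, or "UNKNOWN". */
    static String name(int siCode) {
        switch (siCode) {
            case 1: return "BUS_ADRALN";
            case 2: return "BUS_ADRERR";
            case 3: return "BUS_OBJERR";
            case 4: return "BUS_MCEERR_AR";
            case 5: return "BUS_MCEERR_AO";
            default: return "UNKNOWN";
        }
    }

    /** True when the code points to a hardware memory error (machine check). */
    static boolean isHardwareMemoryError(int siCode) {
        return siCode == 4 || siCode == 5;
    }

    public static void main(String[] args) {
        // Illustrative line only; copy the real siginfo line from your hs_err_pid file.
        String line = "siginfo: si_signo: 7 (SIGBUS), si_code: 4 (BUS_MCEERR_AR), si_addr: 0x000000054c0bc000";
        int code = parseSiCode(line);
        System.out.println(name(code) + " hardware=" + isHardwareMemoryError(code));
    }
}
```

If the helper reports a hardware memory error, the system logs are the next place to look; any other code suggests checking the file-mapping scenario first.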
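3. SIGBUS Triggered by Abnormal Access to a File Mapping
Abnormal access to a file mapping is the most common cause of SIGBUS in user space [2], and also the easiest to trigger. The root cause is usually that a process mmaps a file and another process then truncates it, so some of the mapped pages lie beyond the actual size of the file; accessing those pages triggers SIGBUS. Specifically, there are the following scenarios:
- A process mmaps a file, and another process then truncates the file to a smaller size
- A shared library is updated by overwriting it directly with cp
- An executable is updated by overwriting it directly with cp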
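To make "pages beyond the actual size of the file" concrete, the following sketch computes where a mapping stops being backed by the file after a truncation. It is illustrative only: `MappingBounds` and `firstUnbackedOffset` are hypothetical names, the mapping is assumed to start at file offset 0, and the 4096-byte page size used in `main` is an assumption about the machine.

```java
/** Hypothetical helper: finds where a file mapping stops being backed by the file. */
final class MappingBounds {
    /**
     * Returns the first offset in the mapping that lies on a page entirely beyond
     * the end of the file, or -1 if every mapped page is still backed.
     * Assumes the mapping starts at file offset 0.
     */
    static long firstUnbackedOffset(long mappedLength, long currentFileSize, long pageSize) {
        // The page holding the last byte of the file is still accessible,
        // so round the file size up to the next page boundary.
        long backedEnd = ((currentFileSize + pageSize - 1) / pageSize) * pageSize;
        return backedEnd < mappedLength ? backedEnd : -1;
    }

    public static void main(String[] args) {
        long page = 4096; // assumed page size
        // Two pages mapped, file later rewritten to one page: the second page is unbacked.
        System.out.println(firstUnbackedOffset(2 * page, page, page));     // prints 4096
        // File still at its original size: nothing is unbacked.
        System.out.println(firstUnbackedOffset(2 * page, 2 * page, page)); // prints -1
    }
}
```

Any access at or after the returned offset touches a page with no file data behind it, which is the access that raises SIGBUS in the scenarios listed above.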
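3.1 Troubleshooting
A SIGBUS triggered by abnormal access to a file mapping can be investigated with the following steps:
- Find the si_addr printed in the T H R E A D section of the hs_err_pid file
- Look up, in the Dynamic libraries section of the hs_err_pid file, the file that si_addr is mapped to
- Log the operations performed on that file in the application logs and check whether it is read and written concurrently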
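The second step can be automated with a short lookup. This is a sketch under stated assumptions: `MapsLookup` and `findMappedFile` are hypothetical names, and the lines of the Dynamic libraries section are assumed to follow the Linux /proc/<pid>/maps layout (`start-end perms offset dev inode path`). The sample lines in `main` are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Hypothetical helper: finds which mapped file contains a faulting address. */
final class MapsLookup {
    /**
     * Scans lines in /proc/<pid>/maps layout and returns the path of the mapping
     * whose [start, end) range contains the address, if any.
     */
    static Optional<String> findMappedFile(List<String> mapsLines, long address) {
        for (String line : mapsLines) {
            // Limit the split to 6 fields so that a path containing spaces stays whole.
            String[] fields = line.trim().split("\\s+", 6);
            if (fields.length < 6) {
                continue; // anonymous mapping, header line, or unrelated text
            }
            String[] range = fields[0].split("-");
            if (range.length != 2) {
                continue;
            }
            try {
                long start = Long.parseUnsignedLong(range[0], 16);
                long end = Long.parseUnsignedLong(range[1], 16);
                if (Long.compareUnsigned(address, start) >= 0
                        && Long.compareUnsigned(address, end) < 0) {
                    return Optional.of(fields[5]);
                }
            } catch (NumberFormatException e) {
                // Not an address range; skip the line.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up sample lines; in practice, read them from the hs_err_pid file.
        List<String> lines = Arrays.asList(
                "7f40c0000000-7f40c0002000 r--s 00000000 fd:01 1234 /tmp/data/tmp.ttt",
                "7f40c1000000-7f40c1800000 rwxp 00000000 00:00 0");
        System.out.println(findMappedFile(lines, 0x7f40c0001000L).orElse("<not file-backed>"));
    }
}
```

Once the file is known, the remaining step is to find out which thread or process truncated or overwrote it while it was mapped.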
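3.2 Reproducers
In a Java application, the thread stack at the time of a SIGBUS caused by abnormal access to a file mapping can differ from one occurrence to the next. The two most common cases are described below.
Case 1: SIGBUS triggered by concurrent operations on the same file
I have repeatedly hit SIGBUS in ~StubRoutines::jlong_disjoint_arraycopy on x86_64 machines and in ~StubRoutines::arrayof_jlong_disjoint_arraycopy on aarch64 machines. Using the method in Section 3.1, the cause was eventually traced to multiple threads operating on the same file at the same time.
- Stack at the time of the SIGBUS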
// x86_64
Stack: [0x00007f3c798a6000,0x00007f3c799a7000], sp=0x00007f3c799a5940, free space=1022k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
v ~StubRoutines::jlong_disjoint_arraycopy
J 1553 C2 java.nio.Bits.copyToArray(JLjava/lang/Object;JJJ)V (68 bytes) @ 0x00007f40c184f100 [0x00007f40c184f0a0+0x60]
J 1551 C2 java.nio.DirectByteBuffer.get([BII)Ljava/nio/ByteBuffer; (126 bytes) @ 0x00007f40c184dfcc [0x00007f40c184df20+0xac]
J 1549 C2 java.nio.ByteBuffer.get([B)Ljava/nio/ByteBuffer; (9 bytes) @ 0x00007f40c184c424 [0x00007f40c184c3e0+0x44]
j TestSigBus$2.run()V+68
J 1501 C2 java.lang.Thread.run()V (17 bytes) @ 0x00007f40c181d56c [0x00007f40c181d520+0x4c]
v ~StubRoutines::call_stub
V [libjvm.so+0x6e87e5] JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0xdd5
V [libjvm.so+0x6e5d1b] JavaCalls::call_virtual(JavaValue*, KlassHandle, Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x2ab
V [libjvm.so+0x6e6337] JavaCalls::call_virtual(JavaValue*, Handle, KlassHandle, Symbol*, Symbol*, Thread*)+0x57
V [libjvm.so+0x7865cb] thread_entry(JavaThread*, Thread*)+0x7b
V [libjvm.so+0xb00911] JavaThread::thread_main_inner()+0xf1
V [libjvm.so+0x9a5558] java_start(Thread*)+0xf8
C [libpthread.so.0+0x8164] start_thread+0xe4
// aarch64
Stack: [0x0000fffda92b0000,0x0000fffda94b0000], sp=0x0000fffda94ae380, free space=2040k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
v ~StubRoutines::arrayof_jlong_disjoint_arraycopy
J 1730 C2 java.nio.Bits.copyToArray(JLjava/lang/Object;JJJ)V (68 bytes) @ 0x0000ffff718c0848 [0x0000ffff718c0800+0x48]
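C 0x0000000000002000
- Steps to reproduce
  - In the main thread, initialize the file by writing 2 * PAGE_SIZE bytes of data, then map the file with mmap
  - Create a truncate thread that first empties the file and then writes PAGE_SIZE bytes of data
  - Create a read thread that reads all of the file data
  - Run the test case TestSigBus
- Reproducer code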
import sun.misc.Unsafe;
import java.io.*;
import java.lang.reflect.Field;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.Arrays;
import java.util.concurrent.locks.ReentrantLock;

public class TestSigBus {
    private static Unsafe unsafe;
    private static int pageSize;
    private static int fileSize;

    static {
        unsafe = createUnsafe();
        pageSize = unsafe.pageSize();
        fileSize = pageSize * 2;
    }

    public static Unsafe createUnsafe() {
        try {
            Class<?> unsafeClass = Class.forName("sun.misc.Unsafe");
            Field field = unsafeClass.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            Unsafe unsafe = (Unsafe) field.get(null);
            return unsafe;
        } catch (Exception e) {
            e.printStackTrace();
        }
        return null;
    }

    public static File initFile() {
        if (unsafe == null) {
            System.err.println("Create Unsafe failed.");
            return null;
        }
        File file = new File("/home/xiezhaokun/test/tmp.ttt");
        try (FileWriter fileWriter = new FileWriter(file)) {
            for (int i = 0; i < fileSize; i++) {
                fileWriter.write('1');
                fileWriter.flush();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return file;
    }

    public static MappedByteBuffer mappingFile(File file) {
        MappedByteBuffer mappedByteBuffer = null;
        try (FileInputStream fileInputStream = new FileInputStream(file)) {
            FileChannel fileChannel = fileInputStream.getChannel();
            long size = fileChannel.size();
            System.out.println(size);
            mappedByteBuffer = fileChannel.map(
                    FileChannel.MapMode.READ_ONLY, 0, size);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return mappedByteBuffer;
    }

    public static boolean truncateFile(File file) {
        int len = pageSize;
        try (FileWriter fileWriter = new FileWriter(file)) {
            for (int i = 0; i < len; i++) {
                fileWriter.write('1');
                fileWriter.flush();
            }
        } catch (IOException e) {
            e.printStackTrace();
            return false;
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        // init
        File file = initFile();
        if (file == null) {
            System.err.println("Init file failed.");
            return;
        }
        // mapping
        MappedByteBuffer mappedByteBuffer = mappingFile(file);
        if (mappedByteBuffer == null) {
            System.err.println("Mapping file failed.");
            return;
        }
        ReentrantLock lock = new ReentrantLock();
        // truncate thread
        new Thread(new Runnable() {
            @Override
            public void run() {
                lock.lock();
                try {
                    boolean isSuccess = truncateFile(file);
                    if (!isSuccess) {
                        System.err.println("Clear file failed.");
                        return;
                    }
                } finally {
                    lock.unlock();
                }
            }
        }).start();
        Thread.sleep(2000);
        // read thread
        /*
         * The byteLen should be more than 6 (java.nio.Bits.JNI_COPY_TO_ARRAY_THRESHOLD).
         * @see java.nio.DirectByteBuffer#get(byte[], int, int)
         * @see java.nio.Bits.JNI_COPY_TO_ARRAY_THRESHOLD
         *
         */
        new Thread(new Runnable() {
            @Override
            public void run() {
                lock.lock();
                try {
                    int byteLen = 8;
                    byte[] bytes = new byte[byteLen];
                    int capacity = mappedByteBuffer.capacity();
                    int loops = capacity / byteLen;
                    for (int i = 0; i < loops; i++) {
                        mappedByteBuffer.get(bytes);
                    }
                } finally {
                    lock.unlock();
                }
            }
        }).start();
    }
}
Case 2: SIGBUS triggered when a compressed file is modified or emptied while it is being processed
- Stack at the time of the SIGBUS
Stack: [0x0000ffff847d0000,0x0000ffff849d0000], sp=0x0000ffff849cda60, free space=2038k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C [libzip.so+0x139fc] newEntry.isra.4+0x74
C [libzip.so+0x14918] ZIP_GetEntry2+0x168
C [libzip.so+0x4488] Java_java_util_zip_ZipFile_getEntry+0x98
j java.util.zip.ZipFile.getEntry(J[BZ)J+0
j java.util.zip.ZipFile.getEntry(Ljava/lang/String;)Ljava/util/zip/ZipEntry;+38
j java.util.jar.JarFile.getEntry(Ljava/lang/String;)Ljava/util/zip/ZipEntry;+2
j java.util.jar.JarFile.getJarEntry(Ljava/lang/String;)Ljava/util/jar/JarEntry;+2
j java.util.jar.JarFile.getManEntry()Ljava/util/jar/JarEntry;+11
j java.util.jar.JarFile.getManifestFromReference()Ljava/util/jar/Manifest;+27
j java.util.jar.JarFile.getManifest()Ljava/util/jar/Manifest;+1
j TestJarFileSigBus.main([Ljava/lang/String;)V+20
v ~StubRoutines::call_stub
V [libjvm.so+0x6d057c] JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0xe54
V [libjvm.so+0x75caf8] jni_invoke_static(JNIEnv_*, JavaValue*, _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, Thread*) [clone .isra.83] [clone .constprop.126]+0x198
V [libjvm.so+0x75edb0] jni_CallStaticVoidMethod+0x148
C [libjli.so+0x8530]
C [libpthread.so.0+0x7d38] start_thread+0xb4
C [libc.so.6+0xdf5f0] thread_start+0x30
- Steps to reproduce
  - Create a jar file that contains a TestClass class
  - Empty the jar file
  - Run the test case TestJarFileSigBus
- Reproducer code
// TestClass.java
public class TestClass {
    static {
        System.out.println("test");
    }

    public static void main(String[] args) {
        System.out.println(TestClass.class);
    }
}

// TestJarFileSigBus.java
import java.io.*;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.jar.*;

public class TestJarFileSigBus {
    static String jarPath = "test.jar";
    static String classPath = "TestClass.class";

    public static void main(String[] args) throws IOException {
        createTestJarFile();
        try (JarFile jarFile = new JarFile(jarPath)) {
            clearJarFile();
            jarFile.getManifest();
        }
    }

    public static void clearJarFile() throws IOException {
        try (FileWriter fileWriter = new FileWriter(jarPath)) {
            fileWriter.write("");
        }
    }

    public static void createTestJarFile() throws IOException {
        Manifest manifest = new Manifest();
        Attributes mainAttributes = manifest.getMainAttributes();
        mainAttributes.put(new Attributes.Name("Manifest-Version"), "1.0.0");
        mainAttributes.put(new Attributes.Name("Main-Class"), "TestClass");
        Path path = Paths.get(classPath);
        try (JarOutputStream jos = new JarOutputStream(
                new FileOutputStream(jarPath), manifest)) {
            byte[] bytes = Files.readAllBytes(path);
            for (int i = 0; i < 10; i++) {
                JarEntry jarEntry = new JarEntry("TestClass" + i + ".class");
                jos.putNextEntry(jarEntry);
                jos.write(bytes);
                jos.closeEntry();
            }
            jos.finish();
        }
    }
}
References
[1] https://man7.org/linux/man-pages/man2/sigaction.2.html
[2] https://www.cnblogs.com/catch/p/10973762.html