JVM Coredump Analysis Series (5): Analyzing a SIGSEGV Crash When Using netty-tcnative
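1. Background
While troubleshooting production issues, the author has repeatedly run into SIGSEGV crashes in processes that use netty-tcnative. This article summarizes how to diagnose the problem and provides a reproducer, so that similar issues can be identified faster. In the case analyzed here, the service used netty's OpenSSL-based TLS implementation and set -Djdk.tls.ephemeralDHKeySize=3072 in its startup options. After the process started, accessing the service triggered a SIGSEGV crash. The crash stack is as follows: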
Stack: [0x00007f822a2d6000,0x00007f822a317000], sp=0x00007f822a312d38, free space=243k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C [libc.so.6+0x172761] __strlen_sse2_pminub+0x11
C [libio_grpc_netty_shaded_netty_tcnative_linux_x86_641710973062976904606.so+0x186a78]
C [libio_grpc_netty_shaded_netty_tcnative_linux_x86_641710973062976904606.so+0x28f50]
j io.grpc.netty.shaded.io.netty.internal.tcnative.SSLContext.setTmpDHLength(JI)V+0
j io.grpc.netty.shaded.io.netty.handler.ssl.ReferenceCountedOpenSslContext.<init>(Ljava/lang/Iterable;Lio/grpc/netty/shaded/io/netty/handler/ssl/CipherSuiteFilter;Lio/grpc/netty/shaded/io/netty/handler/ssl/OpenSslApplicationProtocolNegotiator;JJI[Ljava/security/cert/Certificate;Lio/grpc/netty/shaded/io/netty/handler/ssl/ClientAuth;[Ljava/lang/String;ZZZ)V+532
j io.grpc.netty.shaded.io.netty.handler.ssl.OpenSslContext.<init>(Ljava/lang/Iterable;Lio/grpc/netty/shaded/io/netty/handler/ssl/CipherSuiteFilter;Lio/grpc/netty/shaded/io/netty/handler/ssl/OpenSslApplicationProtocolNegotiator;JJI[Ljava/security/cert/Certificate;Lio/grpc/netty/shaded/io/netty/handler/ssl/ClientAuth;[Ljava/lang/String;ZZ)V+21
j io.grpc.netty.shaded.io.netty.handler.ssl.OpenSslServerContext.<init>([Ljava/security/cert/X509Certificate;Ljavax/net/ssl/TrustManagerFactory;[Ljava/security/cert/X509Certificate;Ljava/security/PrivateKey;Ljava/lang/String;Ljavax/net/ssl/KeyManagerFactory;Ljava/lang/Iterable;Lio/grpc/netty/shaded/io/netty/handler/ssl/CipherSuiteFilter;Lio/grpc/netty/shaded/io/netty/handler/ssl/OpenSslApplicationProtocolNegotiator;JJLio/grpc/netty/shaded/io/netty/handler/ssl/ClientAuth;[Ljava/lang/String;ZZLjava/lang/String;)V+21
j io.grpc.netty.shaded.io.netty.handler.ssl.OpenSslServerContext.<init>([Ljava/security/cert/X509Certificate;Ljavax/net/ssl/TrustManagerFactory;[Ljava/security/cert/X509Certificate;Ljava/security/PrivateKey;Ljava/lang/String;Ljavax/net/ssl/KeyManagerFactory;Ljava/lang/Iterable;Lio/grpc/netty/shaded/io/netty/handler/ssl/CipherSuiteFilter;Lio/grpc/netty/shaded/io/netty/handler/ssl/ApplicationProtocolConfig;JJLio/grpc/netty/shaded/io/netty/handler/ssl/ClientAuth;[Ljava/lang/String;ZZLjava/lang/String;)V+33
j io.grpc.netty.shaded.io.netty.handler.ssl.SslContext.newServerContextInternal(Lio/grpc/netty/shaded/io/netty/handler/ssl/SslProvider;Ljava/security/Provider;[Ljava/security/cert/X509Certificate;Ljavax/net/ssl/TrustManagerFactory;[Ljava/security/cert/X509Certificate;Ljava/security/PrivateKey;Ljava/lang/String;Ljavax/net/ssl/KeyManagerFactory;Ljava/lang/Iterable;Lio/grpc/netty/shaded/io/netty/handler/ssl/CipherSuiteFilter;Lio/grpc/netty/shaded/io/netty/handler/ssl/ApplicationProtocolConfig;JJLio/grpc/netty/shaded/io/netty/handler/ssl/ClientAuth;[Ljava/lang/String;ZZLjava/lang/String;)Lio/grpc/netty/shaded/io/netty/handler/ssl/SslContext;+152
j io.grpc.netty.shaded.io.netty.handler.ssl.SslContextBuilder.build()Lio/grpc/netty/shaded/io/netty/handler/ssl/SslContext;+79
... java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V+95
j java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5
j java.lang.Thread.run()V+11
v ~StubRoutines::call_stub
V [libjvm.so+0x7707ba] JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0xe3a
V [libjvm.so+0x76dd5b] JavaCalls::call_virtual(JavaValue*, KlassHandle, Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x28b
V [libjvm.so+0x76e347] JavaCalls::call_virtual(JavaValue*, Handle, KlassHandle, Symbol*, Symbol*, Thread*)+0x57
V [libjvm.so+0x7b912b] thread_entry(JavaThread*, Thread*)+0x7b
V [libjvm.so+0xc1f583] JavaThread::thread_main_inner()+0x103
V [libjvm.so+0xc1f8d8] JavaThread::run()+0x328
V [libjvm.so+0xa09702] java_start(Thread*)+0x112
C [libpthread.so.0+0x7e15] start_thread+0xc5
2. Analysis
-
Check the JDK limits on the jdk.tls.ephemeralDHKeySize system property
The JDK source shows that a numeric value for jdk.tls.ephemeralDHKeySize must lie in the range [1024, 8192] and be a multiple of 64.
-
Inspect the hs_err_pid log file
The call stack in the hs_err_pid log shows that the SIGSEGV was triggered in io.grpc.netty.shaded.io.netty.internal.tcnative.SSLContext.setTmpDHLength, a native method of netty-tcnative. Its second argument, length, is the value configured through the jdk.tls.ephemeralDHKeySize system property.
-
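Check the netty-tcnative limits on jdk.tls.ephemeralDHKeySize
The netty-tcnative source on GitHub shows that this method restricts the key size further: only 512, 1024, 2048, and 4096 are supported, and an invalid key size is supposed to raise an Exception.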
-
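Find the root cause of the crash
With jdk.tls.ephemeralDHKeySize=3072, the expected behavior is a java.lang.Exception: Unsupported length 3072. Why did the process crash instead? Reading the source further, the tcn_Throw call on line 639 has a mismatch between its format string and its argument: the argument is an int, but the format specifier is %s. The %s specifier makes the formatting code treat the integer as a pointer to a C string, which is consistent with the top frame of the crash stack, __strlen_sse2_pminub, where strlen reads from that invalid address. The netty community fixed the crash on an invalid key size in netty-tcnative-boringssl-static 2.0.57.Final; the fix is in commit [1].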
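The gap between the two rule sets is easier to see side by side. The following is a minimal sketch that only restates the limits described above; the class and method names are hypothetical and are not taken from the JDK or from netty-tcnative.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: restates the two key-size rules described in this section.
// DhKeySizeRules and its methods are hypothetical names, not JDK or netty APIs.
final class DhKeySizeRules {

    // Sizes this article reports as supported by netty-tcnative's setTmpDHLength.
    private static final List<Integer> TCNATIVE_SIZES = Arrays.asList(512, 1024, 2048, 4096);

    // JDK rule as described above: within [1024, 8192] and a multiple of 64.
    static boolean acceptedByJdk(int bits) {
        return bits >= 1024 && bits <= 8192 && bits % 64 == 0;
    }

    // netty-tcnative rule as described above: one of a fixed list of sizes.
    static boolean acceptedByTcnative(int bits) {
        return TCNATIVE_SIZES.contains(bits);
    }

    public static void main(String[] args) {
        // Print how each rule treats a few candidate sizes.
        for (int bits : new int[] {512, 1024, 2048, 3072, 4096, 8192}) {
            System.out.printf("%d: jdk=%b tcnative=%b%n",
                    bits, acceptedByJdk(bits), acceptedByTcnative(bits));
        }
    }
}
```

Under these two rules, 3072 passes the JDK check but not the netty-tcnative one, which is exactly the configuration that reaches the faulty error path.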
3. Reproducing the Crash
-
Maven dependencies
<dependencies>
    <dependency>
        <groupId>io.netty</groupId>
        <artifactId>netty-all</artifactId>
        <version>4.1.82.Final</version>
    </dependency>
    <dependency>
        <groupId>io.netty</groupId>
        <artifactId>netty-tcnative-boringssl-static</artifactId>
        <version>2.0.54.Final</version>
        <classifier>${os.detected.classifier}</classifier>
    </dependency>
</dependencies>
<build>
    <extensions>
        <extension>
            <groupId>kr.motd.maven</groupId>
            <artifactId>os-maven-plugin</artifactId>
            <version>1.4.0.Final</version>
        </extension>
    </extensions>
</build>
-
Generate the private key and certificate
openssl genrsa -out rsa.key
openssl req -new -key rsa.key -subj "/C=CN/ST=Beijing/L=Beijing" -out rsa.csr
openssl x509 -req -days 3650 -in rsa.csr -signkey rsa.key -out rsa.crt
openssl pkcs8 -topk8 -inform PEM -in rsa.key -outform pem -out rsa_enc_pkcs8.key -v1 PBE-SHA1-3DES -passin pass:12345678 -passout pass:12345678
-
Reproducer
import io.netty.handler.ssl.SslContext;
import io.netty.handler.ssl.SslContextBuilder;

import javax.net.ssl.SSLException;
import java.io.File;

public class SslContextBuilderTest {
    public static void main(String[] args) {
        // 3072 is valid for the JDK but not supported by netty-tcnative.
        System.setProperty("jdk.tls.ephemeralDHKeySize", "3072");

        // Certificate and encrypted PKCS#8 key generated by the openssl commands above.
        File keyCertChainFile = new File("rsa.crt");
        File keyFile = new File("rsa_enc_pkcs8.key");
        SslContextBuilder sslContextBuilder = SslContextBuilder.forServer(
                keyCertChainFile, keyFile, "12345678");
        try {
            // Building the server context calls SSLContext.setTmpDHLength,
            // which crashes on affected netty-tcnative versions.
            SslContext sslContext = sslContextBuilder.build();
        } catch (SSLException e) {
            throw new RuntimeException(e);
        }
    }
}
Summary
-
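The JDK accepts ephemeralDHKeySize values in the range [1024, 8192] that are multiples of 64, so 3072 is valid for the JDK. netty-tcnative, however, does not support 3072; it supports only 512, 1024, 2048, and 4096.
-
With a netty-tcnative-boringssl-static version lower than 2.0.57.Final, setting an invalid ephemeralDHKeySize crashes the process with SIGSEGV.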
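For services that cannot upgrade right away, one option is to fail fast in Java before the SSL context is built. The sketch below is a hypothetical startup guard, not part of netty or the JDK; it assumes the supported sizes listed above and, as a simplification of this sketch, leaves non-numeric values such as "legacy" or "matched" untouched.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical startup guard; DhKeySizeGuard is not a netty or JDK class.
final class DhKeySizeGuard {

    private static final String PROPERTY = "jdk.tls.ephemeralDHKeySize";

    // Sizes this article reports as supported by netty-tcnative.
    private static final List<Integer> TCNATIVE_SIZES = Arrays.asList(512, 1024, 2048, 4096);

    // Returns normally when the value is unset, non-numeric, or in the supported list.
    // Throws IllegalArgumentException for a numeric value outside the list.
    static void check(String value) {
        if (value == null) {
            return;
        }
        int bits;
        try {
            bits = Integer.parseInt(value.trim());
        } catch (NumberFormatException e) {
            // Assumption of this sketch: non-numeric settings are not checked here.
            return;
        }
        if (!TCNATIVE_SIZES.contains(bits)) {
            throw new IllegalArgumentException(
                    PROPERTY + "=" + bits + " is not one of " + TCNATIVE_SIZES);
        }
    }

    public static void main(String[] args) {
        // Run the check against the current JVM setting.
        check(System.getProperty(PROPERTY));
        System.out.println(PROPERTY + " is unset or compatible");
    }
}
```

Calling DhKeySizeGuard.check(System.getProperty("jdk.tls.ephemeralDHKeySize")) before SslContextBuilder.build() would turn the misconfiguration into an ordinary Java exception instead of a native crash.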
References
-
https://github.com/netty/netty-tcnative/pull/759/commits/eecaaa8e4222de1af05f9ccda0324b7c50955c97
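Compiler SIG focuses on technical exchange and sharing in the compiler field, covering GCC, LLVM, OpenJDK, and other program optimization techniques. It brings together scholars, experts, and other practitioners in compilation technology to advance the field together.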