前言
一开始发现测试环境报错,原先配置6现在配置20依然还是很频繁的报错,想看下底层到底如何处理的导致这个问题,到底什么情况。
排查
看了下日志连接数大量的空闲
看日志活跃的却是满的疑惑
2023-07-18 13:17:15.258 [xxl-job, admin JobFailMonitorHelper] [] DEBUG com.zaxxer.hikari.pool.HikariPool - HikariPool-2 - Timeout failure stats (total=20, active=20, idle=0, waiting=0) 2023-07-18 13:17:15.258 [xxl-job, admin JobFailMonitorHelper] [] INFO com.timelinecapital.oes.persist.core.DAO - Exception: Sql error: java.sql.SQLTransientConnectionException: HikariPool-2 - Connection is not available, request timed out after 10000ms. at com.zaxxer.hikari.pool.HikariPool.createTimeoutException(HikariPool.java:696) ~[HikariCP-4.0.3.jar:?] at
正文
首先从连接开始com.zaxxer.hikari.HikariDataSource#getConnection()
顺着断点看到连接在com.zaxxer.hikari.util.SuspendResumeLock#acquire找到一个很熟悉的并发工具类java.util.concurrent.Semaphore
接着找到了获取连接的方法
com.zaxxer.hikari.util.ConcurrentBag#borrow
其实不用太关注具体实现,只需要知道他是去找一个连接数即可,由浅入深,不用在意细节,否则会在代码里出不来。
这里要提到一个特别的类型com.zaxxer.hikari.util.ConcurrentBag.IConcurrentBagEntry用来标记连接是否可用是否使用,这很重要。
public interface IConcurrentBagEntry{ int STATE_NOT_IN_USE = 0; int STATE_IN_USE = 1; int STATE_REMOVED = -1; int STATE_RESERVED = -2; boolean compareAndSet(int expectState, int newState); void setState(int newState); int getState(); }
接着就又看到熟悉的CAS了,比较替换,果然并发工具类都是想通的,看到了6个连接数都是使用状态,这就奇怪了。
java.util.concurrent.atomic.AtomicIntegerFieldUpdater.AtomicIntegerFieldUpdaterImpl#compareAndSet
private final CopyOnWriteArrayList<T> sharedList
既然是已经提交了但是没有SQL查询的情况会不会是这个状态标记没有转换过来,接着找他们标记
com.zaxxer.hikari.pool.HikariPool#getConnection(long)
if (bagEntry.compareAndSet(STATE_NOT_IN_USE, STATE_IN_USE)) {
if (bagEntry == null || bagEntry.compareAndSet(STATE_NOT_IN_USE, STATE_IN_USE)) {
在测试的时候发现一个神奇的点HikariPool-2 - Timeout failure stats (total=6, active=6, idle=0, waiting=1)活跃的连接不释放。
接着测试一下是否自动事物配置属性问题,发现也还是一样。
Connection connection = null; try { connection = DAO.getConnection(); // connection.setAutoCommit(false);
2023-07-19 09:34:15.067 [] DEBUG 3188 --- [l-1 housekeeper] c.z.h.p.HikariPool :421 HikariPool-1 - Before cleanup stats (total=3, active=2, idle=1, waiting=0)
2023-07-19 09:34:15.068 [] DEBUG 3188 --- [l-1 housekeeper] c.z.h.p.HikariPool :421 HikariPool-1 - After cleanup stats (total=3, active=2, idle=1, waiting=0)
2023-07-19 09:34:15.068 [] DEBUG 3188 --- [l-1 housekeeper] c.z.h.p.HikariPool :518 HikariPool-1 - Fill pool skipped, pool is at sufficient level.
2023-07-19 09:34:15.068 [] DEBUG 3188 --- [l-2 housekeeper] c.z.h.p.HikariPool :421 HikariPool-2 - Before cleanup stats (total=0, active=0, idle=0, waiting=0)
2023-07-19 09:34:15.068 [] DEBUG 3188 --- [l-2 housekeeper] c.z.h.p.HikariPool :421 HikariPool-2 - After cleanup stats (total=0, active=0, idle=0, waiting=0)
2023-07-19 09:34:15.069 [] DEBUG 3188 --- [l-2 housekeeper] c.z.h.p.HikariPool :518 HikariPool-2 - Fill pool skipped, pool is at sufficient level.
依然如此
2023-07-19 09:42:39.015 [] DEBUG 3609 --- [#scheduleThread] c.z.h.p.HikariPool :421 HikariPool-2 - Timeout failure stats (total=6, active=6, idle=0, waiting=1)
2023-07-19 09:42:39.016 [] INFO 3609 --- [#scheduleThread] c.t.o.p.c.DAO :222 Exception: Sql error:
java.sql.SQLTransientConnectionException: HikariPool-2 - Connection is not available, request timed out after 10004ms.
突然想到会不会是我连接没有释放!!!
finally { connection.close(); }
果然这回活跃数量瞬间下去了,执行完成之后就释放了,这回正常了
2023-07-19 10:51:11.566 [] DEBUG 5172 --- [l-2 housekeeper] c.z.h.p.HikariPool :421 HikariPool-2 - Before cleanup stats (total=1, active=0, idle=1, waiting=0)
2023-07-19 10:51:11.566 [] DEBUG 5172 --- [l-1 housekeeper] c.z.h.p.HikariPool :421 HikariPool-1 - Before cleanup stats (total=3, active=0, idle=3, waiting=0)
2023-07-19 10:51:11.566 [] DEBUG 5172 --- [l-1 housekeeper] c.z.h.p.HikariPool :421 HikariPool-1 - After cleanup stats (total=3, active=0, idle=3, waiting=0)
2023-07-19 10:51:11.566 [] DEBUG 5172 --- [l-2 housekeeper] c.z.h.p.HikariPool :421 HikariPool-2 - After cleanup stats (total=1, active=0, idle=1, waiting=0)
2023-07-19 10:51:11.566 [] DEBUG 5172 --- [l-1 housekeeper] c.z.h.p.HikariPool :518 HikariPool-1 - Fill pool skipped, pool is at sufficient level.
2023-07-19 10:51:11.566 [] DEBUG 5172 --- [l-2 housekeeper] c.z.h.p.HikariPool :518 HikariPool-2 - Fill pool skipped, pool is at sufficient level.
顺着关闭连接的方法找到了代理类
com.zaxxer.hikari.pool.ProxyConnection#close
有找到了释放连接的方法,这方法只需要关注一个点bagEntry.setState(STATE_NOT_IN_USE);标记未使用,哈哈到这里估计就明白了。
com.zaxxer.hikari.util.ConcurrentBag#requite
/**
* This method will return a borrowed object to the bag. Objects
* that are borrowed from the bag but never "requited" will result
* in a memory leak.
*
* @param bagEntry the value to return to the bag
* @throws NullPointerException if value is null
* @throws IllegalStateException if the bagEntry was not borrowed from the bag
*/
public void requite(final T bagEntry){
bagEntry.setState(STATE_NOT_IN_USE);
for (int i = 0; waiters.get() > 0; i++) {
if (bagEntry.getState() != STATE_NOT_IN_USE || handoffQueue.offer(bagEntry)) {
return;
} else if ((i & 0xff) == 0xff) {
parkNanos(MICROSECONDS.toNanos(10));
} else {
yield();
}
}
final List<Object> threadLocalList = threadList.get();
threadLocalList.add(weakThreadLocals ? new WeakReference<>(bagEntry) : bagEntry);
}
这里就是有一个点疑惑就是这个类com.zaxxer.hikari.pool.PoolEntry我觉得这里面有很大的探索空间,后续再慢慢挖掘哈哈,这次先解决错误。
/**
* Entry used in the ConcurrentBag to track Connection instances.
*
* @author Brett Wooldridge
*/
final class PoolEntry implements IConcurrentBagEntry{
归纳了解决问题的几种方法DAO.getConnection()是我们自己封装的一个类,其实我挺讨厌自己造轮子的,不好用还容易出问题,从1到100不好么,为什么老是喜欢从零到一,很容易掉坑了
// 1 try会自动释放连接
Connection connection = DAO.getConnection();
try (connection) {} catch (Exception e) {}
// 2
try (Connection connection = DAO.getConnection()) {} catch (Exception e) {}
// 3 在最好关闭连接
Connection connection = DAO.getConnection();
try{
} catch (Exception e) {
}finally {
connection.close();
}
总结
其实写过SQL连接的都知道要打开要关闭,我也知道的,习惯了封装的类觉得会考虑到这个点,所以就没有关注到这里就没有关心了。又回到上面说的,真不喜欢造的轮子需要注意的点太多了,流行的多好,所有考虑的点都给你处理好了,拆箱即用。这里封装封一半,我以为自己会释放连接,没想到还需要自己手动控制,我。。。。。。不过好在慢慢细心排查,找到了问题所在。