性能问答>CMS collector没有按照CMSWaitDuration触发>
7回复

CMS collector没有按照CMSWaitDuration触发



背景
本打算验证一下CMSWaitDuration参数,但是发现CMS collector在CMSWaitDuration之前提前触发了。

jvm版本信息
image.png

jvm启动参数

java -verbose:gc -Xms20M -Xmx20M -Xmn5M -XX:+PrintGCDetails -XX:+PrintGC -XX:+PrintGCTimeStamps  -XX:Survi
vorRatio=8 -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=80  -XX:+UseConcMarkSweepGC -XX:+PrintTenuringDistribution  -XX:+PrintHea
pAtGC  -XX:+PrintGC -XX:+PrintCMSInitiationStatistics -XX:CMSWaitDuration=80000  -XX:+Verbose

测试程序

public static void main(String[] args) throws InterruptedException {
        List<byte[]> list = new ArrayList<>(3);
        list.add(new byte[12*1024*1024]);//大对象直接分配到老年代,老年代当前占用12m,剩余3m
        Thread.sleep(3000);
        list.add(new byte[1*1024*1024]);//分配到年轻代,年轻代当前占用1m+
        Thread.sleep(2000);
        list.add(new byte[3*1024*1024]);//年轻代gc,1m+内容晋升到老年代,现在年轻代占用3m+,老年代占用13m+
        Thread.sleep(5000);
   }

gc log

MarkStackSize: 4096k  MarkStackSizeMax: 524288k
ConcGCThreads: 0
Minimum heap 20971520  Initial heap 20971520  Maximum heap 20971520
1: Minimum gen0 5242880  Initial gen0 5242880  Maximum gen0 5242880
Minimum gen1 15728640  Initial gen1 15728640  Maximum gen1 15728640

TwoGenerationCollectorPolicy::mem_allocate_work: attempting locked slow path allocation
DefNewGeneration::allocate_from_space(1572866):  will_fail: false  heap_lock: locked  free: 524288  should_allocate_from_space: NOT
  returns NULL
TwoGenerationCollectorPolicy::mem_allocate_work: attempting locked slow path allocation
DefNewGeneration::allocate_from_space(393218):  will_fail: false  heap_lock: locked  free: 524288  should_allocate_from_space: NOT
  returns NULL
{Heap before GC invocations=0 (full 0):
 par new generation   total 4608K, used 1891K [0x00000000fec00000, 0x00000000ff100000, 0x00000000ff100000)
  eden space 4096K,  46% used [0x00000000fec00000, 0x00000000fedd8e18, 0x00000000ff000000)
  from space 512K,   0% used [0x00000000ff000000, 0x00000000ff000000, 0x00000000ff080000)
  to   space 512K,   0% used [0x00000000ff080000, 0x00000000ff080000, 0x00000000ff100000)
 concurrent mark-sweep generation total 15360K, used 12288K [0x00000000ff100000, 0x0000000100000000, 0x0000000100000000)
 Metaspace       used 2914K, capacity 4486K, committed 4864K, reserved 1056768K
  class space    used 276K, capacity 386K, committed 512K, reserved 1048576K
5.220: [GC (Allocation Failure) Allocated 1 objects, 12582928 bytes concurrently5.220: [ParNewlevel=0 invoke=1 size=3145744CMS: promo attempt is safe: available(3145712) >= av_promo(0),max_promo(1936920)

Desired survivor size 262144 bytes, new threshold 1 (max 15)
- age   1:     518968 bytes,     518968 total
: 1936920->523176(4718592), 0.0104102 secs] 14519848->14352088(20447232)Promoted 2611 objects, 1227120 bytes Contiguous available 1636768 bytes , 0.0106530 secs] [Times: user=0.02 sys=0.00, real=0.01 secs] 
Heap after GC invocations=1 (full 0):
 par new generation   total 4608K, used 510K [0x00000000fec00000, 0x00000000ff100000, 0x00000000ff100000)
  eden space 4096K,   0% used [0x00000000fec00000, 0x00000000fec00000, 0x00000000ff000000)
  from space 512K,  99% used [0x00000000ff080000, 0x00000000ff0ffba8, 0x00000000ff100000)
  to   space 512K,   0% used [0x00000000ff000000, 0x00000000ff000000, 0x00000000ff080000)
 concurrent mark-sweep generation total 15360K, used 13504K [0x00000000ff100000, 0x0000000100000000, 0x0000000100000000)
 Metaspace       used 2914K, capacity 4486K, committed 4864K, reserved 1056768K
  class space    used 276K, capacity 386K, committed 512K, reserved 1048576K
}
 CMS: collect because of occupancy 0.879219 / 0.800000  CMS old gen initiated
5.231: [GC (CMS Initial Mark) 5.232: [
checkpointRootsInitialWork, 0.0015963 secs]
[1 CMS-initial-mark: 13828912(15728640)] 17497832(20447232), 0.0018602 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
5.233: [CMS-concurrent-mark-start]
5.239: [CMS-concurrent-mark: 0.005/0.005 secs] [Times: user=0.00 sys=0.00, real=0.01 secs] 
5.239: [CMS-concurrent-preclean-start]
 (modUnionTable: 0 cards)5.239: [CMS-concurrent-preclean: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
5.239: [CMS-concurrent-abortable-preclean-start]
CMS: promo attempt is safe: available(1899728) >= av_promo(1245984),max_promo(3668920)
CMS: promo attempt is safe: available(1899728) >= av_promo(1245984),max_promo(3668920)
...N次相同的输出
CMS: promo attempt is safe: available(1899728) >= av_promo(1245984),max_promo(3668920)
 (modUnionTable: 0 cards)Heap
 par new generation   total 4608K, used 3665K [0x00000000fec00000, 0x00000000ff100000, 0x00000000ff100000)
  eden space 4096K,  77% used [0x00000000fec00000, 0x00000000fef14930, 0x00000000ff000000)
  from space 512K,  99% used [0x00000000ff080000, 0x00000000ff0ffba8, 0x00000000ff100000)
  to   space 512K,   0% used [0x00000000ff000000, 0x00000000ff000000, 0x00000000ff080000)
 concurrent mark-sweep generation total 15360K, used 13504K [0x00000000ff100000, 0x0000000100000000, 0x0000000100000000)
 Metaspace       used 2921K, capacity 4486K, committed 4864K, reserved 1056768K
  class space    used 277K, capacity 386K, committed 512K, reserved 1048576K
ClassLoaderData CLD: 0x000000001622a4b0, loader: 0x00000000ff080000, loader_klass: 0x000000010000f6a8 sun/misc/Launcher$AppClassLoader { claimed  handles 0x0000000000000000
metaspace: 0x00000000162284b0

Virtual space manager: 0x0000000016225aa0
0) Metachunk: bottom 0x0000000015a20000 top 0x0000000015a204f8 end 0x0000000015a21000 size 512
    used 159 free 353
total of all chunks 512 used 159 free 353 capacity 512 waste 6

Class space manager: 0x0000000016225fa0
0) Metachunk: bottom 0x0000000100060000 top 0x0000000100060220 end 0x0000000100060800 size 256
    used 68 free 188
total of all chunks 256 used 68 free 188 capacity 256 waste 6
jni_method_id count = 1
}
ClassLoaderData CLD: 0x00000000034655d0, loader: 0x0000000000000000, loader_klass: 0x0000000000000000 <bootloader> { claimed  handles 0x0000000016014180
metaspace: 0x00000000034cf190

Virtual space manager: 0x00000000034d9110
0) Metachunk: bottom 0x0000000015620000 top 0x00000000158b4d88 end 0x0000000015a20000 size 524288
    used 338353 free 185935
total of all chunks 524288 used 338353 free 185935 capacity 524288 waste 6

Class space manager: 0x00000000034d91b0
0) Metachunk: bottom 0x0000000100000000 top 0x0000000100045218 end 0x0000000100060000 size 49152
    used 35395 free 13757
total of all chunks 49152 used 35395 free 13757 capacity 49152 waste 6
jni_method_id count = 16
}
All Metaspace:
data space:   Chunk accounting: used in chunks 2644K + unused in chunks 1455K  +  capacity in free chunks 0K = 4100K  capacity in allocated chunks 4100K
class space:   Chunk accounting: used in chunks 277K + unused in chunks 108K  +  capacity in free chunks 0K = 386K  capacity in allocated chunks 386K
Total fragmentation waste (words) doesn't count free space
  data: 0 specialized(s) 0, 1 small(s) 0, 0 medium(s) 0, large count 1
 class: 0 specialized(s) 0, 1 small(s) 0, 0 medium(s) 0, large count 1
[Verifying Threads Heap concurrent mark-sweep generation CMS: promo attempt is safe: available(1899728) >= av_promo(1245984),max_promo(3753176)
par new generation remset SymbolTable StringTable CodeCache SystemDictionary ClassLoaderDataGraph MetaspaceAux JNIHandles C-heap CodeCache Oops ]

疑惑点
按照配置CMSWaitDuration=80000,CMS collector应该是8s触发一次才对,为什么在5.231s的时候就提前触发了呢?

3858 阅读
请先登录,再评论

我猜这个问题原本是想问为什么没到CMSWaitDuration指定的时间就进行了是否启动cms gc的判断。我理解应该是前边的minor gc导致wait_on_cms_lock_for_scavenge停止等待,所以没到指定的时长

1年前
回复 你假笨:

多谢大佬回答,我自己消化下

1年前回复
回复 r=a(1-sinθ):

主要原因是刚发生了一次YGC,在YGC后退出安全点的时候,会有个通知CMS线程的动作,CMS线程会检查此时是否有发生过GC,也就是你设置8s的这个参数,会让CMS线程在等待之前获取一下当前的GC次数,然后开始等待,当YGC结束之后,会通知这个线程继续往下走,此时再会获取一次GC次数的值,如果这个值变化了,就会检查是否要做CMS GC了,而此时因为old的使用率达到了80%,而触发了这次CMS GC,具体代码可以参考下,

void ConcurrentMarkSweepThread::wait_on_cms_lock_for_scavenge(long t_millis) {
  // Wait time in millis or 0 value representing infinite wait for a scavenge
  assert(t_millis >= 0, "Wait time for scavenge should be 0 or positive");

  GenCollectedHeap* gch = GenCollectedHeap::heap();
  double start_time_secs = os::elapsedTime();
  double end_time_secs = start_time_secs + (t_millis / ((double) MILLIUNITS));

  // Total collections count before waiting loop
  unsigned int before_count;
  {
    MutexLockerEx hl(Heap_lock, Mutex::_no_safepoint_check_flag);
    before_count = gch->total_collections();
  }

  unsigned int loop_count = 0;

  while(!_should_terminate) {
    double now_time = os::elapsedTime();
    long wait_time_millis;

    if(t_millis != 0) {
      // New wait limit
      wait_time_millis = (long) ((end_time_secs - now_time) * MILLIUNITS);
      if(wait_time_millis <= 0) {
        // Wait time is over
        break;
      }
    } else {
      // No wait limit, wait if necessary forever
      wait_time_millis = 0;
    }

    // Wait until the next event or the remaining timeout
    {
      MutexLockerEx x(CGC_lock, Mutex::_no_safepoint_check_flag);

      if (_should_terminate || _collector->_full_gc_requested) {
        return;
      }
      set_CMS_flag(CMS_cms_wants_token);   // to provoke notifies
      assert(t_millis == 0 || wait_time_millis > 0, "Sanity");
      CGC_lock->wait(Mutex::_no_safepoint_check_flag, wait_time_millis);
      clear_CMS_flag(CMS_cms_wants_token);
      assert(!CMS_flag_is_set(CMS_cms_has_token | CMS_cms_wants_token),
             "Should not be set");
    }

    // Extra wait time check before entering the heap lock to get the collection count
    if(t_millis != 0 && os::elapsedTime() >= end_time_secs) {
      // Wait time is over
      break;
    }

    // Total collections count after the event
    unsigned int after_count;
    {
      MutexLockerEx hl(Heap_lock, Mutex::_no_safepoint_check_flag);
      after_count = gch->total_collections();
    }

    if(before_count != after_count) {
      // There was a collection - success
      break;
    }

    // Too many loops warning
    if(++loop_count == 0) {
      warning("wait_on_cms_lock_for_scavenge() has looped %u times", loop_count - 1);
    }
  }
}
1年前回复
回复 侯勇:

是的,我是想问这个

1年前回复

CMS: collect because of occupancy 0.879219 / 0.800000 CMS old gen initiated
已经说了原因,old占比达到了80%以上了

1年前
回复 r=a(1-sinθ):

因为你设置了这两个参数-XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=80

1年前回复
回复 风吹屁屁凉:

我的疑惑是8s以后应该才触发的,为啥5.231s的时候提前触发了

1年前回复