
The official documentation has a pitfall! Getting the G1 parameter InitiatingHeapOccupancyPercent right #My monster-fighting diary on the road to performance tuning#

3 years ago

The question

A couple of days ago, a member of a chat group raised a question:

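G1's -XX:InitiatingHeapOccupancyPercent defaults to 45. He had seen two explanations online: one says the concurrent marking cycle starts when occupancy of the whole heap exceeds 45%; the other says it starts when occupancy of the old regions exceeds 45%.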

I had been wondering about the same thing, so it seemed worth investigating.

About the parameter

InitiatingHeapOccupancyPercent is usually abbreviated IHOP. As we know, the main collection modes in G1 are Minor GC (which collects all Young Regions) and Mixed GC (which collects all Young Regions plus some Old Regions).

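What? You say there is also Full GC. When G1 was first designed, Oracle believed that Minor GC and Mixed GC would be enough, and that if you hit a Full GC your parameters were set incorrectly. So in early versions of G1 (before JDK 10), Oracle only provided a serial Full GC as a last-resort cleanup (a single-threaded Mark-Compact).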

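But programs are always complicated. However hard we try to avoid it, in some special situations, such as concurrent collection failing to keep up with the allocation rate, we still end up facing a Full GC, and at that point G1's Full GC performed extremely poorly. In response, Google implemented a multi-threaded parallel Full GC and submitted a patch to the community. But Oracle, because of its earlier dispute with Google over the ** community, killed the patch outright and then put forward its own proposal, JEP 307, which gave G1 a parallel Full GC (a multi-threaded Mark-Compact). The number of Full GC threads can be controlled with -XX:ParallelGCThreads (as the name suggests, this reuses most of the Parallel GC logic), and the change went into JDK 10.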

That was quite a digression; back to the topic.

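After a YGC (that is, a Minor GC), if a certain condition is met, the concurrent threads start concurrent marking (roughly speaking, a Mixed GC is about to begin). That condition is the InitiatingHeapOccupancyPercent mentioned above (default 45%): when the memory already allocated plus the memory about to be allocated exceeds 45% of the total heap capacity, concurrent marking starts.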

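What puzzles us is exactly this parameter: is it whole-heap usage exceeding 45% of total capacity, or old-generation usage exceeding 45% of total capacity?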

Let's first look at what the official documentation says, limited to the two LTS releases, JDK 8 and JDK 11:

  • JDK 8 documentation

    -XX:InitiatingHeapOccupancyPercent=percent

    Sets the percentage of the heap occupancy (0 to 100) at which to start a concurrent GC cycle. It is used by garbage collectors that trigger a concurrent GC cycle based on the occupancy of the entire heap, not just one of the generations (for example, the G1 garbage collector).

    By default, the initiating value is set to 45%. A value of 0 implies nonstop GC cycles. The following example shows how to set the initiating heap occupancy to 75%: -XX:InitiatingHeapOccupancyPercent=75

    Judging by the wording, "entire heap" means the whole heap, so the first explanation looks correct (but it is not that simple).

  • JDK 11 documentation

    -XX:InitiatingHeapOccupancyPercent=n

    Sets the Java heap occupancy threshold that triggers a marking cycle. The default occupancy is 45 percent of the entire Java heap.

    The JDK 11 documentation likewise says "entire Java heap".

So the first explanation appears to be correct, but then where does the second one come from? Let's go straight to the source code to find out.

Verifying against the source

First, let's look at the comment on InitiatingHeapOccupancyPercent in HotSpot's global flag definitions:

  • JDK 8 (8u192-b12)

    //globals.hpp 
    
    product(uintx, InitiatingHeapOccupancyPercent, 45,                        \
              "Percentage of the (entire) heap occupancy to start a "           \
              "concurrent GC cycle. It is used by GCs that trigger a "          \
              "concurrent GC cycle based on the occupancy of the entire heap, " \
              "not just one of the generations (e.g., G1). A value of 0 "       \
              "denotes 'do constant GC cycles'.")
    

    The comment does indeed say the entire heap.

  • JDK11

    //gc_globals.hpp 
    
    product(uintx, InitiatingHeapOccupancyPercent, 45,                        \
              "The percent occupancy (IHOP) of the current old generation "     \
              "capacity above which a concurrent mark cycle will be initiated " \
              "Its value may change over time if adaptive IHOP is enabled, "    \
              "otherwise the value remains constant. "                          \
              "In the latter case a value of 0 will result as frequent as "     \
              "possible concurrent marking cycles. A value of 100 disables "    \
              "concurrent marking. "                                            \
              "Fragmentation waste in the old generation is not considered "    \
              "free space in this calculation. (G1 collector only)")            \
              range(0, 100)
    

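    Here is the contradiction: the JDK 11 comment talks about the percentage occupied by the old generation, while the JDK 11 official documentation says entire heap. Is the official documentation unreliable? And for that matter, is the comment reliable?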

Having spotted the contradiction, we may as well read the actual implementation.

Reading the G1 implementation in JDK 8u192-b12, I found that when it tries to allocate memory (in both G1CollectedHeap::attempt_allocation_humongous and G1CollectedHeap::attempt_allocation_at_safepoint), it performs a check called need_to_start_conc_mark:

HeapWord* G1CollectedHeap::attempt_allocation_humongous(size_t word_size,
                                                        uint* gc_count_before_ret,
                                                        uint* gclocker_retry_count_ret) {
  .......

  // Humongous objects can exhaust the heap quickly, so we should check if we
  // need to start a marking cycle at each humongous object allocation. We do
  // the check before we do the actual allocation. The reason for doing it
  // before the allocation is that we avoid having to keep track of the newly
  // allocated memory while we do a GC.
  if (g1_policy()->need_to_start_conc_mark("concurrent humongous allocation",
                                           word_size)) {
    collect(GCCause::_g1_humongous_allocation);
  }

  .......

}

HeapWord* G1CollectedHeap::attempt_allocation_at_safepoint(size_t word_size,
                                                           AllocationContext_t context,
                                                           bool expect_null_mutator_alloc_region) {
  assert_at_safepoint(true /* should_be_vm_thread */);
  assert(_allocator->mutator_alloc_region(context)->get() == NULL ||
                                             !expect_null_mutator_alloc_region,
         "the current alloc region was unexpectedly found to be non-NULL");

  if (!isHumongous(word_size)) {
    return _allocator->mutator_alloc_region(context)->attempt_allocation_locked(word_size,
                                                      false /* bot_updates */);
  } else {
    HeapWord* result = humongous_obj_allocate(word_size, context);
    if (result != NULL && g1_policy()->need_to_start_conc_mark("STW humongous allocation")) {
      g1_policy()->set_initiate_conc_mark_if_possible();
    }
    return result;
  }

  ShouldNotReachHere();
}

need_to_start_conc_mark is the function that decides whether to start concurrent marking. Let's look at its implementation:

bool G1CollectorPolicy::need_to_start_conc_mark(const char* source, size_t alloc_word_size) {
  if (_g1->concurrent_mark()->cmThread()->during_cycle()) {
    return false;
  }

  size_t marking_initiating_used_threshold =
    (_g1->capacity() / 100) * InitiatingHeapOccupancyPercent;
  size_t cur_used_bytes = _g1->non_young_capacity_bytes();
  size_t alloc_byte_size = alloc_word_size * HeapWordSize;

  if ((cur_used_bytes + alloc_byte_size) > marking_initiating_used_threshold) {
    if (gcs_are_young() && !_last_young_gc) {
      ergo_verbose5(ErgoConcCycles,
        "request concurrent cycle initiation",
        ergo_format_reason("occupancy higher than threshold")
        ergo_format_byte("occupancy")
        ergo_format_byte("allocation request")
        ergo_format_byte_perc("threshold")
        ergo_format_str("source"),
        cur_used_bytes,
        alloc_byte_size,
        marking_initiating_used_threshold,
        (double) InitiatingHeapOccupancyPercent,
        source);
      return true;
    } else {
      ergo_verbose5(ErgoConcCycles,
        "do not request concurrent cycle initiation",
        ergo_format_reason("still doing mixed collections")
        ergo_format_byte("occupancy")
        ergo_format_byte("allocation request")
        ergo_format_byte_perc("threshold")
        ergo_format_str("source"),
        cur_used_bytes,
        alloc_byte_size,
        marking_initiating_used_threshold,
        (double) InitiatingHeapOccupancyPercent,
        source);
    }
  }

  return false;
}

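It is plain to see that if (cur_used_bytes + alloc_byte_size) > marking_initiating_used_threshold holds, and G1 is not still doing mixed collections (gcs_are_young() && !_last_young_gc), concurrent marking is requested.
What is marking_initiating_used_threshold? From size_t marking_initiating_used_threshold = (_g1->capacity() / 100) * InitiatingHeapOccupancyPercent; it is the capacity of the whole heap multiplied by InitiatingHeapOccupancyPercent percent, that is, 45% of the whole heap by default (if InitiatingHeapOccupancyPercent has not been changed).
(cur_used_bytes + alloc_byte_size) corresponds to the "memory already allocated plus memory about to be allocated" mentioned earlier. The part to focus on is cur_used_bytes, which comes from size_t cur_used_bytes = _g1->non_young_capacity_bytes();. Here is non_young_capacity_bytes():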

size_t non_young_capacity_bytes() {
    return _old_set.total_capacity_bytes() + _humongous_set.total_capacity_bytes();
  }

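Wait, this clearly computes the size of the Old Regions (old plus humongous). Does that mean the JDK 8 official documentation is wrong too, and that both 8 and 11 measure the old generation against total capacity? Then why does the official documentation still say entire heap? Strange.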
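To make the comparison concrete, here is a minimal sketch of just the occupancy test from need_to_start_conc_mark. HeapSnapshot, marking_threshold and non_young_exceeds_threshold are hypothetical names invented for illustration, not HotSpot APIs, and the sketch deliberately leaves out the during_cycle() and gcs_are_young() conditions that the real function also checks.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the few heap figures the check reads.
struct HeapSnapshot {
  std::size_t capacity_bytes;            // total heap capacity, like _g1->capacity()
  std::size_t old_capacity_bytes;        // capacity of the old regions
  std::size_t humongous_capacity_bytes;  // capacity of the humongous regions
};

// Same arithmetic order as the quoted source: divide by 100 first, then multiply.
std::size_t marking_threshold(std::size_t capacity_bytes, unsigned ihop_percent) {
  assert(ihop_percent <= 100);
  return (capacity_bytes / 100) * ihop_percent;
}

// Only the comparison: non-young capacity plus the pending allocation
// against the threshold derived from the whole heap's capacity.
bool non_young_exceeds_threshold(const HeapSnapshot& heap,
                                 std::size_t alloc_bytes,
                                 unsigned ihop_percent) {
  std::size_t non_young = heap.old_capacity_bytes + heap.humongous_capacity_bytes;
  return non_young + alloc_bytes >
         marking_threshold(heap.capacity_bytes, ihop_percent);
}
```

Under these assumptions, a 1000 MB heap with 300 MB of old and 100 MB of humongous regions stays below the default threshold of 450 MB, while a further 60 MB humongous allocation request would push it over. The young regions never enter the left-hand side.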

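Just as I was puzzling over this, a sharp-eyed group member found the answer in one of R大's articles:
R大 yyds (R大 is the GOAT)
I opened JDK-6976060 and found that this Enhancement is marked Resolved In Build: b12, meaning the behavior of this parameter was changed in JDK 8 b12.
I found the JDK 8 b11 code (the code before the change) and read the relevant logic: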

void G1CollectorPolicy::record_collection_pause_end() {

  .......

  size_t cur_used_bytes = _g1->used();
  
  .......

  size_t marking_initiating_used_threshold =
    (_g1->capacity() / 100) * InitiatingHeapOccupancyPercent;

  if (!_g1->mark_in_progress() && !_last_full_young_gc) {
    assert(!last_pause_included_initial_mark, "invariant");
    if (cur_used_bytes > marking_initiating_used_threshold) {
      if (cur_used_bytes > _prev_collection_pause_used_at_end_bytes) {
        assert(!during_initial_mark_pause(), "we should not see this here");

        ergo_verbose3(ErgoConcCycles,
                      "request concurrent cycle initiation",
                      ergo_format_reason("occupancy higher than threshold")
                      ergo_format_byte("occupancy")
                      ergo_format_byte_perc("threshold"),
                      cur_used_bytes,
                      marking_initiating_used_threshold,
                      (double) InitiatingHeapOccupancyPercent);

        // Note: this might have already been set, if during the last
        // pause we decided to start a cycle but at the beginning of
        // this pause we decided to postpone it. That's OK.
        set_initiate_conc_mark_if_possible();
      } else {
        ergo_verbose2(ErgoConcCycles,
                  "do not request concurrent cycle initiation",
                  ergo_format_reason("occupancy lower than previous occupancy")
                  ergo_format_byte("occupancy")
                  ergo_format_byte("previous occupancy"),
                  cur_used_bytes,
                  _prev_collection_pause_used_at_end_bytes);
      }
    }
  }

  ........
}

In the cur_used_bytes > marking_initiating_used_threshold check, marking_initiating_used_threshold is computed the same way as in the later code, but the crucial cur_used_bytes is computed differently: here it is size_t cur_used_bytes = _g1->used();. Let's look at used():

// Computes the sum of the storage used by the various regions.

size_t G1CollectedHeap::used() const {
  assert(Heap_lock->owner() != NULL,
         "Should be owned on this thread's behalf.");
  size_t result = _summary_bytes_used;
  // Read only once in case it is set to NULL concurrently
  HeapRegion* hr = _mutator_alloc_region.get();
  if (hr != NULL)
    result += hr->used();
  return result;
}

Tracing _summary_bytes_used further, we find _summary_bytes_used = recalculate_used();, and recalculate_used() is:

size_t G1CollectedHeap::recalculate_used() const {
  SumUsedClosure blk;
  heap_region_iterate(&blk);
  return blk.result();
}

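It iterates over every used region in the heap to compute the total, so cur_used_bytes ends up being the entire heap value (Young Regions + Old Regions), just as the documentation says, which matches the first explanation the group member had seen.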
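The pre-b12 measurement can be sketched in the same spirit. Region, RegionKind, whole_heap_used and whole_heap_exceeds_threshold below are hypothetical illustration names rather than HotSpot types, and the sketch omits the other conditions in record_collection_pause_end (marking not in progress, occupancy higher than at the end of the previous pause).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class RegionKind { Eden, Survivor, Old, Humongous };

// Hypothetical, simplified view of a heap region.
struct Region {
  RegionKind kind;
  std::size_t used_bytes;
};

// Sums used bytes over every region regardless of kind, in the spirit of
// recalculate_used() walking the heap with SumUsedClosure.
std::size_t whole_heap_used(const std::vector<Region>& regions) {
  std::size_t total = 0;
  for (const Region& r : regions) {
    total += r.used_bytes;
  }
  return total;
}

// Pre-b12 style comparison: whole-heap usage against the IHOP threshold.
bool whole_heap_exceeds_threshold(const std::vector<Region>& regions,
                                  std::size_t capacity_bytes,
                                  unsigned ihop_percent) {
  assert(ihop_percent <= 100);
  std::size_t threshold = (capacity_bytes / 100) * ihop_percent;
  return whole_heap_used(regions) > threshold;
}
```

Because the region kind is ignored when summing, a heap that is mostly Eden can cross the threshold here even when the old generation is nearly empty.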

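As for why Oracle made this change, I think R大 has already put it well: after the change, the condition for G1 to trigger global concurrent marking cares more about when the old gen will become unable to expand, rather than simply looking at the remaining capacity of the whole heap. After all, the purpose of global concurrent marking is to let G1 mixed GC find suitable old gen regions to collect, so marking must finish before the old gen becomes unable to expand (and therefore essentially impossible to collect).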

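R大 YYDS! It seems the official documentation has been describing the logic from before the JDK 8 b12 change all along, and this wrong description carried on through JDK 11. So the two explanations found online are each both correct and not entirely correct (calling either one right or wrong without naming a JDK version is irresponsible).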

Conclusion

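If your JDK is earlier than 8 b12, the first explanation at the start of this article is correct: -XX:InitiatingHeapOccupancyPercent is the ratio of whole-heap usage to total heap capacity;
If your JDK is 8 b12 or later (including the major versions 9, 10, 11…), the second explanation is correct: -XX:InitiatingHeapOccupancyPercent is the ratio of old-generation size to total heap capacity.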
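The two readings can be put side by side on the same heap state. This is only an illustrative model: IhopRule, Occupancy and crosses_ihop are made-up names, and for simplicity it treats the non-young figure as a single number rather than distinguishing region capacity from bytes used.

```cpp
#include <cassert>
#include <cstddef>

// Which quantity is compared with the threshold (hypothetical enum).
enum class IhopRule { WholeHeapUsed, NonYoungCapacity };

// Hypothetical summary of a heap at the moment of the check.
struct Occupancy {
  std::size_t capacity_bytes;    // total heap capacity
  std::size_t young_used_bytes;  // eden + survivor
  std::size_t non_young_bytes;   // old + humongous
};

bool crosses_ihop(const Occupancy& o, IhopRule rule, unsigned ihop_percent) {
  assert(ihop_percent <= 100);
  // The threshold is derived from total capacity under both rules.
  std::size_t threshold = (o.capacity_bytes / 100) * ihop_percent;
  // Only the measured quantity differs between the two readings.
  std::size_t measured = (rule == IhopRule::WholeHeapUsed)
                             ? o.young_used_bytes + o.non_young_bytes
                             : o.non_young_bytes;
  return measured > threshold;
}
```

For a 1000 MB heap holding 400 MB of young data and 100 MB of old data at the default 45, the whole-heap rule is over the threshold (500 MB against 450 MB) while the old-generation rule is not (100 MB against 450 MB), which is exactly the situation where the two online explanations disagree.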

PS: Oracle's documentation really is a trap.

豆大侠

Just a rookie.
