
G1 Garbage Collection Source Code Analysis (Part 3)


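Young Generation

The previous parts covered G1 regions and the RSet. This part looks at how G1 handles the young generation when a GC occurs. The number of young-generation regions depends on the region size that G1 computed earlier. If MaxNewSize and NewSize are set, dividing them by the region size G1 derived gives the maximum and minimum number of young regions. If MaxNewSize and NewSize are set together with NewRatio, NewRatio is ignored. If only NewRatio is set, the young generation size is heap size / (NewRatio + 1), and dividing that by the region size gives the number of young regions.

If neither MaxNewSize and NewSize nor NewRatio is set, G1 falls back to its own parameters, G1MaxNewSizePercent=60 and G1NewSizePercent=5, which are percentages of the whole heap, to compute the maximum and minimum number of regions.

When the maximum and minimum young region counts that G1 derives are equal, the young generation cannot be resized dynamically, which means pause-time prediction may fail to meet the pause target.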
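The three sizing rules above can be worked through with a small sketch. This is not the HotSpot code: the struct and function names are hypothetical, only the flag names and default percentages come from the text, and clamping the defaults to at least one region is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical result type: young generation bounds expressed in regions.
struct YoungBounds {
  unsigned min_regions;
  unsigned max_regions;
};

// No sizing flags set: bounds are percentages of the heap's region count
// (G1NewSizePercent=5, G1MaxNewSizePercent=60). Assumed: at least one region.
YoungBounds default_young_bounds(unsigned heap_regions,
                                 unsigned new_size_percent = 5,
                                 unsigned max_new_size_percent = 60) {
  assert(heap_regions > 0);
  unsigned min_r = std::max(1u, heap_regions * new_size_percent / 100);
  unsigned max_r = std::max(1u, heap_regions * max_new_size_percent / 100);
  return YoungBounds{min_r, max_r};
}

// NewSize and MaxNewSize set: divide the byte sizes by the region size.
YoungBounds explicit_young_bounds(std::size_t new_size_bytes,
                                  std::size_t max_new_size_bytes,
                                  std::size_t region_bytes) {
  assert(region_bytes > 0 && new_size_bytes <= max_new_size_bytes);
  unsigned min_r = static_cast<unsigned>(
      std::max<std::size_t>(1, new_size_bytes / region_bytes));
  unsigned max_r = static_cast<unsigned>(
      std::max<std::size_t>(1, max_new_size_bytes / region_bytes));
  return YoungBounds{min_r, max_r};
}

// Only NewRatio set: a fixed young size of heap / (NewRatio + 1),
// so the minimum and maximum are the same.
YoungBounds ratio_young_bounds(unsigned heap_regions, unsigned new_ratio) {
  unsigned r = heap_regions / (new_ratio + 1);
  return YoungBounds{r, r};
}
```

For a heap of 2048 regions the defaults give 102 to 1228 young regions, while NewRatio=2 pins the young generation at 682 regions, which is the "no dynamic resizing" case described above.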

void G1YoungGenSizer::recalculate_min_max_young_length(uint number_of_heap_regions, uint* min_young_length, uint* max_young_length) {
assert(number_of_heap_regions > 0, "Heap must be initialized");

switch (_sizer_kind) {
case SizerDefaults:
*min_young_length = calculate_default_min_length(number_of_heap_regions);
*max_young_length = calculate_default_max_length(number_of_heap_regions);
break;
case SizerNewSizeOnly:
*max_young_length = calculate_default_max_length(number_of_heap_regions);
*max_young_length = MAX2(*min_young_length, *max_young_length);
break;
case SizerMaxNewSizeOnly:
*min_young_length = calculate_default_min_length(number_of_heap_regions);
*min_young_length = MIN2(*min_young_length, *max_young_length);
break;
case SizerMaxAndNewSize:
// Do nothing. Values set on the command line, don't update them at runtime.
break;
case SizerNewRatio:
*min_young_length = number_of_heap_regions / (NewRatio + 1);
*max_young_length = *min_young_length;
break;
default:
ShouldNotReachHere();
}
}

G1 sizes the young generation dynamically and grows the committed heap adaptively. The expansion amount is computed in g1CollectorPolicy.cpp#expansion_amount, shown below.

size_t G1CollectorPolicy::expansion_amount() {
double recent_gc_overhead = recent_avg_pause_time_ratio() * 100.0;
double threshold = _gc_overhead_perc;
if (recent_gc_overhead > threshold) {
const size_t min_expand_bytes = 1*M;
size_t reserved_bytes = _g1->max_capacity();
size_t committed_bytes = _g1->capacity();
size_t uncommitted_bytes = reserved_bytes - committed_bytes;
size_t expand_bytes;
size_t expand_bytes_via_pct =
uncommitted_bytes * G1ExpandByPercentOfAvailable / 100;
expand_bytes = MIN2(expand_bytes_via_pct, committed_bytes);
expand_bytes = MAX2(expand_bytes, min_expand_bytes);
expand_bytes = MIN2(expand_bytes, uncommitted_bytes);
return expand_bytes;
} else {
return 0;
}
}

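The threshold _gc_overhead_perc is derived from the GCTimeRatio parameter (GCTimeRatio=9):

[Figure: GCOverheadPerc]

With that value, no expansion is needed while GC time stays within 10% of the combined GC and application time; above 10% the heap is expanded. The expansion size depends on G1ExpandByPercentOfAvailable (=20). G1 takes that percentage of the uncommitted space, caps it at the currently committed size (so the heap at most doubles), raises it to the 1M minimum if it is smaller, and finally caps it at the uncommitted space, so when even the minimum cannot be met, all remaining space is committed. The trigger is in g1CollectedHeap.cpp#do_collection_pause_at_safepoint: the expansion is attempted during the evacuation pause and ends in a call to the expand method.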
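The clamping order is easier to follow with numbers. The sketch below mirrors the arithmetic of expansion_amount under stated assumptions: the function and parameter names are made up for illustration, and the threshold formula 100 / (1 + GCTimeRatio) is how the 10% figure above is assumed to be derived.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Assumed derivation of the threshold: GCTimeRatio=9 gives 10 percent.
double gc_overhead_threshold_percent(unsigned gc_time_ratio) {
  return 100.0 * (1.0 / (1.0 + gc_time_ratio));
}

// Illustrative re-statement of the clamping steps; not the HotSpot function.
std::uint64_t expansion_bytes(double recent_gc_overhead_percent,
                              double threshold_percent,
                              std::uint64_t reserved_bytes,
                              std::uint64_t committed_bytes,
                              unsigned expand_by_percent_of_available = 20) {
  assert(committed_bytes <= reserved_bytes);
  if (recent_gc_overhead_percent <= threshold_percent) {
    return 0;  // GC overhead is acceptable, no expansion
  }
  const std::uint64_t min_expand_bytes = 1024 * 1024;  // 1M
  std::uint64_t uncommitted = reserved_bytes - committed_bytes;
  std::uint64_t via_pct = uncommitted * expand_by_percent_of_available / 100;
  std::uint64_t result = std::min(via_pct, committed_bytes);  // at most double
  result = std::max(result, min_expand_bytes);                // at least 1M
  result = std::min(result, uncommitted);                     // never past the reserve
  return result;
}
```

With 1024M reserved and 256M committed, 20% of the 768M uncommitted space (about 153M) is chosen. With only 512K left uncommitted, the 1M minimum cannot be met and the result is the remaining 512K.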

bool G1CollectedHeap::expand(size_t expand_bytes) {
size_t aligned_expand_bytes = ReservedSpace::page_align_size_up(expand_bytes);
aligned_expand_bytes = align_size_up(aligned_expand_bytes,
HeapRegion::GrainBytes);
ergo_verbose2(ErgoHeapSizing,
"expand the heap",
ergo_format_byte("requested expansion amount")
ergo_format_byte("attempted expansion amount"),
expand_bytes, aligned_expand_bytes);

if (is_maximal_no_gc()) {
ergo_verbose0(ErgoHeapSizing,
"did not expand the heap",
ergo_format_reason("heap already fully expanded"));
return false;
}

uint regions_to_expand = (uint)(aligned_expand_bytes / HeapRegion::GrainBytes);
assert(regions_to_expand > 0, "Must expand by at least one region");

uint expanded_by = _hrm.expand_by(regions_to_expand);

if (expanded_by > 0) {
size_t actual_expand_bytes = expanded_by * HeapRegion::GrainBytes;
assert(actual_expand_bytes <= aligned_expand_bytes, "post-condition");
g1_policy()->record_new_heap_size(num_regions());
} else {
ergo_verbose0(ErgoHeapSizing,
"did not expand the heap",
ergo_format_reason("heap expansion operation failed"));
// The expansion of the virtual storage space was unsuccessful.
// Let's see if it was because we ran out of swap.
if (G1ExitOnExpansionFailure &&
_hrm.available() >= regions_to_expand) {
// We had head room...
vm_exit_out_of_memory(aligned_expand_bytes, OOM_MMAP_ERROR, "G1 heap expansion");
}
}
return regions_to_expand > 0;
}
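The first lines of expand round the request up twice, first to a page and then to a whole region, before converting it to a region count. A minimal sketch of that rounding follows; the helper names are hypothetical and the page and region sizes are passed in rather than read from the VM.

```cpp
#include <cassert>
#include <cstdint>

// Round size up to a multiple of alignment; alignment must be a power of two.
std::uint64_t align_up(std::uint64_t size, std::uint64_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (size + alignment - 1) & ~(alignment - 1);
}

// Number of whole regions needed to cover the requested expansion.
unsigned regions_for_expansion(std::uint64_t expand_bytes,
                               std::uint64_t page_bytes,
                               std::uint64_t region_bytes) {
  std::uint64_t aligned = align_up(expand_bytes, page_bytes);
  aligned = align_up(aligned, region_bytes);
  return static_cast<unsigned>(aligned / region_bytes);
}
```

Because of the rounding, a request of one byte still commits a full region, and a request one byte over three regions commits four.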

G1-YGC

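As is well known, a GC is triggered when the young generation has too little free space left for an allocation. A young GC collects only part of the heap, so it is relatively quick, and because the G1 heap is split into regions, the amount of memory a young collection covers is not fixed. A YGC first stops the world (STW). It then selects the CSet to collect, which for a young collection is every young region. The CSet is handed to the collection task, which processes references in parallel. Once the reference search is complete, G1 reclaims objects and processes references, handles object promotion, restores the object headers of objects whose promotion failed, attempts to expand the heap, and so on. The G1 YGC workflow is as follows.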

[Figure: do_collection_pause_at_safepoint workflow]
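The CSet choice for a young collection is simple enough to show as a toy model. The types below are hypothetical stand-ins, not HotSpot classes: every Eden and Survivor region goes into the CSet and everything else is left alone.

```cpp
#include <cassert>
#include <vector>

// Toy region model; the real HeapRegion carries far more state.
enum class RegionType { Free, Eden, Survivor, Old };

struct Region {
  int index;
  RegionType type;
};

// For a young-only pause, the CSet is simply all young regions.
std::vector<int> select_young_cset(const std::vector<Region>& heap) {
  std::vector<int> cset;
  for (const Region& r : heap) {
    assert(r.index >= 0);
    if (r.type == RegionType::Eden || r.type == RegionType::Survivor) {
      cset.push_back(r.index);
    }
  }
  return cset;
}
```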
Let's go straight into g1CollectedHeap.cpp#evacuate_collection_set. The figure below shows the workflow of the parallel CSet evacuation.

[Figure: EvacuateCollectionSet workflow]

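  1. Root scanning is done by the G1RootProcessor class, which scans direct strong references, mainly the JVM roots and the Java roots. Objects are copied with G1ParCopyHelper.

    • Java roots

      • Class loaders

        Walks all live Klass objects loaded by each class loader; the live objects found are copied to a Survivor region or promoted to the old generation.

      • Thread stacks

        Searches the Java thread stacks and the native method stacks. StackFrameStream's next moves to the sender frame, which yields the caller; from there the live heap objects the frame references are found and copied to a Survivor region or promoted to the old generation.

      Now that we know G1RootProcessor looks for live objects along these two main directions, let's go straight to the code in g1RootProcessor.cpp#evacuate_roots.

        void G1RootProcessor::process_java_roots(OopClosure* strong_roots,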
        CLDClosure* thread_stack_clds,
        CLDClosure* strong_clds,
        CLDClosure* weak_clds,
        CodeBlobClosure* strong_code,
        G1GCPhaseTimes* phase_times,
        uint worker_i) {
        assert(thread_stack_clds == NULL || weak_clds == NULL, "There is overlap between those, only one may be set");
        // Iterating over the CLDG and the Threads are done early to allow us to
        // first process the strong CLDs and nmethods and then, after a barrier,
        // let the thread process the weak CLDs and nmethods.
        {
        G1GCParPhaseTimesTracker x(phase_times, G1GCPhaseTimes::CLDGRoots, worker_i);
        if (!_process_strong_tasks->is_task_claimed(G1RP_PS_ClassLoaderDataGraph_oops_do)) {
        ClassLoaderDataGraph::roots_cld_do(strong_clds, weak_clds);
        }
        }
        
        {
        G1GCParPhaseTimesTracker x(phase_times, G1GCPhaseTimes::ThreadRoots, worker_i);
        Threads::possibly_parallel_oops_do(strong_roots, thread_stack_clds, strong_code);
        }
        }
        
        void ClassLoaderDataGraph::roots_cld_do(CLDClosure* strong, CLDClosure* weak) {
        for (ClassLoaderData* cld = _head; cld != NULL; cld = cld->_next) {
        CLDClosure* closure = cld->keep_alive() ? strong : weak;
        if (closure != NULL) {
        closure->do_cld(cld);
        }
        }
        }
        
        void ClassLoaderData::oops_do(OopClosure* f, KlassClosure* klass_closure, bool must_claim) {
        if (must_claim && !claim()) {
        return;
        }
        
        f->do_oop(&_class_loader);
        _dependencies.oops_do(f);
        _handles->oops_do(f);
        if (klass_closure != NULL) {
        classes_do(klass_closure);
        }
        }
        void ClassLoaderData::classes_do(KlassClosure* klass_closure) {
        for (Klass* k = _klasses; k != NULL; k = k->next_link()) {
        klass_closure->do_klass(k);
        assert(k != k->next_link(), "no loops!");
        }
        }
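The strong/weak selection in roots_cld_do can be illustrated with a toy class loader graph. This is a sketch with hypothetical types: std::function stands in for CLDClosure, and an empty function plays the role of a NULL closure.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Toy stand-in for ClassLoaderData.
struct ToyCLD {
  int id;
  bool keep_alive;
};

using ToyCLDClosure = std::function<void(const ToyCLD&)>;

// Loaders that must be kept alive go to `strong`, the rest to `weak`.
// An empty closure means that category is skipped in this pass.
void toy_roots_cld_do(const std::vector<ToyCLD>& graph,
                      const ToyCLDClosure& strong,
                      const ToyCLDClosure& weak) {
  for (const ToyCLD& cld : graph) {
    const ToyCLDClosure& closure = cld.keep_alive ? strong : weak;
    if (closure) {
      closure(cld);
    }
  }
}
```

Passing an empty weak closure visits only the loaders whose keep_alive flag is set.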

      This ends up calling do_klass in G1KlassScanClosure.

        class G1KlassScanClosure : public KlassClosure {
        G1ParCopyHelper* _closure;
        bool _process_only_dirty;
        int _count;
        public:
        G1KlassScanClosure(G1ParCopyHelper* closure, bool process_only_dirty)
        : _process_only_dirty(process_only_dirty), _closure(closure), _count(0) {}
        void do_klass(Klass* klass) {
        if (!_process_only_dirty || klass->has_modified_oops()) {
        klass->clear_modified_oops();
        _closure->set_scanned_klass(klass);
        klass->oops_do(_closure);
        _closure->set_scanned_klass(NULL);
        }
        _count++;
        }
        };

      The key call is klass->oops_do(_closure);. Here _closure is a G1ParCopyHelper object, so the call ends up in g1CollectedHeap.cpp@G1ParCopyClosure#do_oop_work: the closure's do_oop calls do_oop_work, which copies the live object to a new region.
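The effect of the _process_only_dirty flag in do_klass can be seen in a toy version. The types are hypothetical: each klass holds a few integer "oops", and visiting one is reduced to counting it.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for Klass: a dirty flag plus the references it holds.
struct ToyKlass {
  bool modified_oops;
  std::vector<int> oops;
};

// Visits the oops of each klass, optionally skipping klasses whose oops
// were not modified since the last scan. Returns the number of oops visited.
int toy_scan_klasses(std::vector<ToyKlass>& klasses, bool process_only_dirty) {
  int visited = 0;
  for (ToyKlass& k : klasses) {
    if (!process_only_dirty || k.modified_oops) {
      k.modified_oops = false;  // corresponds to clear_modified_oops()
      visited += static_cast<int>(k.oops.size());
    }
  }
  return visited;
}
```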

      Threads are handled in thread.cpp#possibly_parallel_oops_do. Threads::possibly_parallel_oops_do(strong_roots, thread_stack_clds, strong_code); in turn calls JavaThread::oops_do to walk the stack frames.

        void Thread::oops_do(OopClosure* f, CLDClosure* cld_f, CodeBlobClosure* cf) {
        active_handles()->oops_do(f);
        // Do oop for ThreadShadow
        f->do_oop((oop*)&_pending_exception);
        handle_area()->oops_do(f);
        }
        void JavaThread::oops_do(OopClosure* f, CLDClosure* cld_f, CodeBlobClosure* cf) {
        Thread::oops_do(f, cld_f, cf);
        assert( (!has_last_Java_frame() && java_call_counter() == 0) ||
        (has_last_Java_frame() && java_call_counter() > 0), "wrong java_sp info!");
        
        if (has_last_Java_frame()) {
        RememberProcessedThread rpt(this);
        if (_privileged_stack_top != NULL) {
        _privileged_stack_top->oops_do(f);
        }
        if (_array_for_gc != NULL) {
        for (int index = 0; index < _array_for_gc->length(); index++) {
        f->do_oop(_array_for_gc->adr_at(index));
        }
        }
        for (MonitorChunk* chunk = monitor_chunks(); chunk != NULL; chunk = chunk->next()) {
        chunk->oops_do(f);
        }
        for(StackFrameStream fst(this); !fst.is_done(); fst.next()) {
        fst.current()->oops_do(f, cld_f, cf, fst.register_map());
        }
        }
        set_callee_target(NULL);
        assert(vframe_array_head() == NULL, "deopt in progress at a safepoint!");
        GrowableArray<jvmtiDeferredLocalVariableSet*>* list = deferred_locals();
        if (list != NULL) {
        for (int i = 0; i < list->length(); i++) {
        list->at(i)->oops_do(f);
        }
        }
        f->do_oop((oop*) &_threadObj);
        f->do_oop((oop*) &_vm_result);
        f->do_oop((oop*) &_exception_oop);
        f->do_oop((oop*) &_pending_async_exception);
        
        if (jvmti_thread_state() != NULL) {
        jvmti_thread_state()->oops_do(f);
        }
        }

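      Live objects are looked up in the JNI native stacks, the JVM-internal stacks, and the Java stacks; the monitor chunks are walked; and the JVMTI (JVM Tool Interface) state, which is mainly used by Java agents, is walked as well. Finally the closure's do_oop calls do_oop_work to copy the live objects to new regions.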
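The StackFrameStream loop boils down to following sender links from the top frame until the bottom of the stack. A toy version follows, with a hypothetical frame type whose oops are plain integers:

```cpp
#include <cassert>
#include <vector>

// Toy stack frame: the references it holds and a link to its caller.
struct ToyFrame {
  std::vector<int> oops;
  const ToyFrame* sender;  // nullptr at the bottom of the stack
};

// Walks from the top frame to the bottom, collecting every oop as a root.
std::vector<int> toy_collect_stack_roots(const ToyFrame* top) {
  std::vector<int> roots;
  for (const ToyFrame* f = top; f != nullptr; f = f->sender) {
    roots.insert(roots.end(), f->oops.begin(), f->oops.end());
  }
  return roots;
}
```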

    • JVM roots

      These are global JVM objects such as Universe, JNIHandles, SystemDictionary, StringTable, and so on.

      void G1RootProcessor::process_vm_roots(OopClosure* strong_roots,
                                             OopClosure* weak_roots,
                                             G1GCPhaseTimes* phase_times,
                                             uint worker_i) {
      {
          G1GCParPhaseTimesTracker x(phase_times, G1GCPhaseTimes::UniverseRoots, worker_i);
          if (!_process_strong_tasks->is_task_claimed(G1RP_PS_Universe_oops_do)) {
            Universe::oops_do(strong_roots);
          }
        }
       ....
       void Universe::oops_do(OopClosure* f, bool do_all) {
      
        f->do_oop((oop*) &_int_mirror);
        f->do_oop((oop*) &_float_mirror);
        f->do_oop((oop*) &_double_mirror);
       ........
      }

      JVM roots go through the same G1ParCopyHelper do_oop path; the only difference is that the roots are the various global objects, for example Universe.
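The is_task_claimed check in process_vm_roots makes sure that only one GC worker scans a given global root set. A minimal sketch of that claim protocol with std::atomic follows; the class is hypothetical and is not how HotSpot's SubTasksDone is implemented.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical claim table: one flag per sub-task, shared by all workers.
class ToySubTasks {
  std::vector<std::atomic<bool>> claimed_;

 public:
  explicit ToySubTasks(std::size_t n) : claimed_(n) {
    for (auto& c : claimed_) {
      c.store(false);
    }
  }

  // Returns true if some worker already claimed the task. Otherwise the
  // caller atomically claims it and gets false, meaning "go do the work".
  bool is_task_claimed(std::size_t task) {
    assert(task < claimed_.size());
    return claimed_[task].exchange(true);
  }
};
```

The first worker to ask about a task gets false and does the scan; every later caller gets true and skips it.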

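    The workflow of g1CollectedHeap.cpp@G1ParCopyClosure#do_oop_work is as follows:
    [Figure: do_oop_work workflow]
    The object copy itself is performed in G1ParScanThreadState#copy_to_survivor_space, which works as follows:
    [Figure: CopyAndSurvivorSpace workflow]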
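Since the figures are not reproduced here, the following toy sketch shows the general shape of a copy-or-promote decision. It is a simplification under stated assumptions: all types are hypothetical, objects younger than the tenuring threshold go to Survivor with their age incremented, older ones are promoted, and the forwarding pointer is written without the atomic update or allocation-failure handling a real collector needs.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Toy object header: an age and a forwarding pointer.
struct ToyObject {
  unsigned age = 0;
  ToyObject* forwardee = nullptr;  // set once the object has been copied
};

// Toy destination spaces owning the copies.
struct ToySpaces {
  std::vector<std::unique_ptr<ToyObject>> survivor;
  std::vector<std::unique_ptr<ToyObject>> old_gen;
};

// Copies obj once; later calls return the existing copy via the forwardee.
ToyObject* toy_copy_to_survivor_space(ToyObject* obj, ToySpaces& to,
                                      unsigned tenuring_threshold) {
  assert(obj != nullptr);
  if (obj->forwardee != nullptr) {
    return obj->forwardee;  // already copied by an earlier reference
  }
  std::unique_ptr<ToyObject> copy(new ToyObject());
  ToyObject* raw = copy.get();
  if (obj->age < tenuring_threshold) {
    raw->age = obj->age + 1;  // survives one more young collection
    to.survivor.push_back(std::move(copy));
  } else {
    raw->age = obj->age;      // promoted to the old generation
    to.old_gen.push_back(std::move(copy));
  }
  obj->forwardee = raw;
  return raw;
}
```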

  2. Processing the RSet

  • The entry point for RSet processing is in the work method of G1ParTask.

      void G1RootProcessor::scan_remembered_sets(G1ParPushHeapRSClosure* scan_rs,
      OopClosure* scan_non_heap_weak_roots,
      uint worker_i) {
      ...
      _g1h->g1_rem_set()->oops_into_collection_set_do(scan_rs, &scavenge_cs_nmethods, worker_i);
      }

    This mainly runs the oops_into_collection_set_do method of G1RemSet, whose job is to update the RSet and then scan it.

      void G1RemSet::oops_into_collection_set_do(G1ParPushHeapRSClosure* oc,
      CodeBlobClosure* code_root_cl,
      uint worker_i) {
      DirtyCardQueue into_cset_dcq(&_g1->into_cset_dirty_card_queue_set());
      updateRS(&into_cset_dcq, worker_i);
      scanRS(oc, code_root_cl, worker_i);
      _cset_rs_update_cl[worker_i] = NULL;
      }

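    Note the DCQ here. The same kind of queue came up when we studied the RSet, where it was described as the queue mutators use to record reference updates made by application threads. This one is mainly used to record the references that must be preserved after a copy failure; its contents are passed on to the DirtyCardQueueSet that manages RSet updates.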
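The relationship between a DCQ and its DCQS can be sketched as a small buffer that hands itself over to a shared set when it fills up. The classes below are hypothetical and ignore locking and buffer reuse; a card is represented by its index.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using ToyCard = std::size_t;  // a card is identified by its index here

// Toy queue set: collects the buffers that queues have filled.
struct ToyDirtyCardQueueSet {
  std::vector<std::vector<ToyCard>> completed_buffers;
};

// Toy queue: buffers cards locally and publishes full buffers to the set.
class ToyDirtyCardQueue {
  ToyDirtyCardQueueSet& set_;
  std::vector<ToyCard> buffer_;
  std::size_t capacity_;

 public:
  ToyDirtyCardQueue(ToyDirtyCardQueueSet& set, std::size_t capacity)
      : set_(set), capacity_(capacity) {
    assert(capacity > 0);
  }

  void enqueue(ToyCard card) {
    buffer_.push_back(card);
    if (buffer_.size() == capacity_) {
      flush();
    }
  }

  // Publishes a partially filled buffer, for example at the end of a pause.
  void flush() {
    if (buffer_.empty()) {
      return;
    }
    set_.completed_buffers.push_back(buffer_);
    buffer_.clear();
  }
};
```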

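    • Updating the RSet

      This step mainly applies the cards recorded in the dirty card queues to the PRT of the RSet.

        void G1RemSet::updateRS(DirtyCardQueue* into_cset_dcq, uint worker_i) {
        G1GCParPhaseTimesTracker x(_g1p->phase_times(), G1GCPhaseTimes::UpdateRS, worker_i);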
        // Apply the given closure to all remaining log entries.
        RefineRecordRefsIntoCSCardTableEntryClosure into_cset_update_rs_cl(_g1, into_cset_dcq);
        
        _g1->iterate_dirty_card_closure(&into_cset_update_rs_cl, into_cset_dcq, false, worker_i);
        }
        void G1CollectedHeap::iterate_dirty_card_closure(CardTableEntryClosure* cl,
        DirtyCardQueue* into_cset_dcq,
        bool concurrent,
        uint worker_i) {
        // Clean cards in the hot card cache
        G1HotCardCache* hot_card_cache = _cg1r->hot_card_cache();
        hot_card_cache->drain(worker_i, g1_rem_set(), into_cset_dcq);
        
        DirtyCardQueueSet& dcqs = JavaThread::dirty_card_queue_set();
        size_t n_completed_buffers = 0;
        while (dcqs.apply_closure_to_completed_buffer(cl, worker_i, 0, true)) {
        n_completed_buffers++;
        }
        g1_policy()->phase_times()->record_thread_work_item(G1GCPhaseTimes::UpdateRS, worker_i, n_completed_buffers);
        dcqs.clear_n_completed_buffers();
        assert(!dcqs.completed_buffers_exist_dirty(), "Completed buffers exist!");
        }

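      First the RefineRecordRefsIntoCSCardTableEntryClosure closure is applied. It refines each card with the same logic the Refine threads use, and a card that contains references pointing into the collection set is enqueued in into_cset_dcq.

      The iterate_dirty_card_closure method then processes the DCQs remaining in the DCQS, in the same way the Java threads' queues are handled.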
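The drain loop in iterate_dirty_card_closure can be pictured as follows. This sketch rests on two assumptions that are not stated in the text: cards cover 512 bytes, so a card index is the heap offset shifted right by 9, and completed buffers are plain vectors of card indices. All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <vector>

// Assumption: 512-byte cards, so the card index is the heap offset >> 9.
const unsigned kToyCardShift = 9;

std::uint64_t toy_card_index(std::uint64_t heap_base, std::uint64_t addr) {
  assert(addr >= heap_base);
  return (addr - heap_base) >> kToyCardShift;
}

// Applies cl to every card of every completed buffer, removing each buffer
// once it is processed. Returns the number of buffers drained.
std::size_t toy_drain_completed_buffers(
    std::deque<std::vector<std::uint64_t>>& completed,
    const std::function<void(std::uint64_t)>& cl) {
  std::size_t n_completed_buffers = 0;
  while (!completed.empty()) {
    for (std::uint64_t card : completed.front()) {
      cl(card);
    }
    completed.pop_front();
    ++n_completed_buffers;
  }
  return n_completed_buffers;
}
```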

    • Scanning the RSet

      Uses the information in the RSet to find the referrers.

        void G1RemSet::scanRS(G1ParPushHeapRSClosure* oc,
        CodeBlobClosure* code_root_cl,
        uint worker_i) {
        double rs_time_start = os::elapsedTime();
        HeapRegion *startRegion = _g1->start_cset_region_for_worker(worker_i);
        
        ScanRSClosure scanRScl(oc, code_root_cl, worker_i);
        
        _g1->collection_set_iterate_from(startRegion, &scanRScl);
        scanRScl.set_try_claimed();
        _g1->collection_set_iterate_from(startRegion, &scanRScl);
        
        double scan_rs_time_sec = (os::elapsedTime() - rs_time_start)
        - scanRScl.strong_code_root_scan_time_sec();
        
        assert(_cards_scanned != NULL, "invariant");
        _cards_scanned[worker_i] = scanRScl.cards_done();
        
        _g1p->phase_times()->record_time_secs(G1GCPhaseTimes::ScanRS, worker_i, scan_rs_time_sec);
        _g1p->phase_times()->record_time_secs(G1GCPhaseTimes::CodeRoots, worker_i, scanRScl.strong_code_root_scan_time_sec());
        }

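      Regions are divided among the GC threads by worker id, and the flow is essentially two passes over the regions. It handles ordinary objects as well as code objects, the latter mainly being objects referenced from inlined, compiled code. The main flow is as follows:
      [Figure: scanRS workflow (WX20201106-170743)]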
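The two calls to collection_set_iterate_from, separated by set_try_claimed, suggest a claim-then-help pattern. The sketch below is one plausible reading, with hypothetical types: in the first pass a worker only works on regions it manages to claim, and in the second pass it also works on regions that were claimed by someone else but are not yet complete. Real scanning claims individual cards inside a region; here a region is simply marked complete.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

enum ToyIterState { Unclaimed = 0, Claimed = 1, Complete = 2 };

// Toy CSet region with an iteration state shared by all workers.
struct ToyCSetRegion {
  std::atomic<int> state;
  ToyCSetRegion() : state(Unclaimed) {}
};

// Visits the CSet cyclically from `start` (each worker starts elsewhere).
// Returns how many regions this call worked on.
std::size_t toy_scan_pass(std::vector<ToyCSetRegion>& cset, std::size_t start,
                          bool try_claimed) {
  std::size_t worked = 0;
  for (std::size_t i = 0; i < cset.size(); ++i) {
    ToyCSetRegion& r = cset[(start + i) % cset.size()];
    if (r.state.load() == Complete) {
      continue;  // nothing left to do for this region
    }
    int expected = Unclaimed;
    if (!try_claimed && !r.state.compare_exchange_strong(expected, Claimed)) {
      continue;  // first pass: skip regions another worker holds
    }
    // ... scan the cards in the region's remembered set here ...
    r.state.store(Complete);
    ++worked;
  }
  return worked;
}
```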

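  3. Object copy
  • The objects found by root scanning and the child objects found through the RSet are all copied to new regions. Every object is placed in the ParScanState queue. Copying consists of taking entries off that queue and handling each according to its object type, which ends in a call to the deal_with_reference method. In this way all live objects in the CSet are copied to new Survivor regions or to the old generation.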
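The queue-driven copy can be sketched as a worklist traversal. This is a toy model with hypothetical types: objects are identified by index, "copying" just records the order, and a plain FIFO deque stands in for the per-thread task queue (the real queues support work stealing).

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Toy heap object: whether it lives in the CSet, and what it references.
struct ToyHeapObject {
  bool in_cset;
  std::vector<int> fields;  // indices of referenced objects
};

// Starting from the given roots, "copies" every reachable CSet object
// exactly once and returns the object indices in copy order.
std::vector<int> toy_evacuate(const std::vector<ToyHeapObject>& heap,
                              const std::vector<int>& roots) {
  std::vector<bool> copied(heap.size(), false);
  std::deque<int> queue(roots.begin(), roots.end());
  std::vector<int> order;
  while (!queue.empty()) {
    int id = queue.front();
    queue.pop_front();
    assert(id >= 0 && static_cast<std::size_t>(id) < heap.size());
    if (!heap[id].in_cset || copied[id]) {
      continue;  // outside the CSet, or already forwarded
    }
    copied[id] = true;
    order.push_back(id);
    for (int field : heap[id].fields) {
      queue.push_back(field);  // the copy's fields still need processing
    }
  }
  return order;
}
```

An object referenced only from outside the CSet is not reached from the roots in this model; in the flow described above it would be pushed onto the queue by the RSet scan instead.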

     

Author: 小蓝鲸 (奶爸码农, "a coding dad")
