性能问答>concurrent mode failure>
1回复
1年前

concurrent mode failure



最近线上的一个系统经常出现concurrent mode failure报错,经常导致CMS时间过长,造成了进程的假死。
JVM参数如下:
-XX:CMSInitiatingOccupancyFraction=70
-Xmx12288m
-XX:+PrintGCDateStamps
-XX:PermSize=3096m
-XX:MaxNewSize=2048m
-Xloggc:/logs/gc.log
-XX:NewSize=2048m
-XX:+UseConcMarkSweepGC
-XX:+PrintGCDetails
-XX:SurvivorRatio=3
-XX:NewRatio=2
-XX:MaxPermSize=3096m
-XX:+UseCMSInitiatingOccupancyOnly
-Xms12288m
-XX:ReservedCodeCacheSize=1024m
-XX:HeapDumpPath=/logs/oom.dump
-XX:+HeapDumpOnOutOfMemoryError

GC日志如下:
xxx-03-30T21:40:45.781+0800: 98316.999: [GC 98317.000: [ParNew: 1342466K->53618K(1677760K), 0.1002170 secs] 8491129K->7202396K(12163520K), 0.1007910 secs] [Times: user=1.04 sys=0.00, real=0.10 secs]
xxx-03-30T21:41:08.109+0800: 98339.328: [GC 98339.328: [ParNew: 1311986K->182568K(1677760K), 0.5254370 secs] 8460764K->7331473K(12163520K), 0.5262060 secs] [Times: user=2.02 sys=0.00, real=0.53 secs]
xxx-03-30T21:42:40.661+0800: 98431.880: [GC 98431.880: [ParNew: 1440936K->182884K(1677760K), 0.5020680 secs] 8589841K->7332769K(12163520K), 0.5028780 secs] [Times: user=1.93 sys=0.00, real=0.51 secs]
xxx-03-30T21:44:19.961+0800: 98531.180: [GC 98531.180: [ParNew: 1441252K->127996K(1677760K), 0.2376770 secs] 8591137K->7278449K(12163520K), 0.2383690 secs] [Times: user=1.37 sys=0.00, real=0.24 secs]
xxx-03-30T21:44:57.459+0800: 98568.678: [GC 98568.679: [ParNew: 1386364K->211862K(1677760K), 0.4181130 secs] 8536817K->7363839K(12163520K), 0.4187200 secs] [Times: user=1.22 sys=0.01, real=0.42 secs]
xxx-03-30T21:45:57.414+0800: 98628.633: [GC [1 CMS-initial-mark: 8451325K(10485760K)] 9844764K(12163520K), 1.0250140 secs] [Times: user=0.54 sys=0.42, real=1.02 secs]
xxx-03-30T21:45:58.439+0800: 98629.658: [CMS-concurrent-mark-start]
xxx-03-30T21:46:05.342+0800: 98636.561: [GC 98636.561: [ParNew: 1396759K->64119K(1677760K), 0.0933360 secs]98636.654: [CMSxxx-03-30T21:46:11.080+0800: 98642.299: [CMS-concurrent-mark: 5.711/12.640 secs] [Times: user=22.30 sys=7.51, real=12.64 secs]
(concurrent mode failure): 9751140K->5026406K(10485760K), 40.1795990 secs] 11147433K->5026406K(12163520K), [CMS Perm : 2299119K->2254218K(3170304K)], 40.2736060 secs] [Times: user=55.96 sys=1.42, real=40.27 secs]
xxx-03-30T21:46:46.503+0800: 98677.722: [GC [1 CMS-initial-mark: 8270381K(10485760K)] 8307655K(12163520K), 0.1478880 secs] [Times: user=0.15 sys=0.00, real=0.15 secs]
xxx-03-30T21:46:46.670+0800: 98677.888: [CMS-concurrent-mark-start]
xxx-03-30T21:46:48.442+0800: 98679.661: [GC 98679.661: [ParNew: 42997K->23586K(1677760K), 0.0882050 secs]98679.749: [CMSxxx-03-30T21:46:53.063+0800: 98684.282: [CMS-concurrent-mark: 4.534/6.393 secs] [Times: user=19.61 sys=1.38, real=6.40 secs]
(concurrent mode failure): 9990765K->5007082K(10485760K), 34.5743860 secs] 10033763K->5007082K(12163520K), [CMS Perm : 2254244K->2253965K(3170304K)], 34.6632660 secs] [Times: user=49.17 sys=0.01, real=34.65 secs]
xxx-03-30T21:47:24.024+0800: 98715.243: [GC [1 CMS-initial-mark: 8051418K(10485760K)] 8916477K(12163520K), 0.1443350 secs] [Times: user=0.14 sys=0.00, real=0.15 secs]
xxx-03-30T21:47:24.172+0800: 98715.390: [CMS-concurrent-mark-start]
xxx-03-30T21:47:27.711+0800: 98718.930: [GC 98718.930: [ParNew: 1198296K->70933K(1677760K), 0.2202220 secs] 9249715K->8122352K(12163520K), 0.2209330 secs] [Times: user=1.19 sys=0.00, real=0.22 secs]
xxx-03-30T21:47:29.046+0800: 98720.265: [CMS-concurrent-mark: 4.468/4.875 secs] [Times: user=21.25 sys=0.14, real=4.87 secs]
xxx-03-30T21:47:29.047+0800: 98720.265: [CMS-concurrent-preclean-start]
xxx-03-30T21:47:29.566+0800: 98720.784: [CMS-concurrent-preclean: 0.488/0.519 secs] [Times: user=0.82 sys=0.02, real=0.52 secs]
xxx-03-30T21:47:29.566+0800: 98720.785: [CMS-concurrent-abortable-preclean-start]
CMS: abort preclean due to time xxx-03-30T21:47:34.764+0800: 98725.982: [CMS-concurrent-abortable-preclean: 4.566/5.198 secs] [Times: user=4.90 sys=0.03, real=5.20 secs]
xxx-03-30T21:47:34.773+0800: 98725.992: [GC[YG occupancy: 951818 K (1677760 K)]98725.992: [Rescan (parallel) , 0.1207830 secs]98726.113: [weak refs processing, 0.0000490 secs] [1 CMS-remark: 8051418K(10485760K)] 9003237K(12163520K), 0.1210330 secs] [Times: user=1.41 sys=0.00, real=0.12 secs]
xxx-03-30T21:47:34.895+0800: 98726.114: [CMS-concurrent-sweep-start]
xxx-03-30T21:47:40.291+0800: 98731.510: [CMS-concurrent-sweep: 5.373/5.396 secs] [Times: user=5.47 sys=0.01, real=5.40 secs]
xxx-03-30T21:47:40.291+0800: 98731.510: [CMS-concurrent-reset-start]
xxx-03-30T21:47:40.349+0800: 98731.567: [CMS-concurrent-reset: 0.057/0.057 secs] [Times: user=0.06 sys=0.00, real=0.06 secs]

xxx-04-02T01:20:08.354+0800: 284279.573: [GC 284279.574: [ParNew: 1337652K->75749K(1677760K), 0.0937210 secs] 7883164K->6626221K(12163520K), 0.0943830 secs] [Times: user=1.03 sys=0.01, real=0.09 secs]
xxx-04-02T01:40:09.649+0800: 285480.867: [GC 285480.868: [ParNew: 1334117K->78369K(1677760K), 0.0896120 secs] 7884589K->6631266K(12163520K), 0.0901780 secs] [Times: user=1.03 sys=0.00, real=0.09 secs]
xxx-04-02T02:11:09.639+0800: 287340.858: [GC 287340.859: [ParNew: 1234213K->60176K(1677760K), 0.1023670 secs] 7787110K->6615482K(12163520K), 0.1030150 secs] [Times: user=1.17 sys=0.00, real=0.10 secs]
xxx-04-02T02:11:16.016+0800: 287347.235: [GC [1 CMS-initial-mark: 7854654K(10485760K)] 8690881K(12163520K), 0.1933190 secs] [Times: user=0.19 sys=0.00, real=0.20 secs]
xxx-04-02T02:11:16.210+0800: 287347.429: [CMS-concurrent-mark-start]
xxx-04-02T02:11:21.599+0800: 287352.817: [CMS-concurrent-mark: 5.351/5.389 secs] [Times: user=22.24 sys=0.02, real=5.38 secs]
xxx-04-02T02:11:21.599+0800: 287352.818: [CMS-concurrent-preclean-start]
xxx-04-02T02:11:21.634+0800: 287352.853: [CMS-concurrent-preclean: 0.035/0.035 secs] [Times: user=0.04 sys=0.00, real=0.04 secs]
xxx-04-02T02:11:21.634+0800: 287352.853: [CMS-concurrent-abortable-preclean-start]
xxx-04-02T02:11:24.631+0800: 287355.850: [GC 287355.850: [ParNew: 839004K->59535K(1677760K), 0.0875080 secs]287355.938: [CMSxxx-04-02T02:11:24.759+0800: 287355.978: [CMS-concurrent-abortable-preclean: 2.213/3.125 secs] [Times: user=3.31 sys=0.01, real=3.12 secs]
(concurrent mode failure): 9156329K->5052147K(10485760K), 32.4902040 secs] 9993008K->5052147K(12163520K), [CMS Perm : 2292617K->2282743K(3170304K)], 32.5784660 secs] [Times: user=33.48 sys=0.02, real=32.57 secs]
xxx-04-02T02:11:58.590+0800: 287389.809: [GC [1 CMS-initial-mark: 8296122K(10485760K)] 8321839K(12163520K), 0.1610830 secs] [Times: user=0.16 sys=0.00, real=0.16 secs]
xxx-04-02T02:11:58.753+0800: 287389.972: [CMS-concurrent-mark-start]
xxx-04-02T02:11:59.388+0800: 287390.607: [GC 287390.607: [ParNew: 27246K->40743K(1677760K), 0.0920930 secs]287390.699: [CMSxxx-04-02T02:12:03.948+0800: 287395.167: [CMS-concurrent-mark: 4.468/5.195 secs] [Times: user=19.59 sys=0.00, real=5.20 secs]
(concurrent mode failure): 10016506K->5049192K(10485760K), 34.2002910 secs] 10043753K->5049192K(12163520K), [CMS Perm : 2282750K->2282745K(3170304K)], 34.2931080 secs] [Times: user=48.66 sys=0.00, real=34.29 secs]
xxx-04-02T02:12:34.578+0800: 287425.797: [GC [1 CMS-initial-mark: 8093528K(10485760K)] 8886865K(12163520K), 0.1660130 secs] [Times: user=0.17 sys=0.00, real=0.17 secs]
xxx-04-02T02:12:34.746+0800: 287425.965: [CMS-concurrent-mark-start]
xxx-04-02T02:12:39.402+0800: 287430.621: [CMS-concurrent-mark: 4.488/4.655 secs] [Times: user=18.40 sys=0.01, real=4.66 secs]
xxx-04-02T02:12:39.402+0800: 287430.621: [CMS-concurrent-preclean-start]
xxx-04-02T02:12:39.435+0800: 287430.653: [CMS-concurrent-preclean: 0.032/0.033 secs] [Times: user=0.04 sys=0.00, real=0.03 secs]
xxx-04-02T02:12:39.435+0800: 287430.654: [CMS-concurrent-abortable-preclean-start]
CMS: abort preclean due to time xxx-04-02T02:12:44.489+0800: 287435.708: [CMS-concurrent-abortable-preclean: 1.024/5.054 secs] [Times: user=1.37 sys=0.04, real=5.05 secs]
xxx-04-02T02:12:44.500+0800: 287435.718: [GC[YG occupancy: 880401 K (1677760 K)]287435.719: [Rescan (parallel) , 0.1152230 secs]287435.834: [weak refs processing, 0.0000450 secs] [1 CMS-remark: 8093528K(10485760K)] 8973930K(12163520K), 0.1155040 secs] [Times: user=1.29 sys=0.00, real=0.12 secs]
xxx-04-02T02:12:44.616+0800: 287435.834: [CMS-concurrent-sweep-start]
xxx-04-02T02:12:49.689+0800: 287440.908: [GC 287440.908: [ParNew: 894211K->10324K(1677760K), 0.0841060 secs] 8987725K->8103838K(12163520K), 0.0846980 secs]

网上说出现这个有2个原因:
1、在执行CMS GC的过程中同时业务线程将对象放入老年代,而此时老年代空间不足
2、在做Minor GC的时候,新生代Survivor空间放不下,需要放入老年代,而老年代也放不下而产生的。

疑问:
我已经配置了-XX:CMSInitiatingOccupancyFraction=70。从日志上看
xxx-04-02T02:11:09.639+0800: 287340.858: [GC 287340.859: [ParNew: 1234213K->60176K(1677760K), 0.1023670 secs] 7787110K->6615482K(12163520K), 0.1030150 secs] [Times: user=1.17 sys=0.00, real=0.10 secs]
xxx-04-02T02:11:16.016+0800: 287347.235: [GC [1 CMS-initial-mark: 7854654K(10485760K)] 8690881K(12163520K), 0.1933190 secs] [Times: user=0.19 sys=0.00, real=0.20 secs]
xxx-04-02T02:11:16.210+0800: 287347.429: [CMS-concurrent-mark-start]
在执行CMS gc前,年老代的还剩余接近5-6G空间,足以放下整个年轻代(2G)。说明这个-XX:CMSInitiatingOccupancyFraction=70是生效的。

难道是跟对象太大有关吗?

2432 阅读
请先登录,再评论

再次查看gc日志,发现每次CMSGC时,都超过了70%的触发CMSGC的阈值,那是不是说明我应该把比例调高,降低发送CMS的频率呢?

1年前