
Byte Alignment and Java's Pointer Compression (Part 1): The Origins of Byte Alignment

Posted 3 years ago

Disclaimer: this is all just my own humble opinion. If I've got something wrong, please don't hold back your criticism.

In the previous article, "A Brief Look at the JVM's oop-klass Model Starting from new Class()", we created a ClassX:

public class ClassX {
	boolean b;
	Object o1;
	int i;
	long l;
	Object o2;
	float f;
}

On a 64-bit JVM with pointer compression disabled, JOL reports:

# Running 64-bit HotSpot VM.
# Objects are 8 bytes aligned.
# Field sizes by type: 8, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
# Array element sizes: 8, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]

ClassX object internals:
OFFSET  SIZE               TYPE DESCRIPTION                               VALUE
0     4                    (object header)                           01 00 00 00 (00000001 00000000 00000000 00000000) (1)
4     4                    (object header)                           00 00 00 00 (00000000 00000000 00000000 00000000) (0)
8     4                    (object header)                           90 35 20 1c (10010000 00110101 00100000 00011100) (471872912)
12     4                    (object header)                           00 00 00 00 (00000000 00000000 00000000 00000000) (0)
16     8               long ClassX.l                                  0
24     4                int ClassX.i                                  0
28     4              float ClassX.f                                  0.0
32     1            boolean ClassX.b                                  false
33     7                    (alignment/padding gap)
40     8   java.lang.Object ClassX.o1                                 null
48     8   java.lang.Object ClassX.o2                                 null
Instance size: 56 bytes
Space losses: 7 bytes internal + 0 bytes external = 7 bytes total

ClassX@2ef1e4fad object externals:
ADDRESS       SIZE TYPE   PATH                           VALUE
129ed5378         56 ClassX                                (object)

Now we enable Java's pointer compression and look again with JOL:

# Running 64-bit HotSpot VM.
# Using compressed oop with 3-bit shift.
# Using compressed klass with 3-bit shift.
# Objects are 8 bytes aligned.
# Field sizes by type: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
# Array element sizes: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]

ClassX object internals:
OFFSET  SIZE               TYPE DESCRIPTION                               VALUE
0     4                    (object header)                           01 00 00 00 (00000001 00000000 00000000 00000000) (1)
4     4                    (object header)                           00 00 00 00 (00000000 00000000 00000000 00000000) (0)
8     4                    (object header)                           05 c1 00 f8 (00000101 11000001 00000000 11111000) (-134168315)
12     4                int ClassX.i                                  0
16     8               long ClassX.l                                  0
24     4              float ClassX.f                                  0.0
28     1            boolean ClassX.b                                  false
29     3                    (alignment/padding gap)
32     4   java.lang.Object ClassX.o1                                 null
36     4   java.lang.Object ClassX.o2                                 null
Instance size: 40 bytes
Space losses: 3 bytes internal + 0 bytes external = 3 bytes total

ClassX@2ef1e4fad object externals:
ADDRESS       SIZE TYPE   PATH                           VALUE
76b5e9a98         40 ClassX                                (object)
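
The padding gaps and instance sizes in the two dumps above can be reproduced with plain round-up arithmetic. The sketch below is only my own illustration: LayoutMath and alignUp are hypothetical names, not part of JOL or HotSpot, and the offsets 33 and 29 are simply read off the dumps (the byte right after boolean b).

```java
// Hypothetical helper, not part of JOL or HotSpot: it only replays the
// offset arithmetic visible in the two JOL dumps above.
final class LayoutMath {
    // Round value up to the next multiple of alignment (alignment must be a power of two).
    static long alignUp(long value, long alignment) {
        return (value + alignment - 1) & -alignment;
    }

    public static void main(String[] args) {
        // Without compressed oops: b ends at offset 33 and references take 8 bytes.
        long o1Plain = alignUp(33, 8);      // offset where o1 can start
        long endPlain = o1Plain + 8 + 8;    // end of o2
        System.out.println("plain:      o1 at " + o1Plain
                + ", gap " + (o1Plain - 33)
                + ", instance size " + alignUp(endPlain, 8));

        // With compressed oops: b ends at offset 29 and references take 4 bytes.
        long o1Comp = alignUp(29, 4);
        long endComp = o1Comp + 4 + 4;
        System.out.println("compressed: o1 at " + o1Comp
                + ", gap " + (o1Comp - 29)
                + ", instance size " + alignUp(endComp, 8));
    }
}
```

Under these assumptions the numbers match the dumps: a 7-byte gap and 56 bytes without compression, a 3-byte gap and 40 bytes with it.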

We notice that both outputs contain the same line: # Objects are 8 bytes aligned.

8 bytes aligned

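Let's start with where byte alignment comes from. Modern systems generally require data to be byte-aligned. The well-known "Computer Systems: A Programmer's Perspective" describes it this way in section 3.9.3, Data Alignment: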

Many computer systems place restrictions on the allowable addresses
for the primitive data types, requiring that the address for some type
of object must be a multiple of some value K (typically 2, 4, or 8).
Such alignment restrictions simplify the design of the hardware
forming the interface between the processor and the memory system. For
example, suppose a processor always fetches 8 bytes from memory with
an address that must be a multiple of 8. If we can guarantee that any
double will be aligned to have its address be a multiple of 8, then
the value can be read or written with a single memory operation.
Otherwise, we may need to perform two memory accesses, since the
object might be split across two 8-byte memory blocks.

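In other words, many computer systems restrict the addresses that primitive data types may legally occupy, requiring the address of certain types of objects to be a multiple of some value K (typically 2, 4, or 8). These alignment restrictions simplify the design of the hardware that forms the interface between the processor and the memory system.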
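
To make the quoted rule concrete, here is a minimal sketch under the same assumption as the book's example (a processor that always fetches aligned 8-byte blocks). AlignmentCheck, isAligned and blocksTouched are names I made up for illustration.

```java
// Hypothetical illustration of the CS:APP example; not taken from any library.
final class AlignmentCheck {
    // True if address is a multiple of k (k must be a power of two).
    static boolean isAligned(long address, int k) {
        return (address & (k - 1)) == 0;
    }

    // Number of blockSize-byte blocks touched by an access of size bytes at address.
    static long blocksTouched(long address, int size, int blockSize) {
        long first = address / blockSize;
        long last = (address + size - 1) / blockSize;
        return last - first + 1;
    }

    public static void main(String[] args) {
        // An 8-byte double at address 16 is aligned and sits in one 8-byte block.
        System.out.println(isAligned(16, 8) + " " + blocksTouched(16, 8, 8)); // true 1
        // The same double at address 20 straddles two blocks (16..23 and 24..31).
        System.out.println(isAligned(20, 8) + " " + blocksTouched(20, 8, 8)); // false 2
    }
}
```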

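The book does not go into how the hardware interface between the processor and the memory system is actually designed, though. When we call a CPU 32-bit or 64-bit, we usually mean the width of its GPRs (General-Purpose Registers); the 32 or 64 does not refer to the width of the address bus. A 32-bit CPU does not necessarily have a 32-bit address bus (some 32-bit CPUs with the PAE extension address 36 bits), and today's 64-bit CPUs do not really use all 64 bits for addressing, only forty-odd of them depending on the implementation. In the usual sense, the width of a CPU = the width of its general-purpose registers = the width of its data bus, so on a 64-bit CPU (x86_64, also known as AMD64, and others) the data bus is generally 64 bits wide too. An 8-byte word is therefore the natural unit of transfer, and the cache line is sized as a whole multiple of it (typically 64 bytes, not 64 bits).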

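I have seen someone on Zhihu claim that the most fundamental reason for memory alignment is that memory I/O is performed in units of 8 bytes (64 bits). I cannot fully agree. The memory we use today comes in the DIMM (Dual In-line Memory Module) form factor; for the evolution Chip -> DIP (Dual In-line Package) -> SIMM (Single In-line Memory Module) -> DIMM, see Laolang's (老狼) memory series. In the DDR3 era, memory chips generally come in three width configurations: x4, x8 and x16. Only with x8 chips does the picture of eight chips each contributing one byte to an 8-byte transfer hold, and the claim in that Zhihu answer that 8 memory chips make up one rank is likewise true only for x8; with x16 chips, 4 are clearly enough. Let's look at a simple diagram: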

[Figure: v2-8228206468a60133393c4c8feb0c6f53_1440w.jpg]

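From the diagram we can see the hierarchy from the CPU down to the memory chip: CPU -> channel -> DIMM -> rank -> chip -> bank -> row/column. The CPU's memory channels (two in a dual-channel setup, though there are triple-channel CPUs as well) connect to the DIMMs, and a single channel is normally 64 bits wide (72 bits with ECC). At this point we can make sense of the sentence "Such alignment restrictions simplify the design of the hardware forming the interface between the processor and the memory system": byte alignment mirrors the relationship along the chain CPU general-purpose register width -> data bus width -> cache line size -> channel width -> memory DIMM.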

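So why is byte alignment needed? Without it, accessing an unaligned object takes two memory accesses instead of one, which is what the passage above means by "we may need to perform two memory accesses, since the object might be split across two 8-byte memory blocks."
The figure below illustrates this (it shows the 4-byte-aligned case):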

[Figure: unalignedAccess.jpg]

The figure comes from the article "Data alignment: Straighten up and fly right", which describes how the CPU handles an unaligned access as follows:

The processor needs to read the first chunk of the unaligned address and shift out the “unwanted” bytes from the first chunk. Then it needs to read the second chunk of the unaligned address and shift out some of its information. Finally, the two are merged together for placement in the register. It’s a lot of work.

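That is: read the first memory chunk containing the unaligned object and discard the unneeded bytes, then read the second chunk and discard its unneeded bytes, and finally merge the two pieces into the register. Note that not every CPU supports this. Intel's x86_64 does support unaligned access, but it still carries a performance cost, so Intel recommends alignment anyway ("Intel recommends that data be aligned to improve memory system performance"). Other CPUs simply do not support unaligned access at all, for example MIPS, known for its pursuit of speed, and some ARM processors.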
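
The read, shift and merge steps described above can be modelled in software. The sketch below is my own toy model, not how any real CPU is implemented: it assumes little-endian memory made of aligned 4-byte chunks (matching the 4-byte case in the figure), and UnalignedRead and read32 are hypothetical names.

```java
// Toy software model of an unaligned 32-bit load; purely illustrative.
final class UnalignedRead {
    // Reads a little-endian 32-bit value at byte address addr from memory modelled
    // as aligned 4-byte chunks. accesses[0] is incremented once per chunk read.
    // An unaligned read starting in the last chunk is out of bounds in this model.
    static int read32(int[] chunks, int addr, int[] accesses) {
        int index = addr >>> 2;   // which 4-byte chunk the address falls in
        int offset = addr & 3;    // byte offset inside that chunk
        int first = chunks[index];
        accesses[0]++;
        if (offset == 0) {
            return first;         // aligned: a single access is enough
        }
        int second = chunks[index + 1];
        accesses[0]++;
        int shift = offset * 8;
        int low = first >>> shift;          // shift out the unwanted bytes of chunk one
        int high = second << (32 - shift);  // keep only the needed bytes of chunk two
        return low | high;                  // merge both parts for the register
    }

    public static void main(String[] args) {
        // Bytes at addresses 0..7 are 11 22 33 44 55 66 77 88 (hex).
        int[] memory = {0x44332211, 0x88776655};
        int[] accesses = {0};
        int value = read32(memory, 1, accesses);
        System.out.println(Integer.toHexString(value) + " in " + accesses[0] + " reads");
        // prints "55443322 in 2 reads"
    }
}
```

An aligned read such as read32(memory, 4, accesses) returns directly after one chunk read, which is the whole point of keeping data aligned.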

So on the x64 architecture, keeping an 8-byte alignment is quite reasonable, and Java chose 8-byte alignment for its objects overall. I noticed this passage in section 8.2 of "Java Performance":

It turns out that objects are already aligned on an 8-byte boundary in the JVM (in both
the 32- and 64-bit versions); this is the optimal alignment for most processors.

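That is, objects in the JVM are already aligned on 8-byte boundaries (in both the 32-bit and the 64-bit versions), and this alignment is optimal for most processors.
As I read it, Oracle picked what is optimal for most of today's processors (mostly 64-bit) and does nothing special for 32-bit (after all, Oracle has not shipped a 32-bit JDK since Java 9).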

Back to the line # Objects are 8 bytes aligned. itself. In the JOL source, the method public String details(), which prints this information, contains the following code:

out.println("# Objects are " + objectAlignment + " bytes aligned.");

Here objectAlignment comes from VM.current().objectAlignment(). On the HotSpot side, VM.java under runtime contains this code:

public int getObjectAlignmentInBytes() {
	if (objectAlignmentInBytes == 0) {
		Flag flag = getCommandLineFlag("ObjectAlignmentInBytes");
		objectAlignmentInBytes = (flag == null) ? 8 : (int)flag.getIntx();
	}
	return objectAlignmentInBytes;
}

Of course, the JVM normally gives ObjectAlignmentInBytes a default value during initialization; in globals.hpp:

lp64_product(intx, ObjectAlignmentInBytes, 8,"Default object alignment in bytes, 8 is minimum")

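PS: we can also start the JVM with -XX:ObjectAlignmentInBytes=16 to use 16-byte alignment instead of 8 (whether that makes sense depends on the situation; it wastes quite a lot of space).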
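
To check which alignment a running JVM actually uses without JOL, and to get a feel for what 16-byte alignment would cost ClassX, something like the sketch below should work. It assumes a HotSpot JVM that exposes com.sun.management.HotSpotDiagnosticMXBean. ObjectAlignmentProbe, objectAlignment and paddedSize are names I made up, the fallback to 8 copies the VM.java excerpt above, and the size estimate assumes the field offsets stay where the JOL dumps show them.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical probe; assumes a HotSpot JVM. Not part of JOL.
final class ObjectAlignmentProbe {
    // Returns the running VM's ObjectAlignmentInBytes, or 8 if the flag cannot be read
    // (the same fallback as the VM.java excerpt above).
    static int objectAlignment() {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return 8;
            }
            return Integer.parseInt(bean.getVMOption("ObjectAlignmentInBytes").getValue());
        } catch (IllegalArgumentException e) {
            // Unknown flag, unparsable value, or no such MXBean on this JVM.
            return 8;
        }
    }

    // Instance size after rounding the unpadded size up to the object alignment.
    static long paddedSize(long unpaddedSize, int alignment) {
        return (unpaddedSize + alignment - 1) / alignment * alignment;
    }

    public static void main(String[] args) {
        System.out.println("ObjectAlignmentInBytes = " + objectAlignment());
        // ClassX ends at offset 40 (compressed) or 56 (uncompressed) in the dumps above.
        System.out.println("16-byte aligned, compressed:   " + paddedSize(40, 16)); // 48
        System.out.println("16-byte aligned, uncompressed: " + paddedSize(56, 16)); // 64
    }
}
```

Under these assumptions every ClassX instance grows by 8 bytes with 16-byte alignment, which is the wasted space mentioned above.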

Author: 豆大侠
