Yesterday I was reading one of Intel's software optimization manuals, which mentions 16-byte alignment many times. Here is one excerpt:
...
Assembly/Compiler Coding Rule 45. (H impact, H generality) Align data on
natural operand size address boundaries. If the data will be accessed with vector
instruction loads and stores, align the data on 16-byte boundaries.
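
As a minimal sketch of what the rule above asks for, the C11 snippet below places a static buffer and a heap buffer on 16-byte boundaries and checks the result. The names `is_aligned`, `vec_buf`, and `alloc_vec16` are hypothetical, introduced only for illustration; the snippet assumes a C11 toolchain that provides `alignas` and `aligned_alloc` (some platforms do not ship `aligned_alloc`).

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: nonzero if p is a multiple of alignment.
   alignment is assumed to be a power of two. */
static int is_aligned(const void *p, size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* Static buffer placed on a 16-byte boundary, e.g. for 4 packed floats. */
static alignas(16) float vec_buf[4];

/* Heap buffer on a 16-byte boundary. C11 aligned_alloc expects the size
   to be a multiple of the alignment, so the byte count is rounded up. */
static float *alloc_vec16(size_t count) {
    size_t bytes = count * sizeof(float);
    bytes = (bytes + 15) & ~(size_t)15;
    return aligned_alloc(16, bytes);
}
```

A buffer returned by `alloc_vec16` is released with the ordinary `free`. Whether an aligned buffer actually speeds up a given loop depends on the instructions the compiler emits, so this only shows how to obtain the alignment, not a measured benefit.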
...
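The manual also stresses repeatedly that you should always keep in mind that a cache line is 64 bytes (on current mainstream Intel x86/x64 processors), and that accesses should avoid crossing a 64-byte boundary wherever possible to make better use of the cache.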
....
Misaligned data access can incur significant performance penalties. This is particularly
true for cache line splits. The size of a cache line is 64 bytes in the Pentium 4 and
other recent Intel processors, including processors based on Intel Core
microarchitecture.
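
To make "cache line split" concrete, here is a small sketch in C that tests whether an access of a given size at a given address touches two 64-byte lines. The function name `crosses_cache_line` and the constant `CACHE_LINE_SIZE` are hypothetical; the value 64 is taken from the excerpt above and would need to change on hardware with a different line size.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed line size, as stated in the excerpt for recent Intel processors. */
#define CACHE_LINE_SIZE 64u

/* Hypothetical helper: nonzero if the byte range [addr, addr + size)
   spans more than one cache line. size must be greater than zero. */
static int crosses_cache_line(uintptr_t addr, size_t size) {
    uintptr_t first_line = addr / CACHE_LINE_SIZE;
    uintptr_t last_line = (addr + size - 1) / CACHE_LINE_SIZE;
    return first_line != last_line;
}
```

For example, an 8-byte load at offset 60 covers bytes 60 through 67 and therefore splits across two lines, while the same load at offset 56 ends at byte 63 and stays within one.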
....
On Intel Core 2 Duo, Intel Core Duo, Intel Core Solo, Pentium 4, Intel Xeon and
Pentium M processors, memory coherence is maintained on 64-byte cache lines
(rather than 32-byte cache lines, as in earlier processors). This can increase the
opportunity for false sharing.
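
One common way to reduce false sharing, sketched here under the assumption of a 64-byte line and a C11 compiler, is to give each thread's data its own cache line. This is a general technique, not something the quoted passage itself prescribes. The names `padded_counter` and `counters` are hypothetical, and the array size of 4 is arbitrary.

```c
#include <stdalign.h>
#include <stddef.h>

/* Assumed line size, as stated in the excerpt. */
#define CACHE_LINE_SIZE 64

/* Aligning the member to 64 bytes raises the alignment of the whole struct
   to 64, so its size is padded up to 64 as well. Each array element then
   occupies exactly one cache line. */
struct padded_counter {
    alignas(CACHE_LINE_SIZE) unsigned long value;
};

/* One counter per thread (4 is an arbitrary example). Because neighbouring
   elements sit in different lines, a write to counters[0].value does not
   invalidate the line that holds counters[1].value. */
static struct padded_counter counters[4];
```

The cost is memory: each counter now takes 64 bytes instead of the size of an `unsigned long`, so this layout is only worthwhile for data that different threads really do write concurrently.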