Question
I know what unaligned access is in C, and that it can cause UB on some processors.
I wonder whether the same problem exists in code like this, written in NASM assembly:
section .text
global _start
_start:
mov [arr], word "abcd"
section .data
arr: db 1, 2, 3, 4, 5, 6, 7
Recommended answer
Generally no problem: x86 allows unaligned accesses of any size (with some restrictions on 16-byte accesses).
Some other ISAs don't (e.g. SPARC, MIPS before MIPS32r6, etc.), and C caters to those by not defining the behaviour when a T* pointer has less than alignof(T) alignment. In GNU C you can use __attribute__((aligned(1))) to typedef types that have well-defined behaviour at any alignment.
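As a minimal sketch of that typedef, assuming GCC or clang: the names unaligned_u32, load_u32_gnu and load_u32_memcpy are made up for illustration. The may_alias attribute is an addition not mentioned above, included so the type can also read from a plain byte buffer, and the memcpy version is shown only as a portable ISO C alternative.

```c
#include <stdint.h>
#include <string.h>

/* GNU C extension (GCC/clang): a uint32_t that may live at any address.
   may_alias is an extra assumption of this sketch so the type can also
   read from a plain byte buffer without violating strict aliasing. */
typedef uint32_t unaligned_u32 __attribute__((aligned(1), may_alias));

/* Load 4 bytes from any address through the under-aligned typedef. */
static uint32_t load_u32_gnu(const void *p)
{
    return *(const unaligned_u32 *)p;
}

/* Portable ISO C alternative: memcpy has no alignment requirement. */
static uint32_t load_u32_memcpy(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

Both functions can be called on an odd address such as buf + 1 for some unsigned char buf[8], and they should return the same value.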
The .data section is aligned by at least 4 bytes by default under Linux, so a 2-byte (word) store to [arr] is an aligned store; the address is guaranteed to be even (unless you use special linker options or a linker script to tell it to start .data on an odd address). Your arr starts at the start of your .data section.
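The "address is even" argument can be checked mechanically. The following is only a sketch in C rather than NASM: is_aligned is a hypothetical helper, and the array is a stand-in for the question's arr that is forced to 4-byte alignment with _Alignas to mimic the default alignment of the start of .data.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if p is a multiple of n bytes; n is assumed to be a power of two. */
static bool is_aligned(const void *p, size_t n)
{
    return ((uintptr_t)p & (n - 1)) == 0;
}

/* Stand-in for the question's arr, forced to 4-byte alignment the way
   the start of .data is by default. */
static _Alignas(4) unsigned char arr[7] = {1, 2, 3, 4, 5, 6, 7};
```

With this setup, is_aligned(arr, 2) holds, so a word store to arr is aligned, while a word store to arr + 1 would not be.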
Also, "abcd" is a 4-byte constant that has to be truncated to fit in a word. I guess you missed that when you tested your example to see that it happened to work on your own computer, before asking whether it was safe in general?
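To make the truncation concrete, here is a small C sketch of the arithmetic. The helper names are hypothetical. It relies on NASM packing character constants little-endian, so "abcd" has the value 0x64636261 and a word-sized operand keeps only the low 16 bits, 0x6261, which is the bytes 'a' and 'b' in memory.

```c
#include <stdint.h>

/* NASM packs a character constant little-endian: "abcd" is 0x64636261,
   so 'a' (0x61) is the lowest byte. */
static uint32_t nasm_char_const4(const char s[4])
{
    return (uint32_t)(unsigned char)s[0]
         | (uint32_t)(unsigned char)s[1] << 8
         | (uint32_t)(unsigned char)s[2] << 16
         | (uint32_t)(unsigned char)s[3] << 24;
}

/* What a word-sized operand keeps: only the low 16 bits. */
static uint16_t truncate_to_word(uint32_t v)
{
    return (uint16_t)(v & 0xFFFFu);
}
```

So the store in the question writes only "ab" to arr; the "cd" half of the constant is dropped.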
"it can cause UB on some processors"
No, it's always UB in ISO C. See "Why does unaligned access to mmap'ed memory sometimes segfault on AMD64?" for an example and links. Note that Undefined Behaviour doesn't mean the program does crash, just that the optimizer can assume it doesn't happen, so the results can be unpredictable.
In x86 asm the behaviour is always well-defined, as it is for most ISAs. Hardware vendors have to specify exactly what happens even in cases that raise exceptions, so that OSes can be written to keep control of the machine when user-space causes faults. (So in asm, what you're really looking for isn't defined behaviour, but guaranteed non-faulting.)
Any misalignment is fine for any access size other than 16 bytes. (This assumes the AC bit is cleared, which is the case on normal systems. glibc memcpy, for example, would fault on small unaligned copies if you set it. Unless you specifically set AC yourself as a way to detect unintentional unaligned accesses, you can assume it's cleared. Modern CPUs also have performance counters for split loads and split stores, which you can use instead to detect the problematic ones.)
For 16-byte accesses, legacy-SSE memory operands require natural alignment by default (e.g. SSE2 pxor xmm0, [rdi] requires alignment), except for explicitly unaligned load/store instructions like movdqu. Other sizes like 8-byte don't require alignment, e.g. punpckldq mm0, [rdi] is alignment-safe because MMX registers are only 8 bytes wide, even though punpck instructions annoyingly do full-width loads instead of just the half that they shuffle into the destination.
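From C, the movdqu side of this rule is normally reached through the SSE2 intrinsics _mm_loadu_si128 and _mm_storeu_si128, which compilers typically turn into unaligned 16-byte moves. The sketch below assumes an x86 target with SSE2 and falls back to memcpy elsewhere; copy16_unaligned is a hypothetical helper name.

```c
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Copy 16 bytes between addresses of any alignment. */
static void copy16_unaligned(void *dst, const void *src)
{
#if defined(__SSE2__)
    /* The loadu/storeu forms never require 16-byte alignment. */
    __m128i v = _mm_loadu_si128((const __m128i *)src);
    _mm_storeu_si128((__m128i *)dst, v);
#else
    /* Fallback for targets without SSE2 (an assumption of this sketch). */
    memcpy(dst, src, 16);
#endif
}
```

Swapping in the aligned forms _mm_load_si128 and _mm_store_si128 on an address that is not a multiple of 16 is the case that can fault.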
With AVX / AVX-512 encodings (VEX / EVEX), unaligned is the default (e.g. vaddps xmm0, xmm1, [rdi] doesn't require alignment), and only special alignment-required instructions like vmovntps stores or vmovdqa loads/stores will fault on misalignment.
The behaviour of alignment-required accesses is well-defined even for misaligned addresses: a #GP fault for SSE/AVX misalignment, or #AC if you set the AC bit and did something that required 2, 4, or 8 bytes of alignment but didn't meet that requirement. (https://xem.github.io/minix86/manual/intel-x86-and-64-manual-vol3/o_fe12b1e2a880e0ce-231.html excerpts the relevant page of Intel's SDM PDFs.)
Under GNU/Linux, a user-space process will receive a SIGSEGV (segmentation fault) if it generates a #GP exception. IIRC, #AC might get the kernel to deliver a SIGBUS (bus error).
(Except as mentioned with legacy-SSE memory operands.)
- "What's the actual effect of successful unaligned accesses on x86?"
- "How can I accurately benchmark unaligned access speed on x86_64" - my answer covers some of the effects that you'd want to try to measure if designing a benchmark.
- http://blog.stuffedcow.net/2014/01/x86-memory-disambiguation/ - Store-forwarding depends on alignment on some CPUs, especially older ones.