Problem description
I've read a lot of articles about undefined behavior (UB), but they all talk about theory. I am wondering what could happen in practice, because programs containing UB may actually run.
My questions relate to Unix-like systems, not embedded systems.
I know that one should not write code that relies on undefined behavior. Please do not send answers like this:
- Anything can happen
- Daemons can fly out of your nose
- The computer could jump and catch fire
Especially for the first one, it is not true. You obviously cannot get root by doing a signed integer overflow. I'm asking this for educational purposes only.
A) Is the "implementation" the compiler?
B)
*"abc" = '\0';
For something other than a segfault to happen, does my system need to be broken? What could actually happen, even if it is not predictable? Could the first byte be set to zero? What else, and how?
C)
int i = 0;
foo(i++, i++, i++);
This is UB because the order in which parameters are evaluated is undefined. Right. But, when the program runs, who decides in what order the parameters are evaluated: is it the compiler, the OS, or something else?
D)
$ cat test.c
#include <limits.h>
#include <stdio.h>

int main (void)
{
    printf ("%d\n", (INT_MAX+1) < 0);
    return 0;
}
$ cc test.c -o test
$ ./test
Formatting root partition, chomp chomp
According to other SO users, this is possible. How could this happen? Do I need a broken compiler?
E) Using the same code as above: what could actually happen, other than the expression (INT_MAX+1) yielding a random value?
F) Does the GCC -fwrapv option define the behavior of signed integer overflow, or does it only make GCC assume that it will wrap around, while it could in fact not wrap around at run time?
G) This one concerns embedded systems. Of course, if the PC (program counter) jumps to an unexpected place, two outputs could be wired together and create a short circuit (for example).
But, when executing code similar to this:
*"abc" = '\0';
Wouldn't the PC be vectored to the general exception handler? Or what am I missing?
Recommended answer
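In practice, most compilers use undefined behavior in one of the following ways:
- Print a warning at compile time, to inform the user that they probably made a mistake
- Infer properties of the values of variables, and use those to simplify code
- Perform unsafe optimizations, as long as they only break the expected semantics of undefined behavior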
Compilers are usually not designed to be malicious. The main reason to exploit undefined behavior is usually to get some performance benefit from it. But sometimes that can involve total dead code elimination.
A) Yes. The compiler should document which behavior it chose. But the consequences of UB are usually hard to predict or explain.
B) If the string is actually instantiated in memory and sits in a writable page (by default it will be in a read-only page), then its first character might become a null character. Most probably, the entire expression will be thrown out as dead code, because it only writes to a temporary value that is never used afterwards.
C) Usually, the order of evaluation is decided by the compiler. Here it might decide to transform it into i += 3 (or i = undef if it is being silly). The CPU may reorder instructions at run time, but it must preserve the order chosen by the compiler wherever reordering would break the semantics of its instruction set (the compiler usually cannot forward the C semantics any further down). An increment of a register cannot commute with, or be executed in parallel with, another increment of that same register.
D) You need a silly compiler that prints "Formatting root partition, chomp chomp" when it detects undefined behavior. Most probably, it will print a warning at compile time, replace the expression with a constant of its choice, and produce a binary that simply prints that constant.
E) It is a syntactically correct program, so the compiler will certainly produce a "working" binary. In theory, that binary could behave like any binary you might download from the internet and run. Most probably, you get a binary that exits straight away, or that prints the aforementioned message and exits straight away.
F) It tells GCC to assume that signed integers wrap around in the C semantics, using two's complement semantics. It must therefore produce a binary that wraps around at run time. That is rather easy, because most architectures have those semantics anyway. The reason C makes this UB is so that compilers can assume a + 1 > a, which is critical for proving that loops terminate and/or predicting branches. That's why using a signed integer as a loop induction variable can lead to faster code, even though it maps to exactly the same instructions in hardware.
G) Undefined behavior is undefined behavior. The produced binary could indeed run any instructions, including a jump to an unspecified place... or cleanly trigger an interrupt. Most probably, your compiler will get rid of that unnecessary operation.