Problem description
I recently came across this brilliant CppCon 2015 talk: Chandler Carruth, "Tuning C++: Benchmarks, and CPUs, and Compilers! Oh My!"
One of the techniques mentioned for preventing the compiler from optimizing code away is to use the functions below.
#include <vector>

static void escape(void *p) {
    asm volatile("" : : "g"(p) : "memory");
}
static void clobber() {
    asm volatile("" : : : "memory");
}
void benchmark()
{
    std::vector<int> v;
    v.reserve(1);
    escape(v.data());
    v.push_back(10);
    clobber();
}
I'm trying to understand this. My questions are as follows.
1) What is the advantage of escape() over clobber()?
2) From the example above, it looks like clobber() prevents the previous statement (push_back) from being optimized away. If that's the case, why is the snippet below not correct?
void benchmark()
{
    std::vector<int> v;
    v.reserve(1);
    v.push_back(10);
    clobber();
}
If this wasn't confusing enough, Folly (Facebook's C++ library) has an even stranger implementation.
Relevant snippet:
template <class T>
void doNotOptimizeAway(T&& datum) {
    asm volatile("" : "+r" (datum));
}
My understanding is that the above snippet informs the compiler that the assembly block will write to datum. But if the compiler finds that there is no consumer of this datum, it can still optimize out the entity producing datum, right?
I assume this is not common knowledge, so any help is appreciated!
Solution
tl;dr: doNotOptimizeAway creates an artificial "use".
A little bit of terminology here: a "def" ("definition") is a statement that assigns a value to a variable; a "use" is a statement that uses the value of a variable to perform some operation.
If, from the point immediately after a def, none of the paths to the program exit encounters a use of the variable, that def is called dead, and the Dead Code Elimination (DCE) pass will remove it. This in turn may cause other defs to become dead (if the removed def was itself a use of other variables, by virtue of having variable operands), and so on.
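To make the terminology concrete, here is a minimal sketch. The function names are made up for illustration, and what an optimizer actually emits depends on the compiler and its flags; the comments only describe which defs are dead in the sense defined above.

```cpp
// Hypothetical illustration of "def", "use" and "dead def".

int overwritten_def(int a, int b) {
    int t = a * b;  // def #1 of t: overwritten before any use, so it is dead
    t = a + b;      // def #2 of t
    return t;       // use of t: this keeps def #2 alive
}

int chained_dead_defs(int a) {
    int x = a * 2;                   // def of x; its only use is in the def of y
    [[maybe_unused]] int y = x + 1;  // def of y (and a use of x); y is never used, so it is dead
    return a;                        // once the def of y is removed, the def of x is dead too
}
```

Both functions still return the same values whether or not the dead defs are removed, which is exactly why DCE is allowed to remove them.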
Imagine the program after the Scalar Replacement of Aggregates (SRA) pass, which turns the local std::vector into two variables, len and ptr. At some point the program assigns a value to ptr; that statement is a def.
Now, the original program didn't do anything with the vector; in other words, there weren't any uses of either len or ptr. Hence, all of their defs are dead and DCE can remove them, effectively removing all the code and making the benchmark worthless.
Adding doNotOptimizeAway(ptr) creates an artificial use, which prevents DCE from removing the defs. (As a side note, I see no point in the "+"; "g" should have been enough.)
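The idea of an artificial use can be sketched as follows. This is not Folly's implementation: `keep`, `work_discarded` and `work_kept` are hypothetical names, the code assumes GCC or Clang (GNU extended asm), and it uses an input-only "g" operand in the spirit of the side note above. It is only meant for scalar values such as integers and pointers.

```cpp
// Sketch for GCC/Clang only (GNU extended asm). `keep` is a hypothetical
// helper, not Folly's doNotOptimizeAway.
template <class T>
inline void keep(T const& value) {
    // An empty asm statement that claims to read `value`.
    // The compiler must treat this as a use of the value.
    asm volatile("" : : "g"(value));
}

void work_discarded(int n) {
    [[maybe_unused]] int total = 0;
    for (int i = 1; i <= n; ++i) {
        total += i;   // every def of total is dead: nothing ever uses it
    }
}

void work_kept(int n) {
    int total = 0;
    for (int i = 1; i <= n; ++i) {
        total += i;
    }
    keep(total);      // artificial use: the final value must be produced
}
```

Note that the artificial use only forces the final value to exist; the compiler may still compute it in a cheaper way than the loop as written.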
A similar line of reasoning can be followed with memory loads and stores: a store (a def) is dead iff there is no path to the end of the program that contains a load (a use) from that store location. As tracking arbitrary memory locations is a lot harder than tracking individual pseudo-register variables, the compiler reasons conservatively: a store is dead only if there is no path to the end of the program that could possibly encounter a use of that store.
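The same reasoning applied to memory can be sketched like this. The function names are hypothetical, and the comments describe what the compiler is allowed to conclude, not what any particular compiler is guaranteed to emit.

```cpp
// Hypothetical illustration of dead and live stores.

void store_then_overwrite(int* p) {
    *p = 1;  // store #1: overwritten on every path before any load, so it is dead
    *p = 2;  // store #2: the caller may load *p after we return, so the compiler
             // must conservatively assume a use and keep this store
}

int store_then_load(int* p) {
    *p = 3;     // store (def)
    return *p;  // load (use) from the same location
}
```

Because the pointer comes from the caller, the compiler cannot prove that nobody reads the location later, so the final store in each function has to stay.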
One such case is a store to a region of memory that is guaranteed not to be aliased: after that memory is deallocated, there cannot possibly be a use of that store that does not trigger undefined behaviour. IOW, there are no such uses.
Thus a compiler could eliminate v.push_back(42). But then comes escape: it causes v.data() to be considered arbitrarily aliased, as @Leon described in another answer.
The purpose of clobber() in the example is to create an artificial use of all aliased memory. We have a store (from push_back(42)), and the store is to a location that is globally aliased (due to escape(v.data())). Hence clobber() could potentially contain a use of that store (IOW, the store's side effect is observable), so the compiler is not allowed to remove the store.
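Putting escape() and clobber() together, a timing loop might look like the sketch below. The helper `time_push_back` and the use of std::chrono are assumptions added for illustration; they are not from the talk. The code assumes GCC or Clang.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// GCC/Clang only: GNU extended asm.
static void escape(void* p) { asm volatile("" : : "g"(p) : "memory"); }
static void clobber() { asm volatile("" : : : "memory"); }

// Hypothetical helper: runs the push_back benchmark `iterations` times
// and returns the elapsed time in nanoseconds.
long long time_push_back(std::size_t iterations) {
    auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        std::vector<int> v;
        v.reserve(1);
        escape(v.data());  // the buffer is now considered reachable from elsewhere
        v.push_back(10);   // store into the escaped buffer
        clobber();         // potential use of all escaped memory, so the store stays
    }
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

A call such as time_push_back(1000000) then measures the allocation plus the push_back; the absolute numbers depend on the machine, the allocator and the optimization level.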
A few simpler examples:
Example I:
void f() {
    int v[1];
    v[0] = 42;
}
This does not generate any code.
Example II:
extern void g();
void f() {
    int v[1];
    v[0] = 42;
    g();
}
This generates just a call to g(), with no memory store. The function g cannot possibly access v because v is not aliased.
Example III:
void clobber() {
    __asm__ __volatile__ ("" : : : "memory");
}
void f() {
    int v[1];
    v[0] = 42;
    clobber();
}
As in the previous example, no store is generated, because v is not aliased and the call to clobber is inlined to nothing.
Example IV:
template<typename T>
void use(T &&t) {
    __asm__ __volatile__ ("" :: "g" (t));
}
void f() {
    int v[1];
    use(v);
    v[0] = 42;
}
This time v escapes (i.e. it can potentially be accessed from other activation frames). However, the store is still removed, since there are no potential uses of that memory after it (without UB).
Example V:
template<typename T>
void use(T &&t) {
    __asm__ __volatile__ ("" :: "g" (t));
}
extern void g();
void f() {
    int v[1];
    use(v);
    v[0] = 42;
    g(); // same with clobber()
}
And finally we get the store, because v escapes and the compiler must conservatively assume that the call to g may access the stored value.
(For experiments: https://godbolt.org/g/rFviMI)