I am working on a soft-realtime event processing system. I would like to minimise the number of calls in my code that have non-deterministic timing. I need to construct a message that consists of strings, numbers, timestamps and GUIDs. Probably a std::vector of boost::variants.
I have always wanted to use alloca in past code of a similar nature. However, when one looks into systems programming literature there are always massive cautions against this function call. Personally I can't think of a server-class machine in the last 15 years that doesn't have virtual memory, and I know for a fact that the Windows stack grows a virtual-memory page at a time, so I assume Unices do as well. There is no brick wall here (anymore); the stack is just as likely to run out of space as the heap, so what gives? Why aren't people going gaga over alloca? I can think of many use cases for responsible use of alloca (string processing, anyone?).
Anyhow, I decided to test the performance difference (see below) and there is a 5-fold speed difference between alloca and malloc (the test captures how I would use alloca). So, have things changed? Should we just throw caution to the wind and use alloca (wrapped in a std::allocator) whenever we can be absolutely certain of the lifetime of our objects?
I am tired of living in fear !
Edit:
OK, so there are limits; for Windows it is a link-time limit. For Unix it seems to be tunable. It seems a page-aligned memory allocator is in order :D Anyone know of a general-purpose portable implementation :D ?
Code:
#include <stdlib.h>
#include <alloca.h> /* alloca lives here on most Unices; MSVC declares it in <malloc.h> */
#include <time.h>
#include <boost/date_time/posix_time/posix_time.hpp>
#include <iostream>
using namespace boost::posix_time;
int random_string_size()
{
return ( (rand() % 1023) +1 );
}
int random_vector_size()
{
return ( (rand() % 31) +1);
}
/* Build a random-sized array of pointers plus the buffers they point
   to entirely on the stack; everything is reclaimed on return. */
void alloca_test()
{
    int vec_sz = random_vector_size();
    void ** vec = (void **) alloca(vec_sz * sizeof(void *));
    for(int i = 0 ; i < vec_sz ; i++)
    {
        vec[i] = alloca(random_string_size());
    }
}
/* The same allocation pattern on the heap; unlike alloca, every
   allocation must be freed explicitly. */
void malloc_test()
{
    int vec_sz = random_vector_size();
    void ** vec = (void **) malloc(vec_sz * sizeof(void *));
    for(int i = 0 ; i < vec_sz ; i++)
    {
        vec[i] = malloc(random_string_size());
    }
    for(int i = 0 ; i < vec_sz ; i++)
    {
        free(vec[i]);
    }
    free(vec);
}
int main()
{
srand( time(NULL) );
ptime now;
ptime after;
int test_repeat = 100;
int times = 100000;
time_duration alloc_total;
for(int ii=0; ii < test_repeat; ++ii)
{
now = microsec_clock::local_time();
for(int i =0 ; i < times ; ++i)
{
alloca_test();
}
after = microsec_clock::local_time();
alloc_total += after -now;
}
std::cout << "alloca_time: " << alloc_total/test_repeat << std::endl;
time_duration malloc_total;
for(int ii=0; ii < test_repeat; ++ii)
{
now = microsec_clock::local_time();
for(int i =0 ; i < times ; ++i)
{
malloc_test();
}
after = microsec_clock::local_time();
malloc_total += after-now;
}
std::cout << "malloc_time: " << malloc_total/test_repeat << std::endl;
}
Output:
hassan@hassan-desktop:~/test$ ./a.out
alloca_time: 00:00:00.056302
malloc_time: 00:00:00.260059
hassan@hassan-desktop:~/test$ ./a.out
alloca_time: 00:00:00.056229
malloc_time: 00:00:00.256374
hassan@hassan-desktop:~/test$ ./a.out
alloca_time: 00:00:00.056119
malloc_time: 00:00:00.265731
--Edit: Results on home machine, Clang, and Google perftools--
G++ without any optimization flags
alloca_time: 00:00:00.025785
malloc_time: 00:00:00.106345
G++ -O3
alloca_time: 00:00:00.021838
malloc_time: 00:00:00.111039
Clang no flags
alloca_time: 00:00:00.025503
malloc_time: 00:00:00.104551
Clang -O3 (alloca becomes magically faster)
alloca_time: 00:00:00.013028
malloc_time: 00:00:00.101729
g++ -O3 perftools
alloca_time: 00:00:00.021137
malloc_time: 00:00:00.043913
clang++ -O3 perftools (The sweet spot)
alloca_time: 00:00:00.013969
malloc_time: 00:00:00.044468
Well, first of all, even though there is a lot of virtual memory, that doesn't mean your process will be allowed to fill it. On *nix there are stack size limits, whereas the heap is a lot more forgiving.
If you're only going to be allocating a few hundred / thousand bytes, sure go ahead. Anything beyond that is going to depend on what limits (ulimit) are in place on any given system, and that's just a recipe for disaster.
Why is alloca not considered good practice?
On my development box at work (Gentoo) I have a default stack size limit of 8192 KB. That's not very big, and if alloca overflows the stack then the behavior is undefined.