问题描述
我开始比较:
- 在列表前面插入
- 在向量后面插入
- 在双端队列的前面插入
但后来我注意到即使在 push_back()
上,双端队列似乎也更快.我一定做错了某些事情,我不敢相信更通用的容器会胜过特定的容器.
But then I noticed that even on push_back()
the deque seemed to be faster. I must be doing something wrong, I can't believe a more general container would outperform a particular one.
我使用谷歌基准测试的代码:
My code using google benchmark:
#include "benchmark/benchmark.h"
#include <deque>
#include <vector>
#define NUM_INS 1000
static void BM_InsertVector(benchmark::State& state) {
std::vector<int> v;
v.reserve(NUM_INS);
while (state.KeepRunning()) {
state.PauseTiming();
v.clear();
state.ResumeTiming();
for (size_t i = 0; i < NUM_INS; i++)
v.push_back(i);
}
}
BENCHMARK(BM_InsertVector);
static void BM_InsertDeque(benchmark::State& state) {
std::deque<int> v;
while (state.KeepRunning()) {
state.PauseTiming();
v.clear();
state.ResumeTiming();
for (size_t i = 0; i < NUM_INS; i++)
v.push_back(i);
}
}
BENCHMARK(BM_InsertDeque);
BENCHMARK_MAIN();
结果:
Run on (1 X 2592 MHz CPU )
2016-02-18 14:03:47
Benchmark Time(ns) CPU(ns) Iterations
------------------------------------------------
BM_InsertVector 2820 2470 312500
BM_InsertDeque 1872 1563 406977
我注意到在处理元素数量时一些差异,但 deque 总是优于 vector.
I notice some differences when playing with the number of elements, but deque always outperforms vector.
编译器:gcc version 5.2.1
编译:g++ -O3 -std=c++11 push_front.cpp -lbenchmark -lpthread
我认为 -O3
实际上是有用的;当我关闭它时,我会得到稍微更糟的双端队列性能.
I think the -O3
is actually instrumental; when I turn it off I get a slightly worse deque performance.
推荐答案
持续向动态容器添加元素基本上涉及 3 个成本来源:
There are basically 3 sources of cost involved in continuously appending elements to a dynamic container:
- 内存管理.
- 容器的内部簿记.
- 需要对元素本身执行的任何操作.尤其;任何在插入时使引用无效的容器都可能会四处移动/复制元素.
让我们从 1 开始.vector
不断要求双倍内存,deque
分配固定大小的块(deque
通常实现为数组数组,较低层数组的大小固定).请求更多内存可能比请求更少需要更长的时间,但通常除非您的堆非常碎片化,否则一次请求一个大块是获得一些内存的最快方法.分配 1 兆字节一次,然后请求千字节 1000 次可能会更快.所以很明显 vector
最终会在这里占据优势(直到容器太大而受到碎片的影响).然而,这不是最终的:您只要求 1000 个元素.我写了以下代码 http://coliru.stacked-crooked.com/a/418b18ff8a81c1c0一>.这不是很有趣,但它基本上使用了一个简单的分配器,它增加了一个全局变量来查看执行了多少分配.
Let's start with 1. vector
keeps asking for double the memory, and deque
allocates fixed sized chunks (deque
is typically implemented as an array of arrays, with the lower tier arrays being of fixed size). Asking for more memory may take longer than asking for less, but typically unless your heap is very fragmented asking for a big chunk all at once is the fastest way to get some memory. It's probably faster to allocate one meg once, then ask for a kilobyte 1000 times. So it seems clear that vector
will eventually have the advantage here (until the container is so large it's affected by fragmentation). However, this isn't eventually: you asked for only 1000 elements. I wrote the following code http://coliru.stacked-crooked.com/a/418b18ff8a81c1c0. It's not very interesting but it basically uses a trivial allocator that increments a global to see how many allocations are performed.
在您的基准测试过程中,vector
请求内存 11 次,而 deque
仅 10 次.deque
一直请求相同的数量, vector
要求加倍.同样,vector
必须调用 free
10 次.而 deque
0.对于 deque
来说,这似乎是一个小小的胜利.
In the course of your benchmark, vector
asks for memory 11 times, and deque
only 10. deque
keeps asking for the same amount, vector
asks for doubling amounts. As well, vector
must be calling free
10 times. And deque
0. This seems like a small win for deque
.
对于内部簿记,vector
的实现比 deque
更简单.毕竟,vector
只是一个动态数组,而 deque
是一个数组数组,严格来说更复杂.所以这显然是 vector
的胜利.
For internal bookkeeping, vector
has a simpler implementation than deque
. After all, vector
is just a dynamic array, and deque
is an array of arrays and is strictly more complex. So this is clearly a win for vector
.
最后,关于操作本身的元素.在 deque
中,没有任何东西被移动.使用 vector
,每个新的堆分配也涉及移动所有元素.它可能已优化为将 memcpy
用于琐碎类型,但即使看到,也需要 10 次调用 memcpy
来复制 1, 2, 4, 8 ... 512 个整数.这显然是 deque
的胜利.
Finally, elements on the operations themselves. In deque
, nothing is ever moved. With vector
, every new heap allocation also involves moving all the elements. It's probably optimized to use memcpy
for trivial types, but even see, that's 10 calls to memcpy
to copy 1, 2, 4, 8 ... 512 integers. This is clearly a win for deque
.
我可以推测,启动到 O3
允许非常积极地内联 deque
中许多更复杂的代码路径,从而减少 2 的权重.但显然,除非你做了一个更详细(非常小心!)的基准测试,你永远不会确定.
I can speculate that cranking up to O3
allowed very aggressive inlining of a lot of the more complex codepaths in deque
, reducing the weight of 2. But obviously, unless you do a much more detailed (very careful!) benchmark, you'll never know for sure.
这篇文章主要是为了表明它比简单的专用容器与更通用的容器更复杂.不过我会做一个预测(把我的脖子伸出来,就像被切断一样):如果你增加元素的数量,甚至说 2 或 4 倍,你将看不到 deque
再赢.这是因为 deque
会分配 2 倍或 4 倍的堆分配,而 vector 只会多分配 1-2 倍.
Mostly, this post is to show that it's more complex than simply a specialized container vs a more general one. I will make a prediction though (put my neck out to be cut off, as it were): if you increase the number of elements by even say a factor of 2 or 4, you will not see deque
win anymore. That's because deque
will make 2x or 4x as many heap allocations, but vector will only make 1-2 more.
我也可以在这里注意到 deque
实际上是一种奇怪的数据结构;它理论上是一个数组数组,但在许多实现中,数组要么是特定大小,要么只是一个元素,以较大者为准.此外,它的一些大 O 保证是无稽之谈.push_back
只是固定的常数时间,因为在 C++ 中,只有对元素本身的操作才算入大 O.否则应该清楚,因为它是一个数组数组,顶级数组将是大小与已存储元素的数量成正比.最终,顶层数组空间不足,您必须重新分配它,移动 O(N) 个指针.所以它不是真正的 O(1) push_back
.
I may as well note here that deque
is actually kind of an odd data structure; it's theoretically an array of arrays but in many implementations the array is either a certain size, or just one element, whichever is larger. Also, some of it's big O guarantees are nonsense. push_back
is only fixed constant time, because in C++, only operations on the elements themselves count towards the big O. Otherwise it should be clear, that since it's an array of arrays, the top level array will be proportional in size to the number of elements already stored. And eventually that top level array runs out of room, and you have to reallocate it, moving O(N) pointers. So it's not really O(1) push_back
.
这篇关于std::deque 在最后插入是否比 std::vector 快?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!