本文介绍了在我的测试程序中,为什么从char缓冲区读取一个int的`std :: copy` 5x(!)比`memcpy`要慢?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!
问题描述
这是对我在哪里发布了该程序:
This is a follow-up to this question where I posted this program:
#include <algorithm>
#include <cstdlib>
#include <cstdio>
#include <cstring>
#include <ctime>
#include <iomanip>
#include <iostream>
#include <vector>
#include <chrono>
class Stopwatch
{
public:
typedef std::chrono::high_resolution_clock Clock;
//! Constructor starts the stopwatch
Stopwatch() : mStart(Clock::now())
{
}
//! Returns elapsed number of seconds in decimal form.
double elapsed()
{
return 1.0 * (Clock::now() - mStart).count() / Clock::period::den;
}
Clock::time_point mStart;
};
struct test_cast
{
int operator()(const char * data) const
{
return *((int*)data);
}
};
struct test_memcpy
{
int operator()(const char * data) const
{
int result;
memcpy(&result, data, sizeof(result));
return result;
}
};
struct test_memmove
{
int operator()(const char * data) const
{
int result;
memmove(&result, data, sizeof(result));
return result;
}
};
struct test_std_copy
{
int operator()(const char * data) const
{
int result;
std::copy(data, data + sizeof(int), reinterpret_cast<char *>(&result));
return result;
}
};
enum
{
iterations = 2000,
container_size = 2000
};
//! Returns a list of integers in binary form.
std::vector<char> get_binary_data()
{
std::vector<char> bytes(sizeof(int) * container_size);
for (std::vector<int>::size_type i = 0; i != bytes.size(); i += sizeof(int))
{
memcpy(&bytes[i], &i, sizeof(i));
}
return bytes;
}
template<typename Function>
unsigned benchmark(const Function & function, unsigned & counter)
{
std::vector<char> binary_data = get_binary_data();
Stopwatch sw;
for (unsigned iter = 0; iter != iterations; ++iter)
{
for (unsigned i = 0; i != binary_data.size(); i += 4)
{
const char * c = reinterpret_cast<const char*>(&binary_data[i]);
counter += function(c);
}
}
return unsigned(0.5 + 1000.0 * sw.elapsed());
}
int main()
{
srand(time(0));
unsigned counter = 0;
std::cout << "cast: " << benchmark(test_cast(), counter) << " ms" << std::endl;
std::cout << "memcpy: " << benchmark(test_memcpy(), counter) << " ms" << std::endl;
std::cout << "memmove: " << benchmark(test_memmove(), counter) << " ms" << std::endl;
std::cout << "std::copy: " << benchmark(test_std_copy(), counter) << " ms" << std::endl;
std::cout << "(counter: " << counter << ")" << std::endl << std::endl;
}
我注意到,由于某些原因,std::copy
的性能比memcpy差得多.在我的Mac上使用gcc 4.7的输出看起来像这样.
I noticed that for some reason std::copy
performs much worse than memcpy. The output looks like this on my Mac using gcc 4.7.
g++ -o test -std=c++0x -O0 -Wall -Werror -Wextra -pedantic-errors main.cpp
cast: 41 ms
memcpy: 46 ms
memmove: 53 ms
std::copy: 211 ms
(counter: 3838457856)
g++ -o test -std=c++0x -O1 -Wall -Werror -Wextra -pedantic-errors main.cpp
cast: 8 ms
memcpy: 7 ms
memmove: 8 ms
std::copy: 19 ms
(counter: 3838457856)
g++ -o test -std=c++0x -O2 -Wall -Werror -Wextra -pedantic-errors main.cpp
cast: 3 ms
memcpy: 2 ms
memmove: 3 ms
std::copy: 27 ms
(counter: 3838457856)
g++ -o test -std=c++0x -O3 -Wall -Werror -Wextra -pedantic-errors main.cpp
cast: 2 ms
memcpy: 2 ms
memmove: 3 ms
std::copy: 16 ms
(counter: 3838457856)
如您所见,即使使用-O3
,它也比memcpy慢5倍(!).
As you can see, even with -O3
it is up to 5 times (!) slower than memcpy.
结果在Linux上是相似的.
The results are similar on Linux.
有人知道为什么吗?
推荐答案
那不是我得到的结果:
> g++ -O3 XX.cpp
> ./a.out
cast: 5 ms
memcpy: 4 ms
std::copy: 3 ms
(counter: 1264720400)
Hardware: 2GHz Intel Core i7
Memory: 8G 1333 MHz DDR3
OS: Max OS X 10.7.5
Compiler: i686-apple-darwin11-llvm-g++-4.2 (GCC) 4.2.1
在Linux机器上,我得到不同的结果:
On a Linux box I get different results:
> g++ -std=c++0x -O3 XX.cpp
> ./a.out
cast: 3 ms
memcpy: 4 ms
std::copy: 21 ms
(counter: 731359744)
Hardware: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Memory: 61363780 kB
OS: Linux ip-10-58-154-83 3.2.0-29-virtual #46-Ubuntu SMP
Compiler: g++ (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
这篇关于在我的测试程序中,为什么从char缓冲区读取一个int的`std :: copy` 5x(!)比`memcpy`要慢?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!