Problem description
I'm trying to parallelise a long-running function in C++, but with std::async it only uses one core.
It's not that the function's running time is too short: the test data I'm currently using takes about 10 minutes to run.
My logic is to create NThreads futures, each of which dispatches an async task covering a proportion of the loop rather than an individual cell, so every task is nicely long-running. Once they have all been created, the program waits for them to complete. However, it always uses one core?!
And this isn't just me glancing at top and thinking it looks roughly like one CPU: my ZSH config prints the CPU % of the last command, and it is always exactly 100%, never above.
auto NThreads = 12;
// Each future handles a contiguous block of PathCountLength / NThreads
// indices; the last block also takes the remainder.
auto BlockSize = static_cast<unsigned __int128>(PathCountLength) / NThreads;
std::vector<std::future<std::vector<unsigned __int128>>> Futures;
for (auto I = 0; I < NThreads; ++I) {
  std::cout << "HERE" << std::endl;
  unsigned __int128 Min = I * BlockSize;
  unsigned __int128 Max = I * BlockSize + BlockSize;
  if (I == NThreads - 1)
    Max = PathCountLength;
  // No launch policy given: the default is std::launch::async | std::launch::deferred.
  Futures.push_back(std::async(
      [](unsigned __int128 WMin, unsigned __int128 Min, unsigned __int128 Max,
         std::vector<unsigned __int128> ZeroChildren,
         std::vector<unsigned __int128> OneChildren,
         unsigned __int128 PathCountLength)
          -> std::vector<unsigned __int128> {
        std::vector<unsigned __int128> LocalCount;
        for (unsigned __int128 I = Min; I < Max; ++I)
          LocalCount.push_back(KneeParallel::pathCountOrStatic(
              WMin, I, ZeroChildren, OneChildren, PathCountLength));
        return LocalCount;
      },
      WMin, Min, Max, ZeroChildInit, OneChildInit, PathCountLength));
}
// Block until every task has finished (the results are discarded here).
for (auto &Future : Futures) {
  Future.get();
}
Any insight?
I'm compiling with clang and LLVM on Arch Linux. Are there any compile flags I need? From what I can tell, C++11 standardised the thread library.
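The block bounds in the loop above are meant to split the index range [0, PathCountLength) into NThreads contiguous chunks. Below is a minimal sketch of that partitioning, assuming floor division with the remainder going to the last block. The helper `partition_range` is hypothetical, and it uses `std::uint64_t` instead of `unsigned __int128` for portability. The ratio has to be PathCountLength / NThreads. In integer arithmetic, NThreads / PathCountLength is 0 whenever PathCountLength exceeds NThreads, and then every block except the last one is empty.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Split the half-open range [0, total) into n_blocks contiguous [min, max)
// chunks. Every block gets total / n_blocks items (floor division) and the
// last block also takes the remainder, like the
// "if (I == NThreads - 1) Max = PathCountLength;" rule in the loop.
std::vector<std::pair<std::uint64_t, std::uint64_t>>
partition_range(std::uint64_t total, unsigned n_blocks) {
  assert(n_blocks > 0);
  const std::uint64_t block_size = total / n_blocks;  // total / blocks, not the reverse
  std::vector<std::pair<std::uint64_t, std::uint64_t>> blocks;
  for (unsigned i = 0; i < n_blocks; ++i) {
    const std::uint64_t min = i * block_size;
    const std::uint64_t max = (i == n_blocks - 1) ? total : min + block_size;
    blocks.emplace_back(min, max);
  }
  return blocks;
}
```

For example, `partition_range(10, 3)` yields [0, 3), [3, 6) and [6, 10). With fewer items than blocks, `block_size` is 0 and all the work again lands in the final block. A real implementation might therefore cap the number of tasks at the item count.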
Edit: in case it gives anyone further clues, when I comment out the local vector it runs on all cores as it should; when I put it back in, it drops back to one core.
Edit 2: So I pinned down the solution, but it seems very bizarre. Returning the vector from the lambda function pinned it to one core, so now I get around this by passing in a shared_ptr to the output vector and manipulating that. And hey presto, it fires up on all the cores!
I figured there was no point using futures any more, since I don't have a return value, so I'd use threads instead. Nope! Using threads with no return value also uses one core. Weird, eh?
Fine, back to futures: just return an int to throw away or something. Yep, you guessed it, even returning an int from the thread sticks the program to one core. And I didn't think futures could take void lambda functions, so my solution is to pass in a pointer to store the output, to an int lambda function whose return value is never used. Yeah, it feels like duct tape, but I can't see a better solution.
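For comparison, std::async does accept a void-returning callable. It then returns a std::future<void>, whose get() simply waits for the task and rethrows any exception the task threw. The minimal sketch below relies on that: each task writes into its own pre-sized slot of an output vector, so no dummy int return is needed. The function name `squares_by_block` and the squaring workload are hypothetical placeholders.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

using Squares = std::vector<unsigned long long>;

// Task t writes the squares of its [min, max) slice into outputs[t].
// The lambdas return void, so std::async yields std::future<void>.
std::vector<Squares> squares_by_block(unsigned long long total,
                                      std::size_t n_tasks) {
  assert(n_tasks > 0);
  // Sized up front so the element addresses taken below stay valid.
  std::vector<Squares> outputs(n_tasks);
  std::vector<std::future<void>> futures;
  const unsigned long long block = total / n_tasks;
  for (std::size_t t = 0; t < n_tasks; ++t) {
    const unsigned long long min = t * block;
    const unsigned long long max = (t == n_tasks - 1) ? total : min + block;
    Squares* out = &outputs[t];  // each task owns a distinct vector
    futures.push_back(std::async(std::launch::async, [out, min, max] {
      for (unsigned long long i = min; i < max; ++i) out->push_back(i * i);
    }));
  }
  for (auto& f : futures) f.get();  // waits; rethrows a task's exception
  return outputs;
}
```

Because every task writes only to its own vector, the tasks never touch shared state and need no locking.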
It seems so... bizarre? As if the compiler is somehow interpreting the lambda incorrectly. Could it be because I use the dev release of LLVM rather than a stable branch...?
Anyway, here is my solution, because I hate nothing more than finding my problem on here with no answer:
auto NThreads = 4;
// PathCountLength / NThreads indices per task; the last task takes the remainder.
auto BlockSize = static_cast<unsigned __int128>(PathCountLength) / NThreads;
auto Futures = std::vector<std::future<int>>(NThreads);
// Give each task its own output vector so no two tasks push_back into the same one.
auto OutputVectors =
    std::vector<std::shared_ptr<std::vector<unsigned __int128>>>(NThreads);
for (auto &Output : OutputVectors)
  Output = std::make_shared<std::vector<unsigned __int128>>();
for (auto I = 0; I < NThreads; ++I) {
  unsigned __int128 Min = I * BlockSize;
  unsigned __int128 Max = I * BlockSize + BlockSize;
  if (I == NThreads - 1)
    Max = PathCountLength;
  Futures[I] = std::async(
      std::launch::async,
      [](unsigned __int128 WMin, unsigned __int128 Min, unsigned __int128 Max,
         std::vector<unsigned __int128> ZeroChildren,
         std::vector<unsigned __int128> OneChildren,
         unsigned __int128 PathCountLength,
         std::shared_ptr<std::vector<unsigned __int128>> OutputVector)
          -> int {
        for (unsigned __int128 I = Min; I < Max; ++I) {
          OutputVector->push_back(KneeParallel::pathCountOrStatic(
              WMin, I, ZeroChildren, OneChildren, PathCountLength));
        }
        return 0;  // dummy value; the caller discards it
      },
      WMin, Min, Max, ZeroChildInit, OneChildInit, PathCountLength,
      OutputVectors[I]);
}
for (auto &Future : Futures) {
  Future.get();
}
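One detail is worth checking when building per-task outputs out of shared_ptrs. `std::vector<std::shared_ptr<T>>(n, std::make_shared<T>())` copies a single `shared_ptr` n times, so every element points at the same object. Concurrent `push_back` calls on that one vector would then be a data race. The small sketch below contrasts that with one `make_shared` call per element. Both helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

using Vec = std::vector<unsigned long long>;

// Fill constructor: ONE shared_ptr copied n times, so all elements
// refer to the same vector.
std::vector<std::shared_ptr<Vec>> shared_outputs(std::size_t n) {
  return std::vector<std::shared_ptr<Vec>>(n, std::make_shared<Vec>());
}

// One make_shared call per element: every element owns a distinct vector.
std::vector<std::shared_ptr<Vec>> distinct_outputs(std::size_t n) {
  std::vector<std::shared_ptr<Vec>> out;
  out.reserve(n);
  for (std::size_t i = 0; i < n; ++i) out.push_back(std::make_shared<Vec>());
  return out;
}
```

With `shared_outputs(4)`, all four elements compare equal and share a use count of 4. With `distinct_outputs(4)`, each element is the sole owner of its own vector.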
Recommended answer
By providing a first argument to async, you can configure it to run deferred (std::launch::deferred), to run in its own thread (std::launch::async), or to let the system decide between both options (std::launch::async | std::launch::deferred). The latter is the default behavior.
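The difference between the policies can be observed directly. In the minimal sketch below, which uses only the standard `<future>`, `<thread>` and `<chrono>` headers, a deferred task runs lazily inside get() on the calling thread, while an async task runs on a new thread. `run_with` and `default_policy_was_deferred` are hypothetical helper names. The second one uses wait_for to check whether the default policy chose deferred for one particular call; that choice is implementation-defined.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

// Returns the id of the thread that actually executed the task.
std::thread::id run_with(std::launch policy) {
  auto f = std::async(policy, [] { return std::this_thread::get_id(); });
  return f.get();
}

// wait_for reports future_status::deferred if the task has not been
// started and will only run when get() or wait() is called.
bool default_policy_was_deferred() {
  auto f = std::async([] { return 0; });  // no policy: async | deferred
  const bool deferred =
      f.wait_for(std::chrono::seconds(0)) == std::future_status::deferred;
  f.get();  // runs the task if deferred, otherwise waits for it
  return deferred;
}
```

A check like `default_policy_was_deferred()` can be dropped into the real program to see what the default actually chose there, instead of inferring it from CPU usage.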
So, to force it to run in another thread, change your call of std::async to std::async(std::launch::async, /*...*/).
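Putting the pieces together, here is a minimal self-contained sketch of the pattern from the question with the launch policy forced. Each block runs under std::launch::async and returns its own std::vector by value, and the caller concatenates the parts in order. `work` is a hypothetical stand-in for KneeParallel::pathCountOrStatic, and `std::uint64_t` replaces `unsigned __int128` for portability.

```cpp
#include <cassert>
#include <cstdint>
#include <future>
#include <vector>

// Hypothetical stand-in for KneeParallel::pathCountOrStatic: any pure
// function of the index will do for this sketch.
std::uint64_t work(std::uint64_t i) { return i * i + 1; }

// Splits [0, total) into n_threads blocks, forces each block onto its own
// thread with std::launch::async, and concatenates the results in order.
std::vector<std::uint64_t> parallel_map(std::uint64_t total, unsigned n_threads) {
  assert(n_threads > 0);
  const std::uint64_t block = total / n_threads;
  std::vector<std::future<std::vector<std::uint64_t>>> futures;
  for (unsigned t = 0; t < n_threads; ++t) {
    const std::uint64_t min = t * block;
    const std::uint64_t max = (t == n_threads - 1) ? total : min + block;
    futures.push_back(std::async(
        std::launch::async, [min, max]() -> std::vector<std::uint64_t> {
          std::vector<std::uint64_t> local;
          local.reserve(max - min);
          for (std::uint64_t i = min; i < max; ++i) local.push_back(work(i));
          return local;  // returned by value through the future
        }));
  }
  std::vector<std::uint64_t> result;
  result.reserve(total);
  for (auto& f : futures) {
    std::vector<std::uint64_t> part = f.get();
    result.insert(result.end(), part.begin(), part.end());
  }
  return result;
}
```

Each task owns its local vector until get() moves it out, so this version needs neither a shared_ptr nor any locking.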