I have Visual Studio 2012 installed, along with Intel Parallel Studio 2013, so I have Intel TBB available.
Say I have the following code:
const int cardsCount = 12; // will be READ by all threads
// the required number of cards of each colour to complete its set:
// NOTE that the required number of cards of each colour is not the same as the total number of cards of this colour available
int required[] = {2,3,4}; // will be READ by all threads
Card cards[cardsCount]; // will be READ by all threads
int cardsIndices[cardsCount];// this will be permuted, permutations need to be split among threads !
// set "cards" to 4 cards of each colour (3 colours total = 12 cards)
// set cardsIndices to {0,1,2,3...,11}
// this variable will be written to by all threads, maybe have one for each thread and combine them later?? or can I use concurrent_vector<int> instead !?
int logColours[] = {0,0,0};
int permutationsCount = fact(cardsCount);
for (int pNum=0; pNum<permutationsCount; pNum++) // I want to make this loop parallel !!
{
    int countColours[3] = {0,0,0}; // local loop variable, no problem with multithreading
    for (int i=0; i<cardsCount; i++)
    {
        Card c = cards[cardsIndices[i]]; // accessed "cards"
        countColours[c.Colour]++; // local loop variable, np.
        // we got the required number of cards of this colour to complete it
        if (countColours[c.Colour] == required[c.Colour]) // read global variable "required" !
        {
            // log that we completed this colour and go to next permutation
            logColours[c.Colour] ++; // should I use a concurrent_vector<int> for this shared variable?
            break;
        }
    }
    std::next_permutation(cardsIndices, cardsIndices+cardsCount); // !! this is my main issue
}
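What I'm computing is how many times each colour gets completed first if we pick cards at random from those available. This is done exhaustively: I go through every possible permutation and pick cards in order, and as soon as a colour is "completed" I break and move on to the next permutation. Note that we have 4 cards of each colour, but the number of cards required to complete each colour is {2,3,4} (red, green and blue). 2 red cards are enough to complete red and we have 4 available, so red is more likely to be completed than blue, which requires all 4 of its cards to be picked.
I want to make this for loop parallel, but my main problem is how to deal with the "cards" permutations. There are about half a billion permutations here (12!). If I have 4 threads, how can I split this into 4 different quarters and let each thread go through one of them?
What if I don't know the number of cores the machine has and I want the program to choose the right number of concurrent threads automatically? Surely there must be a way to do that with Intel or Microsoft tools?
This is my Card struct, just in case: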
struct Card
{
public:
    int Colour;
    int Symbol;
};
Best answer
Let N = cardsCount and M = required[0] * required[1] * ... * required[maxColor - 1].
Then your problem can actually be solved easily in O(N * M) time. In your case that is 12 * 2 * 3 * 4 = 288 operations. :)
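One possible way to do this is to use a recurrence relation.
Consider a function logColours f(n, required). Let n be the number of cards considered so far; required is the vector from your example, holding how many cards of each colour are still needed. The function returns the answer as a vector logColours.
You are interested in f(0, {2,3,4}), the state before any card has been considered. A short recursive computation inside the function f can be written like this: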
std::vector<int> f(int n, std::vector<int> require) {
    if (cache[n].count(require)) {
        // we have already calculated the function with the same arguments, do not recalculate it
        return cache[n][require];
    }
    std::vector<int> logColours(maxColor, 0); // maxColor = 3 in your example
    for (int putColor=0; putColor<maxColor; ++putColor) {
        if (/* there is still at least one card with color 'putColor' */) {
            // put a card of color 'putColor' on place 'n'
            if (require[putColor] == 1) {
                // means we've reached the needed amount of cards of color 'putColor'
                ++logColours[putColor];
            } else {
                --require[putColor];
                std::vector<int> logColoursRec = f(n+1, require);
                ++require[putColor]; // restore the argument before trying the next color
                // merge the child array into your own.
                for (int i=0; i<maxColor; ++i)
                    logColours[i] += logColoursRec[i];
            }
        }
    }
    // store logColours in the cache entry corresponding to these function arguments
    cache[n][require] = std::move(logColours);
    return cache[n][require];
}
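The cache can be implemented as
std::unordered_map<int, std::unordered_map<std::vector<int>, std::vector<int>>>
(the standard library provides no std::hash for std::vector<int>, so the inner map needs a custom hasher, or you can use std::map instead). Once you understand the main idea, you can implement it with more efficient code.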
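As a rough illustration, the recurrence could be fleshed out along the following lines. This is only a sketch under stated assumptions, not the answerer's exact code: the names `Solver`, `countCompletions` and `available` are made up for the example, `std::map` stands in for the cache so that no custom hasher is needed, and each step is weighted by the number of distinct cards that could be drawn (and by the factorial of the cards left over once a colour completes) so that the totals should line up with the exhaustive loop over all permutations from the question.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using Counts = std::vector<std::uint64_t>;

// Hypothetical helper class: counts, for each colour, the number of card
// permutations in which that colour is the first one to be completed.
class Solver {
public:
    Solver(std::vector<int> available, std::vector<int> required)
        : available_(std::move(available)), required_(std::move(required)) {
        assert(available_.size() == required_.size());
        for (int a : available_) cardsCount_ += a;
    }

    // Start from the state where no card has been drawn yet.
    Counts solve() { return f(0, required_); }

private:
    static std::uint64_t factorial(int k) {
        std::uint64_t r = 1;
        for (int i = 2; i <= k; ++i) r *= static_cast<std::uint64_t>(i);
        return r;
    }

    // n = cards drawn so far, require[c] = cards of colour c still needed.
    Counts f(int n, std::vector<int> require) {
        auto hit = cache_.find(require);
        if (hit != cache_.end()) return hit->second;  // memoized state

        const int maxColor = static_cast<int>(required_.size());
        Counts logColours(maxColor, 0);
        for (int c = 0; c < maxColor; ++c) {
            const int drawn = required_[c] - require[c];
            const int left = available_[c] - drawn;   // undrawn cards of colour c
            if (left <= 0) continue;                  // no card of this colour remains
            const std::uint64_t ways = static_cast<std::uint64_t>(left);
            if (require[c] == 1) {
                // Colour c completes here; the remaining cards can follow in any order.
                logColours[c] += ways * factorial(cardsCount_ - n - 1);
            } else {
                --require[c];
                const Counts rec = f(n + 1, require);
                ++require[c];                         // restore before the next colour
                for (int i = 0; i < maxColor; ++i) logColours[i] += ways * rec[i];
            }
        }
        cache_[require] = logColours;
        return logColours;
    }

    std::vector<int> available_;  // cards of each colour in the deck (assumed input)
    std::vector<int> required_;   // cards of each colour needed to complete it
    int cardsCount_ = 0;
    std::map<std::vector<int>, Counts> cache_;  // key: remaining requirement per colour
};

// Convenience wrapper (hypothetical name).
Counts countCompletions(const std::vector<int>& available,
                        const std::vector<int>& required) {
    return Solver(available, required).solve();
}
```

For the deck in the question, a call such as `countCompletions({4, 4, 4}, {2, 3, 4})` would return one count per colour, and the three counts should add up to 12!, since every permutation completes exactly one colour first. The cache is keyed only by the remaining requirements because n can be derived from them.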