本文介绍了使用SFML的带有生命游戏可视化功能的OpenMP的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

您好,我正在尝试比较生命游戏"的串行和并行版本之间的速度.我使用SFML库来可视化这样的生活游戏. SFML窗口串行逻辑很简单,如下所示.

Hello I'm trying to compare speeds between serial and parallel version of 'Game of Life'.I used SFML library to visualize game of life like this.SFML windowSerial logic is simple like below.

for (int i = 0; i < height; i++) {
            for (int j = 0; j < width; j++) {
                int neighbor = 0;

                // check 8 cells around.
                // 1 2 3  -1
                // 4   5  0
                // 6 7 8  +1

                // (1)
                if (gamefieldSerial.isAvailableCell(UP(i), LEFT(j))) {
                    if(gamefieldSerial[UP(i)][LEFT(j)] == LIVE) neighbor++;
                }
                // (2)
                if (gamefieldSerial.isAvailableCell(UP(i), j)) {
                    if (gamefieldSerial[UP(i)][j] == LIVE)      neighbor++;
                }
                // (3)
                if (gamefieldSerial.isAvailableCell(UP(i), RIGHT(j))) {
                    if (gamefieldSerial[UP(i)][RIGHT(j)] == LIVE)   neighbor++;
                }
                // (4)
                if (gamefieldSerial.isAvailableCell(i, LEFT(j))) {
                    if (gamefieldSerial[i][LEFT(j)] == LIVE)        neighbor++;
                }
                // (5)
                if (gamefieldSerial.isAvailableCell(i, RIGHT(j))) {
                    if (gamefieldSerial[i][RIGHT(j)] == LIVE)       neighbor++;
                }
                // (6)
                if (gamefieldSerial.isAvailableCell(DOWN(i), LEFT(j))) {
                    if (gamefieldSerial[DOWN(i)][LEFT(j)] == LIVE)  neighbor++;
                }
                // (7)
                if (gamefieldSerial.isAvailableCell(DOWN(i), j)) {
                    if (gamefieldSerial[DOWN(i)][j] == LIVE)        neighbor++;
                }
                // (8)
                if (gamefieldSerial.isAvailableCell(DOWN(i), RIGHT(j))) {
                    if (gamefieldSerial[DOWN(i)][RIGHT(j)] == LIVE) neighbor++;
                }

                // -- Rule of Game of Life
                // Cell borns when exactly 3 neighbor is LIVE
                // Cell remains alive when 2 or 3 neighbor is LIVE
                // Cell with more than 3 neighbor dies with overpopulation
                // Cell with less than 2 neighbor dies with underpopulation
                if (gamefieldSerial[i][j] == DEAD) {
                    if (neighbor == 3) {
                        gamefieldSerial[i][j] = LIVE;
                    }
                }
                else if (gamefieldSerial[i][j] == LIVE) {
                    if (neighbor < 2 || neighbor > 3) {
                        gamefieldSerial[i][j] = DEAD;
                    }
                }
            }

在100代的768 * 256个单元上花费了3940毫秒.但是在并行版本中,我实现如下:

It took 3940ms on 768*256 cells with 100 generations.But in parallel version I implemented like below,

#pragma omp parallel for num_threads(4)
        for (int t = 0; t < width * height; t++) {
            int i = t / width;
            int j = t % width;
            int neighbor = 0;

            // check 8 cells around.
            // 1 2 3  -1
            // 4   5  0
            // 6 7 8  +1

            // (1)
            if (gamefieldParallel.isAvailableCell(UP(i), LEFT(j))) {
                if (gamefieldParallel[UP(i)][LEFT(j)] == LIVE) neighbor++;
            }
            // (2)
            if (gamefieldParallel.isAvailableCell(UP(i), j)) {
                if (gamefieldParallel[UP(i)][j] == LIVE)      neighbor++;
            }
            // (3)
            if (gamefieldParallel.isAvailableCell(UP(i), RIGHT(j))) {
                if (gamefieldParallel[UP(i)][RIGHT(j)] == LIVE)   neighbor++;
            }
            // (4)
            if (gamefieldParallel.isAvailableCell(i, LEFT(j))) {
                if (gamefieldParallel[i][LEFT(j)] == LIVE)        neighbor++;
            }
            // (5)
            if (gamefieldParallel.isAvailableCell(i, RIGHT(j))) {
                if (gamefieldParallel[i][RIGHT(j)] == LIVE)       neighbor++;
            }
            // (6)
            if (gamefieldParallel.isAvailableCell(DOWN(i), LEFT(j))) {
                if (gamefieldParallel[DOWN(i)][LEFT(j)] == LIVE)  neighbor++;
            }
            // (7)
            if (gamefieldParallel.isAvailableCell(DOWN(i), j)) {
                if (gamefieldParallel[DOWN(i)][j] == LIVE)        neighbor++;
            }
            // (8)
            if (gamefieldParallel.isAvailableCell(DOWN(i), RIGHT(j))) {
                if (gamefieldParallel[DOWN(i)][RIGHT(j)] == LIVE) neighbor++;
            }

            // -- Rule of Game of Life
            // Cell borns when exactly 3 neighbor is LIVE
            // Cell remains alive when 2 or 3 neighbor is LIVE
            // Cell with more than 3 neighbor dies with overpopulation
            // Cell with less than 2 neighbor dies with underpopulation
            if (gamefieldParallel[i][j] == DEAD) {
                if (neighbor == 3) {
                    gamefieldParallel[i][j] = LIVE;
                }
            }
            else if (gamefieldParallel[i][j] == LIVE) {
                if (neighbor < 2 || neighbor > 3) {
                    gamefieldParallel[i][j] = DEAD;
                }
            }
        }

在相同环境下花费了5746毫秒.我认为在for循环中应用openMP的'for'指令可以提高性能,但事实并非如此.我是否应该采用其他方式?

It took 5746ms on same environment.I thought applying openMP's 'for' directive in for-loop enhances the performance, but it doesn't.Should I have to approach in another way?

==============gamefieldParallel和gamefieldSerial都是GameField类的实例,该类已为单元动态分配了int **字段变量.我正在使用运算符重载来访问它,就像二维数组一样.(对不起,英语不好!)

=============Both gamefieldParallel and gamefieldSerial is instance of GameField class which has dynamically allocated int** field variable for cells. I'm using operator overloading to access it like two dimensional array.(Sorry for bad english!)

推荐答案

在串行版本和并行版本中,您都在迭代游戏场时对其进行更新.这是两个版本的错误.假设您计算gameField[0][0]的新状态,然后对其进行更新.现在,您进入gameField[0][1] ---作为计算其新状态的一部分,您向左看向gameField[0][0],该位置已经包含该单元格的新更新状态,但是必须将生命游戏的规则应用于第一个单元格的上一个状态.

In both your serial version and your parallel version you are updating the game field as you iterate through it. This is a bug with both versions. Say you calculate the new state for gameField[0][0] and then update it. Now you move onto gameField[0][1] --- as part of calculating its new state, you look left to gameField[0][0] which already contains that cell's newly-updated state, but the Game of Life's rules have to be applied to the first cell's previous state.

换句话说,您应该有一个只读(const)oldGameField,然后将新状态填充到newGameField中.计算完所有单元格后,就可以将新字段用作下一个游戏迭代的旧字段.

In other words, you should have a read-only (const) oldGameField and then fill new states into newGameField. Once you have calculated all cells, you can use the new field as the old field for the next game iteration.

修复该错误实际上是解决性能问题的重要部分.

Fixing that bug is actually an important part of sorting out your performance problem.

想象没有4个处理器进行这些更新,而是想象4个人使用铅笔和纸进行此更新.因为您现在将oldGameField视为只读,所以可以安全地复印旧页面并将副本提供给每个人.每个人都知道没有其他人会更改旧页面,因此他们不必担心自己的副本过时.

Instead of thinking of 4 processors doing these updates, imagine 4 people doing it with pencil and paper. Because you will now treat oldGameField as read-only, it is safe to photocopy the old page and give a copy to each person. Each person knows that no-one else is going to change the old page, so they don't have to worry about their copy ever becoming out of date.

但是您只能在newGameField页上打印一页.在您的串行方式中,只有一个人,他们自己拥有页面和铅笔.现在,您有四个人试图同时在同一页上绘制.他们浪费的时间比花在计算上的时间还要多!从字面上看,完成一项工作需要四个人比一个人独自完成需要的时间更长.

But you sill only have one page for the newGameField. In your serial approach, there is only one person, who has the page and the pencil exclusively to themselves. Now you have four people trying to draw on the same page at the same time. They waste more time passing the pencil and page amongst themselves than the time they spend on doing the calculations!It literally takes four people longer to do the job than one person could have done it alone.

这并不是要确切地表示硬件内部发生的情况,但是当我们考虑到OpenMP可能正在使用任何锁定和/或原子时,以及诸如处理器内核中的内存缓存之类的东西时,它就非常接近了.

This is not meant to be an exact representation of what's going on inside hardware, but when we consider any locking and/or atomics OpenMP might be using, and things like the memory caches in your processor's cores --- it's pretty close.

您和您的三个朋友可以决定,每个人都将更新四分之一的字段.也许您占据了整个董事会的前四分之一.下一个人占据第二个季度,依此类推.而不是只争一张纸来画出新的游戏领域,而是每个人都有自己的纸来画出新州的四分之一.

You and your three friends could decide that each of you would update one quarter of the field. Maybe you take the whole top quarter of the board. The next person takes the second quarter, and so on. And rather than fighting over one sheet of paper to draw the new game field, you each have your own piece of paper to draw just your quarter of the new states.

完成所有操作后,您便迅速将四张纸粘贴在一起以制作新的游戏领域页面.

Once you are all done, you quickly paste your four pieces of paper together to make the new game field page.

此处的要点是确保每个线程都从内存中读取无人更改的内容,并且确保每个线程仅写入没有其他线程正在访问的内存.这意味着内核之间不会因内存写入而相互阻塞,并且在看到其他内核写入共享内存时也不必刷新其缓存.

The point here is to make sure that each thread is reading from memory that no-one is changing, and that each thread writes only to memory that no other thread is accessing. This means that cores do not block each other with memory writes, and do not have to flush their caches when they see other cores writing to shared memory.

确保每个线程的新游戏场内存距离不够近,不会对其他线程使用的内存造成干扰,这很棘手.通常,您需要了解一些有关内核中缓存行大小的信息,以及您的平台是否使用了称为"NUMA"的信息,等等.

Making sure that each thread's new game field memory isn't close enough to cause interference to the memory used by some other thread is tricky. You generally need to know some information on the size of the cache lines in your cores, whether or not your platform use something called "NUMA", etc etc.

我不了解OpenMP,但也许它具有一些内置的支持来对此提供帮助.

I don't know OpenMP - but perhaps it has some built-in support to help with this.

  • 将旧状态与新状态分开
  • 您的线程可以愉快地共享旧状态(因为在迭代过程中没有人对其进行更改)
  • 将工作分成多个块-每个线程一个
  • 为每个线程分配自己的内存以存储其结果.
  • 所有线程完成后,您是否需要主线程将所有结果汇总到一个合并的新游戏域中.

这篇关于使用SFML的带有生命游戏可视化功能的OpenMP的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

08-30 04:18