Hopefully you can help, because this one is giving me a headache.
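I'm developing a small predator-prey simulation in Python (3.3) that uses a simple feed-forward neural network. Today I switched the function that runs each "tick" of the brain from plain Python lists to numpy arrays, to optimize it for when I use larger brains.
I checked the performance of the whole main program "loop" with cProfile and, as I expected, the "tick" function (in the Brain class) is faster. However, when actually running the program, I noticed that the numpy version is slower: 100 loops take 12 seconds, versus 9 seconds for 100 loops before.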
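The comparison above mixes two different measurements: cProfile on a single update() call, and wall-clock time over 100 loops. A minimal sketch of taking both side by side, assuming a hypothetical stand-in `fake_update()` in place of the real update(). Note that the real update() reschedules itself with threading.Timer, so it would need adapting before being called in a tight loop like this.

```python
import cProfile
import io
import pstats
import time


def fake_update():
    """Hypothetical stand-in for the simulation's update(); replace with the real work."""
    return sum(i * i for i in range(10000))


def wall_clock(func, loops=100):
    """Total wall-clock seconds for `loops` back-to-back calls of func."""
    start = time.perf_counter()
    for _ in range(loops):
        func()
    return time.perf_counter() - start


def profile_once(func):
    """Profile one call and return the report text, sorted by standard name."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("stdname").print_stats()
    return out.getvalue()


if __name__ == "__main__":
    print("100 loops: %.3f s" % wall_clock(fake_update))
    print(profile_once(fake_update))
```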
Why would this be? The code is below.
Original implementation:

import math
import random


class Brain:

    def __init__(self, inputs, hidden, outputs):
        self.input_num = inputs
        self.hidden_num = hidden
        self.output_num = outputs

        # flat weight lists, uniform in [-1, 1); stored row by row (one row per target unit)
        self.h_weight = []
        self.o_weight = []
        for _ in range(self.input_num * self.hidden_num):
            self.h_weight.append(random.random()*2-1)
        for _ in range(self.hidden_num * self.output_num):
            self.o_weight.append(random.random()*2-1)

    def tick(self):
        # self.input is filled in by creature.setInputs() before each tick
        input_num = self.input_num
        hidden_num = self.hidden_num
        output_num = self.output_num

        hidden = [0]*hidden_num
        output = [0]*output_num

        inputs = self.input
        h_weight = self.h_weight
        o_weight = self.o_weight

        e = math.e

        # hidden layer: temp holds the negated weighted sum, so this is the logistic sigmoid
        count = -1
        for x in range(hidden_num):
            temp = 0
            for y in range(input_num):
                count += 1
                temp -= inputs[y] * h_weight[count]
            hidden[x] = 1/(1+e**(temp))

        # output layer, same scheme
        count = -1
        for x in range(output_num):
            temp = 0
            for y in range(hidden_num):
                count += 1
                temp -= hidden[y] * o_weight[count]
            output[x] = 1/(1+e**(temp))

        self.output = output
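The flat weight lists above are indexed row by row: the weight from input y to hidden unit x sits at position x*input_num + y. A small sketch (the helper names `loop_layer` and `matrix_layer` are hypothetical) checking that reshaping such a list row-major into a (hidden_num, input_num) matrix reproduces the loop, which is what makes a numpy rewrite with `dot` possible.

```python
import math
import random

import numpy as np


def loop_layer(inputs, flat_weights, n_out):
    """Pure-Python layer in the style of the original tick(): logistic sigmoid of a weighted sum."""
    n_in = len(inputs)
    out = [0.0] * n_out
    count = -1
    for x in range(n_out):
        temp = 0.0
        for y in range(n_in):
            count += 1
            temp -= inputs[y] * flat_weights[count]  # temp = -(weighted sum)
        out[x] = 1 / (1 + math.e ** temp)
    return out


def matrix_layer(inputs, flat_weights, n_out):
    """Same layer with the flat list reshaped row-major into an (n_out, n_in) matrix."""
    w = np.array(flat_weights).reshape(n_out, len(inputs))
    return 1 / (1 + np.exp(-w.dot(np.asarray(inputs, dtype=float))))


inputs = [random.random() for _ in range(5)]
weights = [random.random() * 2 - 1 for _ in range(5 * 200)]
print(np.allclose(loop_layer(inputs, weights, 200), matrix_layer(inputs, weights, 200)))  # True
```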

New implementation (using numpy):
from numpy import dot, random, tanh, zeros


class Brain:

    def __init__(self, inputs, hidden, outputs):
        self.input_num = inputs
        self.hidden_num = hidden
        self.output_num = outputs

        # numpy.random.random fills the weight matrices uniformly from [0, 1)
        self.h_weights = random.random((self.hidden_num, self.input_num))
        self.o_weights = random.random((self.output_num, self.hidden_num))

        self.h_activation = zeros((self.hidden_num, 1), dtype=float)
        self.o_activation = zeros((self.output_num, 1), dtype=float)

        self.i_output = zeros((self.input_num, 1), dtype=float)
        self.h_output = zeros((self.hidden_num, 1), dtype=float)
        self.o_output = zeros((self.output_num, 1), dtype=float)

    def tick(self):
        # self.input is filled in by creature.setInputs() before each tick
        i_output = self.input
        h_weights = self.h_weights
        o_weights = self.o_weights

        # hidden layer: tanh squashes to (-1, 1); the list version used the logistic sigmoid, (0, 1)
        h_activation = dot(h_weights, i_output)
        h_output = tanh(h_activation)

        # output layer
        o_activation = dot(o_weights, h_output)
        o_output = tanh(o_activation)
        self.output = o_output
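The numpy version is not a drop-in equivalent of the list version: it uses tanh instead of the logistic sigmoid, draws weights from [0, 1) instead of [-1, 1), and leaves numpy values rather than a list of Python floats in self.output. Since move() scales output[0] and output[1], the two versions will not drive the creatures identically. A sketch of a numpy tick that mirrors the original more closely, assuming self.input is a flat sequence of 5 numbers; the class name `SigmoidBrain` is hypothetical.

```python
import numpy as np


class SigmoidBrain:
    """Hypothetical numpy variant mirroring the list-based Brain:
    weights uniform in [-1, 1), logistic sigmoid activation,
    and self.output as a plain list of Python floats."""

    def __init__(self, inputs, hidden, outputs):
        self.input_num = inputs
        self.hidden_num = hidden
        self.output_num = outputs
        # random_sample gives [0, 1); *2-1 maps it to [-1, 1) like the original
        self.h_weights = np.random.random_sample((hidden, inputs)) * 2 - 1
        self.o_weights = np.random.random_sample((outputs, hidden)) * 2 - 1
        self.input = [0.0] * inputs  # set by the simulation before tick()
        self.output = [0.0] * outputs

    def tick(self):
        x = np.asarray(self.input, dtype=float)  # 1-D, shape (input_num,)
        hidden = 1 / (1 + np.exp(-self.h_weights.dot(x)))
        output = 1 / (1 + np.exp(-self.o_weights.dot(hidden)))
        self.output = output.tolist()  # plain floats for move()


brain = SigmoidBrain(5, 200, 2)
brain.input = [0.1, 0.2, 0.3, 0.4, 0.5]
brain.tick()
print(brain.output)
```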

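Here is the program's main loop, which is what I timed (ignore the other functions; they have no effect on brain.tick()). It lives in another class, which is also irrelevant: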
def update(self):
    GUI.update()

    if not self.pause:
        self.tick += 1

        # every 1000 ticks, breed a new generation (globally or per species)
        if self.tick % 1000 == 0:
            if self.globalSelection:
                self.newGeneration(self.creatures)
            else:
                for specie in self.species:
                    if not specie.isPlant:
                        creatureList = [creature for creature in self.creatures
                                        if creature.specie == specie]
                        self.newGeneration(creatureList)

        for creature in self.creatures:
            if not creature.specie.isPlant:
                if self.useHunger and creature.hunger < 1:
                    creature.hunger += 1/240
                creature.setInputs()
                creature.brain.tick()
                creature.move(creature.brain.output[0]*50-25, creature.brain.output[1]*8)
                creature.interactions()

    # schedule the next update on a new timer thread
    threading.Timer(self.tickrate, self.update).start()

For now the brain is set to 5 inputs, 200 hidden and 2 outputs, just to test speed.
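For scale, the arithmetic for that network size: each tick does one multiply-add per weight, and both profiles below record 200 brain.tick() calls in a single update(). This only restates numbers already in the question.

```python
inputs, hidden, outputs = 5, 200, 2
ticks_per_update = 200  # ncalls for tick in both cProfile reports

weights_per_brain = inputs * hidden + hidden * outputs
print(weights_per_brain)                     # 1400 multiply-adds per tick
print(weights_per_brain * ticks_per_update)  # 280000 multiply-adds per update()
```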
Here are the cProfile results.
Original:
         3312 function calls in 0.094 seconds



Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.094    0.094 <string>:1(<module>)
      200    0.006    0.000    0.007    0.000 __init__.py:263(move)
      200    0.024    0.000    0.024    0.000 __init__.py:286(setInputs)
      200    0.001    0.000    0.001    0.000 __init__.py:360(interactions)
      200    0.033    0.000    0.033    0.000 __init__.py:415(tick)
        1    0.005    0.005    0.094    0.094 __init__.py:46(update)
        1    0.007    0.007    0.020    0.020 __init__.py:471(update)
        1    0.000    0.000    0.000    0.000 _weakrefset.py:79(add)
        3    0.000    0.000    0.000    0.000 threading.py:127(__init__)
        1    0.000    0.000    0.000    0.000 threading.py:160(_release_save)
        1    0.000    0.000    0.000    0.000 threading.py:163(_acquire_restore)
        1    0.000    0.000    0.000    0.000 threading.py:166(_is_owned)
        1    0.000    0.000    0.000    0.000 threading.py:175(wait)
        2    0.000    0.000    0.000    0.000 threading.py:297(__init__)
        1    0.000    0.000    0.000    0.000 threading.py:305(is_set)
        1    0.000    0.000    0.000    0.000 threading.py:325(wait)
        1    0.000    0.000    0.000    0.000 threading.py:507(_newname)
        1    0.000    0.000    0.000    0.000 threading.py:534(__init__)
        1    0.000    0.000    0.000    0.000 threading.py:577(start)
        1    0.000    0.000    0.000    0.000 threading.py:775(daemon)
        1    0.000    0.000    0.000    0.000 threading.py:810(__init__)
        1    0.000    0.000    0.000    0.000 threading.py:887(current_thread)
      225    0.002    0.000    0.002    0.000 {built-in method aacircle}
      200    0.001    0.000    0.001    0.000 {built-in method aaline}
      200    0.000    0.000    0.000    0.000 {built-in method abs}
        4    0.000    0.000    0.000    0.000 {built-in method allocate_lock}
      200    0.001    0.000    0.001    0.000 {built-in method atan2}
      400    0.001    0.000    0.001    0.000 {built-in method cos}
        1    0.000    0.000    0.094    0.094 {built-in method exec}
      225    0.001    0.000    0.001    0.000 {built-in method filled_circle}
        4    0.000    0.000    0.000    0.000 {built-in method filled_polygon}
        1    0.000    0.000    0.000    0.000 {built-in method get_ident}
        1    0.000    0.000    0.000    0.000 {built-in method get_pressed}
        1    0.000    0.000    0.000    0.000 {built-in method get}
        2    0.000    0.000    0.000    0.000 {built-in method len}
      200    0.000    0.000    0.000    0.000 {built-in method radians}
        1    0.000    0.000    0.000    0.000 {built-in method round}
      400    0.001    0.000    0.001    0.000 {built-in method sin}
        1    0.000    0.000    0.000    0.000 {built-in method start_new_thread}
        1    0.006    0.006    0.006    0.006 {built-in method update}
        5    0.000    0.000    0.000    0.000 {method 'acquire' of '_thread.lock' objects}
        1    0.000    0.000    0.000    0.000 {method 'add' of 'set' objects}
        1    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        2    0.000    0.000    0.000    0.000 {method 'blit' of 'pygame.Surface' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.001    0.001    0.001    0.001 {method 'fill' of 'pygame.Surface' objects}
        8    0.000    0.000    0.000    0.000 {method 'random_sample' of 'mtrand.RandomState' objects}
        2    0.000    0.000    0.000    0.000 {method 'release' of '_thread.lock' objects}
        3    0.000    0.000    0.000    0.000 {method 'render' of 'pygame.font.Font' objects}

New:
                3322 function calls in 0.068 seconds



 Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.068    0.068 <string>:1(<module>)
      200    0.005    0.000    0.006    0.000 __init__.py:265(move)
      200    0.022    0.000    0.023    0.000 __init__.py:288(setInputs)
      200    0.001    0.000    0.001    0.000 __init__.py:362(interactions)
      200    0.005    0.000    0.014    0.000 __init__.py:417(tick)
        1    0.005    0.005    0.068    0.068 __init__.py:47(update)
        1    0.005    0.005    0.019    0.019 __init__.py:473(update)
        1    0.000    0.000    0.000    0.000 _weakrefset.py:79(add)
        3    0.000    0.000    0.000    0.000 threading.py:127(__init__)
        1    0.000    0.000    0.000    0.000 threading.py:160(_release_save)
        1    0.000    0.000    0.000    0.000 threading.py:163(_acquire_restore)
        1    0.000    0.000    0.000    0.000 threading.py:166(_is_owned)
        1    0.000    0.000    0.000    0.000 threading.py:175(wait)
        2    0.000    0.000    0.000    0.000 threading.py:297(__init__)
        1    0.000    0.000    0.000    0.000 threading.py:305(is_set)
        1    0.000    0.000    0.000    0.000 threading.py:325(wait)
        1    0.000    0.000    0.000    0.000 threading.py:507(_newname)
        1    0.000    0.000    0.000    0.000 threading.py:534(__init__)
        1    0.000    0.000    0.000    0.000 threading.py:577(start)
        1    0.000    0.000    0.000    0.000 threading.py:775(daemon)
        1    0.000    0.000    0.000    0.000 threading.py:810(__init__)
        1    0.000    0.000    0.000    0.000 threading.py:887(current_thread)
      225    0.002    0.000    0.002    0.000 {built-in method aacircle}
      200    0.001    0.000    0.001    0.000 {built-in method aaline}
      200    0.001    0.000    0.001    0.000 {built-in method abs}
        4    0.000    0.000    0.000    0.000 {built-in method allocate_lock}
      200    0.000    0.000    0.000    0.000 {built-in method atan2}
      400    0.001    0.000    0.001    0.000 {built-in method cos}
      400    0.008    0.000    0.008    0.000 {built-in method dot}
        1    0.000    0.000    0.068    0.068 {built-in method exec}
      225    0.001    0.000    0.001    0.000 {built-in method filled_circle}
        4    0.000    0.000    0.000    0.000 {built-in method filled_polygon}
        1    0.000    0.000    0.000    0.000 {built-in method get_ident}
        1    0.000    0.000    0.000    0.000 {built-in method get_pressed}
        1    0.000    0.000    0.000    0.000 {built-in method get}
        2    0.000    0.000    0.000    0.000 {built-in method len}
      200    0.000    0.000    0.000    0.000 {built-in method radians}
        1    0.000    0.000    0.000    0.000 {built-in method round}
      400    0.001    0.000    0.001    0.000 {built-in method sin}
        1    0.000    0.000    0.000    0.000 {built-in method start_new_thread}
        1    0.005    0.005    0.005    0.005 {built-in method update}
        5    0.000    0.000    0.000    0.000 {method 'acquire' of '_thread.lock' objects}
        1    0.000    0.000    0.000    0.000 {method 'add' of 'set' objects}
        1    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        2    0.000    0.000    0.000    0.000 {method 'blit' of 'pygame.Surface' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.002    0.002    0.002    0.002 {method 'fill' of 'pygame.Surface' objects}
       18    0.000    0.000    0.000    0.000 {method 'random_sample' of 'mtrand.RandomState' objects}
        2    0.000    0.000    0.000    0.000 {method 'release' of '_thread.lock' objects}
        3    0.000    0.000    0.000    0.000 {method 'render' of 'pygame.font.Font' objects}

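As you can see, both the whole update() function (the core of the simulation) and brain.tick() appear to be much faster. So why is the program slower when it actually runs?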
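Per call, the figures in the question work out as follows. The numbers are copied from the two reports and the 100-loop timings; this restates them and does not by itself explain the slowdown.

```python
# Figures copied from the question; each cProfile report covers a single update() call.
profiled = {"original": 0.094, "numpy": 0.068}       # cumtime of update(), seconds
tick_cum = {"original": 0.033, "numpy": 0.014}       # cumtime of 200 brain.tick() calls
wall = {"original": 9.0 / 100, "numpy": 12.0 / 100}  # seconds per loop over 100 loops

for name in ("original", "numpy"):
    print(name,
          "tick/call = %.0f us" % (tick_cum[name] / 200 * 1e6),
          "profiled update = %.3f s" % profiled[name],
          "wall per loop = %.3f s" % wall[name])
```

In the numpy run the wall-clock time per loop (0.120 s) is well above the single profiled update() (0.068 s), whereas in the original run the two are close (0.090 s vs 0.094 s). Since each report covers only one call, a profile over many loops would be a fairer comparison.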
Cheers.

Best answer

In the new implementation, you create 5 numpy arrays for every Brain object:

self.h_activation = zeros((self.hidden_num, 1), dtype=float)
self.o_activation = zeros((self.output_num, 1), dtype=float)

self.i_output = zeros((self.input_num, 1), dtype=float)
self.h_output = zeros((self.hidden_num, 1), dtype=float)
self.o_output = zeros((self.output_num, 1), dtype=float)

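These attributes are not referenced anywhere else in the code. Creating them is a potentially expensive operation that has no obvious direct counterpart in the original implementation. I'm not sure whether it outweighs the speed advantage of the faster numpy computation, but if you are creating a lot of Brain objects, it's worth a look.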
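One way to check whether this matters is to time the construction cost directly and compare it with the cost of the weight matrices Brain actually needs. A sketch, assuming the sizes from the question; the helper names are hypothetical, and absolute numbers depend on the machine and numpy version.

```python
import timeit

import numpy as np

INPUTS, HIDDEN, OUTPUTS = 5, 200, 2  # sizes used in the question


def make_unused_arrays():
    """The five zero-filled arrays the numpy Brain.__init__ creates but never reads."""
    return (
        np.zeros((HIDDEN, 1), dtype=float),
        np.zeros((OUTPUTS, 1), dtype=float),
        np.zeros((INPUTS, 1), dtype=float),
        np.zeros((HIDDEN, 1), dtype=float),
        np.zeros((OUTPUTS, 1), dtype=float),
    )


def make_weights():
    """The two weight matrices the numpy Brain.__init__ does need."""
    return np.random.random((HIDDEN, INPUTS)), np.random.random((OUTPUTS, HIDDEN))


n = 10000
extra = timeit.timeit(make_unused_arrays, number=n) / n
needed = timeit.timeit(make_weights, number=n) / n
print("unused arrays per Brain: %.2f us" % (extra * 1e6))
print("weight matrices per Brain: %.2f us" % (needed * 1e6))
```

Multiplying the per-Brain figure by the number of Brain objects created during a run gives an upper bound on how much of the slowdown these arrays could account for.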

On python - "It should be faster, cProfile says it's faster, but the program actually runs slower": a similar question can be found on Stack Overflow at https://stackoverflow.com/questions/17778241/

10-14 10:58