问题描述
我有一个要在python中实现的算法.该算法可能已执行1.000.000次,因此我想尽可能地对其进行优化.该算法的基础是三个列表(energy
,point
和valList
)以及两个计数器p
和e
.
两个列表energy
和point
包含0到1之间的数字,这些数字是我进行决策的依据. p
是点计数器,而e
是能量计数器.我可以用点数换取Enery,每种能量的成本在valList
中定义(取决于时间).反之,我也可以交易.但是我必须一次全部交易.
算法概述:
- 获取一个布尔列表,其中
energy
中的元素高于某个阈值,而point
中的元素低于另一个阈值.这是用能量换取积分的决定.获取相应的点列表,从而决定能源的交易点 - 在每个布尔列表中.删除所有接在另一个真实值之后的真实值(如果我将所有点都换成能源,则不允许我再重复一点)
- 对于两个布尔列表中的每个项对(
pB
,点布尔值和eB
,能量布尔值):如果pB为true并且我有点,我想将我所有的点都换成enery.如果eB
是真实的并且我有能量,我想将我所有的能量都兑换成点.
这是我提出的实现:
start = time.time()
import numpy as np
np.random.seed(2) #Seed for deterministic result, just for debugging
topLimit = 0.55
bottomLimit = 0.45
#Generate three random arrays, will not be random in the real world
res = np.random.rand(500,3) #Will probably not be much longer than 500
energy = res[:,0]
point = res[:,1]
valList = res[:,2]
#Step 1:
#Generate two bools that (for ex. energy) is true when energy is above a threashold
#and point below another threshold). The opposite applies to point
energyListBool = ((energy > topLimit) & (point < bottomLimit))
pointListBool = ((point > topLimit) & (energy < bottomLimit))
#Step 2:
#Remove all 'true' that comes after another true since this is not valid
energyListBool[1:] &= energyListBool[1:] ^ energyListBool[:-1]
pointListBool[1:] &= pointListBool[1:] ^ pointListBool[:-1]
p = 100
e = 0
#Step 3:
#Loop through the lists, if point is true, I loose all p but gain p/valList[i] for e
#If energy is true I loose all e but gain valList[i]*e for p
for i in range(len(energyListBool)):
if pointListBool[i] and e == 0:
e = p/valList[i] #Trade all points to energy
p = 0
elif energyListBool[i] and p == 0:
p = valList[i]*e #Trade all enery to points
e = 0
print('p = {0} (correct for seed 2: 3.1108006690739174)'.format(p))
print('e = {0} (correct for seed 2: 0)'.format(e))
end = time.time()
print(end - start)
我正在努力的是如何(如果可以做到)对for循环进行矢量化处理,因此我可以使用它代替for循环,因为在我看来这可能会更快.
在当前问题设置之内,这是不可能的,因为矢量化本质上要求您的第n
个计算步骤不应依赖于先前的n-1
步骤.但是,有时候,有可能找到重复的f(n) = F(f(n-1), f(n-2), ... f(n-k))
的所谓闭合形式",即找到不依赖n
的f(n)
的显式表达式,但这是一个单独的研究问题. /p>
此外,从算法的角度来看,这样的矢量化不会有太大的作用,因为算法的复杂度仍然是C*n = O(n)
.但是,由于复杂性常数" C
实际上很重要,因此有不同的方法来降低它.例如,用C/C ++重写关键循环不是什么大问题.
I have an algorithm which I am implementing in python. The algorithm might be executed 1.000.000 times so I want to optimize it as much as possible. The base in the algorithm is three lists (energy
, point
and valList
) and two counters p
and e
.
The two lists energy
and point
contains number between 0 and 1 on which I base decisions. p
is a point-counter and e
is an energy-counter. I can trade points for enery, and the cost of each energy is defined in valList
(it is time dependent). I can also trade the otherway. But I have to trade all at once.
The outline of the algorithm:
- Get a boolean-list where the elements in
energy
is above a threshold and the elements inpoint
is below another threshold. This is a decision to trade energy for points. Get a corresponding list for point, which gives decision to trade points for energy - In each of the boolean-lists. Remove all true-values that comes after another true value (if i have trade all points for energy, i am not allowed to do that again point after)
- For each item-pair (
pB
, point bool andeB
, energy bool) from the two boolean lists: If pB is true and i have points, i want to trade all my points for enery. IfeB
is true and i have energy, i want to trade all my energy to points.
This is the implementation i have come up with:
start = time.time()
import numpy as np
np.random.seed(2) #Seed for deterministic result, just for debugging
topLimit = 0.55
bottomLimit = 0.45
#Generate three random arrays, will not be random in the real world
res = np.random.rand(500,3) #Will probably not be much longer than 500
energy = res[:,0]
point = res[:,1]
valList = res[:,2]
#Step 1:
#Generate two bools that (for ex. energy) is true when energy is above a threashold
#and point below another threshold). The opposite applies to point
energyListBool = ((energy > topLimit) & (point < bottomLimit))
pointListBool = ((point > topLimit) & (energy < bottomLimit))
#Step 2:
#Remove all 'true' that comes after another true since this is not valid
energyListBool[1:] &= energyListBool[1:] ^ energyListBool[:-1]
pointListBool[1:] &= pointListBool[1:] ^ pointListBool[:-1]
p = 100
e = 0
#Step 3:
#Loop through the lists, if point is true, I loose all p but gain p/valList[i] for e
#If energy is true I loose all e but gain valList[i]*e for p
for i in range(len(energyListBool)):
if pointListBool[i] and e == 0:
e = p/valList[i] #Trade all points to energy
p = 0
elif energyListBool[i] and p == 0:
p = valList[i]*e #Trade all enery to points
e = 0
print('p = {0} (correct for seed 2: 3.1108006690739174)'.format(p))
print('e = {0} (correct for seed 2: 0)'.format(e))
end = time.time()
print(end - start)
What I am struggeling with is how (if it can be done) to vectorize the for-loop, so i can use that instead of the for-loop which in my mind probably would be faster.
Within the current problem setting that's not possible since vectorization essentially requires that your n
-th computation step shouldn't depend on previous n-1
steps. Sometimes, however, it's possible to find so-called "closed form" of a recurrence f(n) = F(f(n-1), f(n-2), ... f(n-k))
, i.e. to find an explicit expression for f(n)
that doesn't depend on n
, but it's a separate research problem.
Moreover, from algorithmic point of view such a vectorization wouldn't give a lot, since complexity of your algorithm would still be C*n = O(n)
. However, since "complexity constant" C
does matter in practice, there are different ways to reduce it. For example, it shouldn't be a big problem to rewrite your critical loop in C/C++.
这篇关于向量化或优化循环,其中每次迭代都取决于前一次迭代的状态的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!