Problem description
Suppose I have a DataFrame (or Series) like this:
Value
0 0.5
1 0.8
2 -0.2
3 None
4 None
5 None
I wish to create a new Result column.
The value of each Result is determined by the previous Value, via an arbitrary function f.
If the previous Value is not available (None or NaN), I wish to use the previous Result instead (and apply f to it, of course).
Using the previous Value is easy: I just need to use shift. However, accessing the previous Result doesn't seem to be that simple.
For example, the following code calculates the result, but cannot fall back to the previous Result when needed.
df['Result'] = df['Value'].shift(1).apply(f)
Please assume that f is arbitrary, so solutions built on things like cumsum are not possible.
Obviously, this can be done by iteration, but I want to know whether a more pandas-y solution exists.
import pandas as pd

df['Result'] = float('nan')
for i in range(1, len(df)):
    value = df['Value'].iloc[i - 1]
    if pd.isna(value):  # previous Value is None/NaN: fall back to previous Result
        value = df['Result'].iloc[i - 1]
    df.loc[df.index[i], 'Result'] = f(value)
Example output, given f = lambda x: x+1:
Bad:
Value Result
0 0.5 NaN
1 0.8 1.5
2 -0.2 1.8
3 NaN 0.8
4 NaN NaN
5 NaN NaN
Good:
Value Result
0 0.5 NaN
1 0.8 1.5
2 -0.2 1.8
3 NaN 0.8
4 NaN 1.8 <-- previous Value not available, used f(previous result)
5 NaN 2.8 <-- same
Recommended answer
Looks like it has to be a loop to me. And I abhor loops... so when I loop, I use numba.
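Numba gives you the power to speed up your applications with high-performance functions written directly in Python. With a few annotations, array-oriented and math-heavy Python code can be just-in-time compiled to native machine instructions, similar in performance to C, C++ and Fortran, without having to switch languages or Python interpreters.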
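import numpy as np
from numba import njit

@njit
def f(x):
    return x + 1

@njit
def g(a):
    # Row 0 has no predecessor, so its result is NaN.
    r = [np.nan]
    for v in a[:-1]:
        if np.isnan(v):
            # Previous Value missing: apply f to the previous Result.
            r.append(f(r[-1]))
        else:
            r.append(f(v))
    return r

df.assign(Result=g(df.Value.values))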
Value Result
0 0.5 NaN
1 0.8 1.5
2 -0.2 1.8
3 NaN 0.8
4 NaN 1.8
5 NaN 2.8
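If numba is not available, the same recurrence can be sketched with only the standard library. This is an illustrative alternative and not part of the answer above: the helper name previous_or_result is hypothetical, and the sketch assumes the Value column has first been converted to a plain list (for example with df['Value'].tolist()). It uses itertools.accumulate with the initial keyword, which requires Python 3.8 or later.

```python
import math
from itertools import accumulate


def f(x):
    # Example function from the question; any one-argument function works.
    return x + 1


def previous_or_result(values, func):
    """Hypothetical helper: item i is func(values[i-1]), or func(results[i-1])
    when values[i-1] is missing (None or NaN). The first item is NaN."""
    if not values:
        return []

    def is_missing(v):
        return v is None or math.isnan(v)

    def step(prev_result, prev_value):
        # Fall back to the previous result when the previous value is missing.
        return func(prev_result if is_missing(prev_value) else prev_value)

    # initial=NaN seeds the first row, which has no predecessor.
    return list(accumulate(values[:-1], step, initial=math.nan))


values = [0.5, 0.8, -0.2, None, None, None]
result = previous_or_result(values, f)
```

The returned list has one entry per input row, so it can be attached in the same way as in the answer, e.g. df.assign(Result=result). It is still a Python-level loop under the hood, so it will likely be slower than the numba version on large inputs.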