Problem description
Suppose I have a DataFrame (or Series) like this:
Value
0 0.5
1 0.8
2 -0.2
3 None
4 None
5 None
I want to create a new Result column.
The value of each Result is determined by the previous Value, via an arbitrary function f.
If the previous Value is not available (None or NaN), I want to use the previous Result instead (and apply f to it, of course).
Using the previous Value is easy: I just need to use shift. However, accessing the previous Result doesn't seem to be that simple.
For example, the following code calculates the result, but it cannot fall back to the previous Result when the previous Value is missing.
df['Result'] = df['Value'].shift(1).apply(f)
Please assume that f is arbitrary, so solutions using things like cumsum are not possible.
Obviously, this can be done by iteration, but I want to know if a more Panda-y solution exists.
import pandas as pd

df['Result'] = float('nan')  # row 0 has no previous Value, so it stays NaN
for i in range(1, len(df)):
    value = df['Value'].iloc[i - 1]
    if pd.isna(value):  # True for both None and NaN
        value = df['Result'].iloc[i - 1]
    df.loc[df.index[i], 'Result'] = f(value)
Example output, given f = lambda x: x+1:
Bad:
Value Result
0 0.5 NaN
1 0.8 1.5
2 -0.2 1.8
3 NaN 0.8
4 NaN NaN
5 NaN NaN
Good:
Value Result
0 0.5 NaN
1 0.8 1.5
2 -0.2 1.8
3 NaN 0.8
4 NaN 1.8 <-- previous Value not available, used f(previous result)
5 NaN 2.8 <-- same
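To pin down the rule behind the "Good" table, here is a minimal pure-Python sketch of the recurrence. The function name apply_with_fallback is hypothetical (it is not part of pandas), and the sketch works on a plain list rather than a DataFrame column.

```python
import math


def apply_with_fallback(values, f):
    """Result[i] = f(Value[i-1]) if that Value is available, else f(Result[i-1])."""
    if not values:
        return []
    # The first row has no previous Value, so its Result is NaN.
    results = [math.nan]
    for prev in values[:-1]:
        if prev is None or math.isnan(prev):
            # Previous Value is missing: fall back to the previous Result.
            results.append(f(results[-1]))
        else:
            results.append(f(prev))
    return results


values = [0.5, 0.8, -0.2, None, None, None]
results = apply_with_fallback(values, lambda x: x + 1)
print(results)
```

The returned list has one entry per input row, so it could be assigned to a new column directly.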
Recommended answer
Looks like it has to be a loop to me. And I abhor loops... so when I loop, I use numba.
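import numpy as np
from numba import njit

@njit
def f(x):
    return x + 1

@njit
def g(a):
    # The first row has no previous Value, so its Result is NaN.
    r = [np.nan]
    for v in a[:-1]:
        if np.isnan(v):
            # Previous Value is missing: apply f to the previous Result.
            r.append(f(r[-1]))
        else:
            r.append(f(v))
    return r

df.assign(Result=g(df.Value.values))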
Value Result
0 0.5 NaN
1 0.8 1.5
2 -0.2 1.8
3 NaN 0.8
4 NaN 1.8
5 NaN 2.8
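As a side note to the loop above, the sequential dependency can be removed by observing that each Result is f applied k+1 times to the last available previous Value, where k is the number of consecutive missing previous Values. The sketch below is an assumption-laden illustration, not the answer's method: apply_n is a hypothetical helper, and it still calls f in Python for every row.

```python
import numpy as np
import pandas as pd


def f(x):
    return x + 1


def apply_n(x, n):
    """Apply f to x a total of n + 1 times (hypothetical helper)."""
    for _ in range(int(n) + 1):
        x = f(x)
    return x


df = pd.DataFrame({"Value": [0.5, 0.8, -0.2, None, None, None]})

prev = df["Value"].shift(1)   # previous Value, NaN where unavailable
missing = prev.isna()
run_id = (~missing).cumsum()  # increments at every available previous Value
# Count of consecutive missing previous Values up to each row (0 where available).
extra = missing.astype(int).groupby(run_id).cumsum()
base = prev.ffill()           # last available previous Value

df["Result"] = [apply_n(x, n) for x, n in zip(base, extra)]
print(df)
```

If f has a closed form for repeated application (for x + 1, applying it n times is x + n), the list comprehension can be replaced by a single vectorized expression; for a truly arbitrary f, a loop of some kind remains.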