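I'm having trouble with a program that should first look up a date in another DataFrame and then interpolate a value along the matching row.
Problem:
Suppose the original DataFrames look like this: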
A = pd.DataFrame({"date":["06/24/2014","06/25/2014","06/26/2014"], "value":[2, 4, 6]})
B = pd.DataFrame({"date":["06/25/2014","06/26/2014","06/24/2014"], "1":[0.1, 0.5, 0.9],"3":[0.2, 0.6, 1.0],"5":[0.3, 0.7, 1.1],"7":[0.4, 0.8, 1.2]})
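The idea is that the program first finds the row in B whose "date" matches, then interpolates using the column names as the x values and the values in that row as the y values.
The output should look like this: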
A = pd.DataFrame({"date":["06/24/2014","06/25/2014","06/26/2014"], "value":[2, 4, 6], "interp":[0.95,0.25, 0.75]})
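For a single date, the lookup-then-interpolate step described above boils down to the following. This is a minimal sketch that is not part of the original question: it swaps in numpy.interp (plain piecewise-linear interpolation) instead of scipy's interp1d, and it assumes each date appears exactly once in B.

```python
import numpy as np
import pandas as pd

B = pd.DataFrame({"date": ["06/25/2014", "06/26/2014", "06/24/2014"],
                  "1": [0.1, 0.5, 0.9], "3": [0.2, 0.6, 1.0],
                  "5": [0.3, 0.7, 1.1], "7": [0.4, 0.8, 1.2]})

# Look up the single row of B whose date matches (assumes dates are unique in B)
row = B.loc[B["date"] == "06/24/2014"].iloc[0]

# x values: the column names other than "date", converted to numbers
x = row.index.drop("date").astype(float)
# y values: the entries of that row, in the same column order
y = row.drop("date").astype(float)

# Linear interpolation at value = 2, between x=1 (0.9) and x=3 (1.0)
result = float(np.interp(2, x, y))
print(round(result, 10))  # 0.95
```

This reproduces the first entry of the expected "interp" column; the other rows differ only in the date and value used.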
My approach so far:
import pandas as pd
from scipy.interpolate import interp1d
A = pd.DataFrame({"date":["06/24/2014","06/25/2014","06/26/2014"], "value":[2, 4, 6]})
B = pd.DataFrame({"date":["06/25/2014","06/26/2014","06/24/2014"], "1":[0.1, 0.5, 0.9],"3":[0.2, 0.6, 1.0],"5":[0.3, 0.7, 1.1],"7":[0.4, 0.8, 1.2]})
# Define x as the names of the columns
x_value = (1, 3, 5, 7)
# Define the interpolation function as follows
def interp(row):
    # Index label of the matching row in B (equal to its position, since B has a default RangeIndex)
    idx = B[B['date'] == row['date']].index.tolist()[0]
    z_value = []  # values from that row of B, columns "1".."7"
    for i in range(1, 5):
        z_value.append(float(B.iloc[idx, i]))
    f_linear = interp1d(x_value, z_value)  # define interpolation function
    y_il = f_linear(row['value'])
    return y_il
Finally, I apply the function to each row like this:
A['interp']=A.apply(interp, axis=1)
I get the following output. Is there a better way to do this?
>>> A
date interp value
0 06/24/2014 0.95 2
1 06/25/2014 0.25 4
2 06/26/2014 0.75 6
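One possible alternative, not taken from this thread: join A to B on "date" with DataFrame.merge, then call numpy.interp once per row, so no interp1d objects are created at all. This sketch assumes every date in A occurs exactly once in B (a duplicated date in B would duplicate rows in the merge) and that all non-date columns of B are numeric grid points.

```python
import numpy as np
import pandas as pd

A = pd.DataFrame({"date": ["06/24/2014", "06/25/2014", "06/26/2014"], "value": [2, 4, 6]})
B = pd.DataFrame({"date": ["06/25/2014", "06/26/2014", "06/24/2014"],
                  "1": [0.1, 0.5, 0.9], "3": [0.2, 0.6, 1.0],
                  "5": [0.3, 0.7, 1.1], "7": [0.4, 0.8, 1.2]})

# Numeric x grid taken from B's column names (every column except "date")
grid_cols = [c for c in B.columns if c != "date"]
x = np.array([float(c) for c in grid_cols])

# Align each row of A with its matching row of B (a left merge keeps A's row order)
merged = A.merge(B, on="date", how="left")

# One np.interp call per row, using that row's B values as the y grid
A["interp"] = [
    np.interp(v, x, ys)
    for v, ys in zip(merged["value"].to_numpy(), merged[grid_cols].to_numpy(dtype=float))
]
print(A)
```

The merge replaces the per-row boolean lookup into B, which is the part of the original apply version that scales worst as B grows.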
Best answer
If you really only want the selected values, this will give them to you. Note that I use groupby so that scipy.interpolate.interp1d only has to be called once per date.
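To see what grouping buys you, here is a small sketch with a hypothetical A in which one date repeats: one interpolator is built per distinct date, and because interp1d accepts an array, all targets for that date are evaluated in a single call. For brevity, B is written directly in the transposed layout the answer prepares below (grid points as the index, one column per date).

```python
import pandas as pd
from scipy.interpolate import interp1d

# Hypothetical A with a repeated date, to show that work is shared per date
A = pd.DataFrame({"date": ["06/24/2014", "06/24/2014", "06/25/2014"],
                  "value": [2, 6, 4]})
# B already transposed: index = numeric x grid, one column per date
B = pd.DataFrame({"06/24/2014": [0.9, 1.0, 1.1, 1.2],
                  "06/25/2014": [0.1, 0.2, 0.3, 0.4]},
                 index=[1, 3, 5, 7])

built = 0
results = {}
for date, group in A.groupby("date"):
    f = interp1d(B.index, B[date])  # one interpolator per distinct date
    built += 1
    # Evaluate every target for this date at once, keyed by A's row index
    results.update(zip(group.index, f(group["value"].to_numpy())))

A["interp"] = pd.Series(results)  # aligns on A's index
print(built)  # 2
```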
Data preparation:
A = pd.DataFrame({"date":["06/24/2014","06/25/2014","06/26/2014"], "value":[2, 4, 6]})
B = pd.DataFrame({"date":["06/25/2014","06/26/2014","06/24/2014"],
"1":[0.1, 0.5, 0.9],"3":[0.2, 0.6, 1.0],"5":[0.3, 0.7, 1.1],"7":[0.4, 0.8, 1.2]})
B = B.set_index('date').T
B.index = B.index.astype(int)
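A quick check of what these two preparation lines produce: after set_index('date').T, the grid points 1, 3, 5, 7 become the row index and each date becomes a column, so B[date] is exactly the series of (x, y) pairs to interpolate over.

```python
import pandas as pd

B = pd.DataFrame({"date": ["06/25/2014", "06/26/2014", "06/24/2014"],
                  "1": [0.1, 0.5, 0.9], "3": [0.2, 0.6, 1.0],
                  "5": [0.3, 0.7, 1.1], "7": [0.4, 0.8, 1.2]})
B = B.set_index("date").T       # rows: "1", "3", "5", "7"; columns: the dates
B.index = B.index.astype(int)   # turn the column-name strings into numeric x values

print(B.index.tolist())          # [1, 3, 5, 7]
print(B["06/24/2014"].tolist())  # [0.9, 1.0, 1.1, 1.2]
```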
Then the actual work:
from scipy.interpolate import interp1d
import pandas as pd
def interped(series, targets):
    # series: grid value -> y value for one date; build one interpolator for it
    x, y = zip(*series.items())
    f = interp1d(x, y)
    return [(i, float(f(i))) for i in targets]
def getResults(dfA, dfB):
    grouped = dfA.groupby('date')
    res = []
    for key in grouped.groups:
        # all target values in A that share this date
        targets = grouped.get_group(key)['value'].values
        values = interped(dfB[key], targets)
        res.extend([(key, target, value) for target, value in values])
    return pd.DataFrame(res, columns=["date", "target", "interp"])
getResults(A, B)
Output:
date target interp
0 06/24/2014 2 0.95
1 06/26/2014 6 0.75
2 06/25/2014 4 0.25
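One caveat that applies to every approach in this thread: interp1d only covers the grid range, here 1 to 7. If some value in A falls outside it (say a hypothetical value of 8), the default settings raise ValueError. The sketch below shows the two interp1d options I know of for handling that; which one is appropriate depends on your data.

```python
from scipy.interpolate import interp1d

# Grid and one row of B (the 06/24/2014 row)
x = [1, 3, 5, 7]
y = [0.9, 1.0, 1.1, 1.2]

# Default settings raise ValueError for 8, which is above the largest grid point 7
strict = interp1d(x, y)
try:
    strict(8)
    raised = False
except ValueError:
    raised = True

# Option 1: return NaN for out-of-range values instead of raising
lenient = interp1d(x, y, bounds_error=False, fill_value=float("nan"))
nan_val = float(lenient(8))

# Option 2: extend the outermost linear segment (x = 5..7) past the grid
extrap = interp1d(x, y, fill_value="extrapolate")
ext_val = float(extrap(8))

print(raised, nan_val, ext_val)
```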
And if you insist on calling A.apply...
import pandas as pd
from scipy.interpolate import interp1d
A = pd.DataFrame({"date":["06/24/2014","06/25/2014","06/26/2014"], "value":[2, 4, 6]})
B = pd.DataFrame({"date":["06/25/2014","06/26/2014","06/24/2014"],
"1":[0.1, 0.5, 0.9],"3":[0.2, 0.6, 1.0],"5":[0.3, 0.7, 1.1],"7":[0.4, 0.8, 1.2]})
B = B.set_index('date').T
B.index = B.index.astype(int)
def getRowApplyFunc():
    funcs = {}  # cache: date -> interpolation function, shared across rows
    def interped(row):
        date = row['date']
        target = row['value']
        if date in funcs:
            interpFunc = funcs[date]
        else:
            # first time this date is seen: build and store its interpolator
            x, y = zip(*B[date].items())
            interpFunc = interp1d(x, y)
            funcs[date] = interpFunc
        return float(interpFunc(target))
    return interped
A['interpd'] = A.apply(getRowApplyFunc(), axis=1)
A
Which also outputs:
date value interpd
0 06/24/2014 2 0.95
1 06/25/2014 4 0.25
2 06/26/2014 6 0.75
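The dictionary cache inside getRowApplyFunc can also be expressed with functools.lru_cache from the standard library. This is a swapped-in variant, not the answer's code; interp_for is a hypothetical helper name. It assumes B is the transposed frame from the data-preparation step and is not modified afterwards, because the cache keeps interpolators built from whatever B held at the first call for each date.

```python
from functools import lru_cache

import pandas as pd
from scipy.interpolate import interp1d

A = pd.DataFrame({"date": ["06/24/2014", "06/25/2014", "06/26/2014"], "value": [2, 4, 6]})
B = pd.DataFrame({"date": ["06/25/2014", "06/26/2014", "06/24/2014"],
                  "1": [0.1, 0.5, 0.9], "3": [0.2, 0.6, 1.0],
                  "5": [0.3, 0.7, 1.1], "7": [0.4, 0.8, 1.2]})
B = B.set_index("date").T
B.index = B.index.astype(int)

@lru_cache(maxsize=None)
def interp_for(date):
    # Built once per distinct date; call interp_for.cache_clear() if B changes later
    return interp1d(B.index, B[date])

A["interpd"] = A.apply(lambda row: float(interp_for(row["date"])(row["value"])), axis=1)
print(A)
print(interp_for.cache_info().misses)  # 3: one interpolator per distinct date
```

Compared with the closure, the cache lives on the function itself, so cache_info() makes it easy to confirm how many interpolators were actually built.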