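I'm having trouble with a program that should first look up a date in another DataFrame and then interpolate a value along the matching row.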

Problem:
Suppose the original DataFrames look like this:

A = pd.DataFrame({"date":["06/24/2014","06/25/2014","06/26/2014"], "value":[2, 4, 6]})

B = pd.DataFrame({"date":["06/25/2014","06/26/2014","06/24/2014"], "1":[0.1, 0.5, 0.9],"3":[0.2, 0.6, 1.0],"5":[0.3, 0.7, 1.1],"7":[0.4, 0.8, 1.2]})


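The idea is that the program should first find the row in B that matches the "date" in A, then interpolate using the column names as x_value and the values in that row as y_value, evaluated at the corresponding "value" from A.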
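To make the idea concrete, here is a minimal sketch of the calculation for a single date. It assumes plain linear interpolation and uses numpy.interp as a stand-in; the numbers are taken from the row of B for "06/24/2014" and the matching value in A.

```python
import numpy as np

# Column names of B, used as the x coordinates.
x = [1, 3, 5, 7]
# Row of B whose date is "06/24/2014", used as the y coordinates.
y = [0.9, 1.0, 1.1, 1.2]

# A's value for that date is 2, which lies halfway between x=1 and x=3,
# so the result lies halfway between 0.9 and 1.0.
result = np.interp(2, x, y)
print(result)
```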

The output should look like this:

A = pd.DataFrame({"date":["06/24/2014","06/25/2014","06/26/2014"], "value":[2, 4, 6], "interp":[0.95,0.25, 0.75]})




My approach so far:

import pandas as pd
from scipy.interpolate import interp1d

A = pd.DataFrame({"date":["06/24/2014","06/25/2014","06/26/2014"], "value":[2, 4, 6]})

B = pd.DataFrame({"date":["06/25/2014","06/26/2014","06/24/2014"], "1":[0.1, 0.5, 0.9],"3":[0.2, 0.6, 1.0],"5":[0.3, 0.7, 1.1],"7":[0.4, 0.8, 1.2]})

# Define x as the names of the columns
x_value = (1,3,5,7)

#Define the interpolation function as follows

def interp(row):
    # Find the index of the row in B whose date matches this row of A
    idx = B[B['date'] == row['date']].index.tolist()[0]
    # Read that row's values by column label, in the same order as x_value
    z_value = [float(B.loc[idx, str(x)]) for x in x_value]
    f_linear = interp1d(x_value, z_value) # define interpolation function
    y_il = f_linear(row['value'])
    return y_il


Finally, I apply the function to each row like this:

A['interp']=A.apply(interp, axis=1)


I get the following output. Is there a better way?

>>> A
         date interp  value
0  06/24/2014   0.95      2
1  06/25/2014   0.25      4
2  06/26/2014   0.75      6

Best answer

If you really only want the selected values, this will give them to you. Note that I use groupby so that scipy.interpolate.interp1d only has to be constructed once per date.

Data preparation:

A = pd.DataFrame({"date":["06/24/2014","06/25/2014","06/26/2014"], "value":[2, 4, 6]})
B = pd.DataFrame({"date":["06/25/2014","06/26/2014","06/24/2014"],
                  "1":[0.1, 0.5, 0.9],"3":[0.2, 0.6, 1.0],"5":[0.3, 0.7, 1.1],"7":[0.4, 0.8, 1.2]})
B = B.set_index('date').T
B.index = B.index.astype(int)


Then the actual work:

from scipy.interpolate import interp1d
import pandas as pd

def interped(series,targets):
    x,y = zip(*series.items())
    f = interp1d(x,y)
    return [(i,f(i)) for i in targets]

def getResults(dfA, dfB):
    grouped = dfA.groupby('date')
    res = []
    for key in grouped.groups:
        targets = grouped.get_group(key)['value'].values
        values = interped(dfB[key], targets)
        res.extend([(key, target, value) for target,value in values])

    return pd.DataFrame(res, columns=["date", "target", "interp"])

getResults(A, B)


Output:

    date    target  interp
0   06/24/2014  2   0.95
1   06/26/2014  6   0.75
2   06/25/2014  4   0.25




And if you insist on calling A.apply:

import pandas as pd
from scipy.interpolate import interp1d

A = pd.DataFrame({"date":["06/24/2014","06/25/2014","06/26/2014"], "value":[2, 4, 6]})
B = pd.DataFrame({"date":["06/25/2014","06/26/2014","06/24/2014"],
                  "1":[0.1, 0.5, 0.9],"3":[0.2, 0.6, 1.0],"5":[0.3, 0.7, 1.1],"7":[0.4, 0.8, 1.2]})
B = B.set_index('date').T
B.index = B.index.astype(int)


def getRowApplyFunc():
    funcs = {}
    def interped(row):
        date = row['date']
        target = row['value']
        if date in funcs:
            interpFunc = funcs[date]
        else:
            x,y = zip(*B[date].items())
            interpFunc = interp1d(x,y)
            funcs[date] = interpFunc
        return interpFunc(target)
    return interped

A['interpd'] = A.apply(getRowApplyFunc(), axis=1)
A


This also outputs:

    date    value   interpd
0   06/24/2014  2   0.95
1   06/25/2014  4   0.25
2   06/26/2014  6   0.75
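
As a further variation that is not part of the answer above, the lookup can be done with a merge instead of a per-row search. This is only a sketch under two assumptions: each date appears once in B, and numpy.interp is an acceptable substitute for interp1d. Note that numpy.interp clamps targets outside the x range to the end values, whereas interp1d raises an error by default. The names curve_cols and merged are made up for illustration.

```python
import numpy as np
import pandas as pd

A = pd.DataFrame({"date": ["06/24/2014", "06/25/2014", "06/26/2014"],
                  "value": [2, 4, 6]})
B = pd.DataFrame({"date": ["06/25/2014", "06/26/2014", "06/24/2014"],
                  "1": [0.1, 0.5, 0.9], "3": [0.2, 0.6, 1.0],
                  "5": [0.3, 0.7, 1.1], "7": [0.4, 0.8, 1.2]})

# Columns of B that hold the curve; their names are the x coordinates.
curve_cols = ["1", "3", "5", "7"]
x = [float(c) for c in curve_cols]

# Align each row of A with its matching row of B.
# A left join keeps the row order of A.
merged = A.merge(B, on="date", how="left")

# Interpolate each row's curve at that row's target value.
A["interp"] = [
    np.interp(target, x, y)
    for target, y in zip(merged["value"], merged[curve_cols].to_numpy())
]
print(A)
```

The merge replaces the repeated boolean search over B, so the per-row work is reduced to a single interpolation call.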
