Problem description
I have a MultiIndex (Name, Date) DataFrame df that I need to process iteratively by Date, in order to assign a value that depends on both the current Date's group and the previous Date's group.
AFAIK the best way to process DataFrame groups is with .apply, e.g. df.groupby('Date').apply(ifunc).
But how can I best do this when ifunc needs to reference the values from the previous Date group after that previous group has itself been processed by ifunc?
Here is an example of such an ifunc, operating on a df with columns ['Dollars', 'Weight', 'Return', 'HaveMax']:
# (This might not be great Python; coding improvements welcome!)
# Function to add "AddDollars" to Names that don't already "HaveMax" "MaxDollars"
def ifunc(group, previous):  # Arguments are df groups by Date
    group['HaveMax'] = previous['HaveMax']
    # Each Name's Dollars changed from the previous Date
    avgWeights = group['Weight'].mean()
    group['Dollars'] = group['Weight'] * previous['Dollars'] * group['Return'] / avgWeights
    # Now add "AddDollars" to Names that were under
    group.loc[group['HaveMax'] == False, 'Dollars'] = group[group['HaveMax'] == False]['Dollars'] + AddDollars
    # Update HaveMax for any Names that reached MaxDollars on this Date
    group.loc[group['HaveMax'] == False, 'HaveMax'] = group[group['HaveMax'] == False]['Dollars'] >= MaxDollars
    return group
Sample data:
import pandas as pd

AddDollars = 1.0
MaxDollars = 10.0
df = pd.DataFrame(data=[('A', '20210101', 9.0, 1.0, 0, False),
                        ('B', '20210101', 5.0, 1.0, 0, False),
                        ('C', '20210101', 5.0, 1.0, 0, True),
                        ('A', '20210102', 0.0, 1.0, 1.0, False),
                        ('B', '20210102', 0.0, 1.0, 1.0, False),
                        ('C', '20210102', 0.0, 1.0, 1.0, False)],
                  columns=('Name', 'Date', 'Dollars', 'Weight', 'Return', 'HaveMax')).set_index(['Name', 'Date'])
Desired output:
               Dollars  Weight  Return  HaveMax
Name Date
A    20210101      9.0     1.0     0.0    False
B    20210101      5.0     1.0     0.0    False
C    20210101      5.0     1.0     0.0     True
A    20210102     10.0     1.0     1.0     True
B    20210102      6.0     1.0     1.0    False
C    20210102      5.0     1.0     1.0     True
Recommended answer
Use groupby and iterate over the groups in Date order, keeping a reference to the previously processed group.
import pandas as pd

AddDollars = 1.0
MaxDollars = 10.0
df = pd.DataFrame(data=[('A', '20210101', 9.0, 1.0, 0, False),
                        ('B', '20210101', 5.0, 1.0, 0, False),
                        ('C', '20210101', 5.0, 1.0, 0, True),
                        ('A', '20210102', 0.0, 1.0, 1.0, False),
                        ('B', '20210102', 0.0, 1.0, 1.0, False),
                        ('C', '20210102', 0.0, 1.0, 1.0, False)],
                  columns=('Name', 'Date', 'Dollars', 'Weight', 'Return', 'HaveMax')).set_index(['Name', 'Date'])

dft = df.groupby(df.index.get_level_values('Date'))
groupings = sorted(dft.groups)  # Date keys in chronological order
previous = dft.get_group(groupings[0])
for gkey in groupings[1:]:
    group = dft.get_group(gkey).copy()  # work on a copy of the group
    # .values drops the index so rows line up by position; this assumes
    # every Date group lists the same Names in the same order
    group['HaveMax'] = previous['HaveMax'].values
    avgWeights = group['Weight'].mean()
    group['Dollars'] = group['Weight'].values * previous['Dollars'].values * group['Return'].values / avgWeights
    # Add AddDollars to Names that had not reached MaxDollars, then update HaveMax
    under = ~group['HaveMax'].astype(bool)
    group.loc[under, 'Dollars'] += AddDollars
    group.loc[under, 'HaveMax'] = group.loc[under, 'Dollars'] >= MaxDollars
    # Assign the calculated values back to the DataFrame:
    df.loc[group.index] = group
    # Prepare for next iteration:
    previous = group
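The loop above can also be factored so that the per-date logic stays in an ifunc(group, previous) function, as in the question. The following is only a sketch under some assumptions: apply_sequential is a hypothetical helper name (not a pandas API), ADD_DOLLARS and MAX_DOLLARS stand in for the question's AddDollars and MaxDollars, and every Date group is assumed to list the same Names in the same order so that positional alignment is valid.

```python
import pandas as pd

ADD_DOLLARS = 1.0   # stands in for AddDollars
MAX_DOLLARS = 10.0  # stands in for MaxDollars


def apply_sequential(df, func, level="Date"):
    """Call func(group, previous) on each group of `level`, in sorted key order.

    `previous` is the already-processed result for the preceding key.
    The first group is passed through unchanged. Hypothetical helper.
    """
    previous = None
    pieces = []
    for _, group in df.groupby(level=level, sort=True):
        group = group.copy()  # never mutate the caller's DataFrame
        if previous is not None:
            group = func(group, previous)
        pieces.append(group)
        previous = group
    # Rows come back ordered by group key, then by original order within a group
    return pd.concat(pieces)


def ifunc(group, previous):
    # to_numpy() drops the index, so values line up by position, not by label
    group["HaveMax"] = previous["HaveMax"].to_numpy()
    avg_weight = group["Weight"].mean()
    group["Dollars"] = (group["Weight"] * previous["Dollars"].to_numpy()
                        * group["Return"] / avg_weight)
    under = ~group["HaveMax"]  # Names that had not reached the maximum
    group.loc[under, "Dollars"] += ADD_DOLLARS
    group.loc[under, "HaveMax"] = group.loc[under, "Dollars"] >= MAX_DOLLARS
    return group


df = pd.DataFrame(
    data=[("A", "20210101", 9.0, 1.0, 0, False),
          ("B", "20210101", 5.0, 1.0, 0, False),
          ("C", "20210101", 5.0, 1.0, 0, True),
          ("A", "20210102", 0.0, 1.0, 1.0, False),
          ("B", "20210102", 0.0, 1.0, 1.0, False),
          ("C", "20210102", 0.0, 1.0, 1.0, False)],
    columns=("Name", "Date", "Dollars", "Weight", "Return", "HaveMax"),
).set_index(["Name", "Date"])

result = apply_sequential(df, ifunc)
print(result)
```

Unlike the loop, this version returns a new DataFrame and leaves df untouched. If the Names can differ between dates, the positional alignment would have to be replaced by an explicit alignment on the Name level.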