I have a DataFrame:
Date_1 Date_2 individual_count
01/09/2019 02/08/2019 2
01/09/2019 03/08/2019 2
01/09/2019 04/08/2019 2
01/09/2019 05/08/2019 2
. . .
01/09/2019 28/08/2019 10
01/09/2019 29/08/2019 11
01/09/2019 30/08/2019 12
01/09/2019 31/08/2019 14
I want to generate four columns: num_days_2, num_days_3, num_days_5, and num_days_20.
I want to aggregate the dataset as follows:
num_days_2 : sum of individual_count for each date_1 where date_2 is in (date_1 - 2, date_1 - 1)
num_days_3 : sum of individual_count for each date_1 where date_2 is in (date_1 - 5, date_1 - 3)
num_days_5 : sum of individual_count for each date_1 where date_2 is in (date_1 - 10, date_1 - 6)
num_days_20 : sum of individual_count for each date_1 over all remaining (earlier) dates
For example, for a specific date, date_1 = 01/09/2019:
num_days_2 = sum of individual counts for date_2 = 30/08/2019 - 31/08/2019
num_days_3 = sum of individual counts for date_2 = 27/08/2019 - 29/08/2019
num_days_5 = sum of individual counts for date_2 = 26/08/2019 - 22/08/2019
num_days_20 = sum of individual counts for date_2 = 21/08/2019 - 02/08/2019
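As a sanity check on these ranges, the bounded windows can be written as day offsets from date_1. This is only a sketch: the `windows` mapping and the `bounds` name are made up for illustration, and dates are assumed to be day-first (DD/MM/YYYY). Anything older than the last window would fall into num_days_20.

```python
import pandas as pd

# date_1 from the example above, parsed day-first (DD/MM/YYYY).
date_1 = pd.to_datetime("01/09/2019", dayfirst=True)

# Hypothetical mapping: column name -> (farthest, nearest) offset in days before date_1.
windows = {
    "num_days_2": (2, 1),
    "num_days_3": (5, 3),
    "num_days_5": (10, 6),
}

# Turn each pair of offsets into a (start, end) pair of calendar dates.
bounds = {
    name: (
        (date_1 - pd.Timedelta(days=far)).strftime("%d/%m/%Y"),
        (date_1 - pd.Timedelta(days=near)).strftime("%d/%m/%Y"),
    )
    for name, (far, near) in windows.items()
}

for name, (start, end) in bounds.items():
    print(f"{name}: {start} - {end}")
```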
Edit
Expected output:
Date_1 num_days_2 num_days_3 num_days_5 num_days_20
01/09/2019
02/09/2019
.
.
.
30/09/2019
Could anyone help me achieve this?
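Best answer
I created an example that you can work from. You may need to rename the columns, and take a look at the cut
function to get the bins ordered correctly.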
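To see how cut assigns day offsets to bins, here is a small sketch. The offsets are made-up values, and the `labels` are simply the column names from the question (passing them is my assumption, not something the data requires). With edges [1, 2, 5, 10, 1000] the intervals are closed on the right, and `include_lowest=True` makes the first one include 1; an offset of 0 falls outside every bin and becomes NaN.

```python
import pandas as pd

# Illustrative offsets in days between Date_1 and Date_2.
offsets = pd.Series([0, 1, 2, 3, 5, 6, 10, 11, 30])

# Hypothetical labels, taken from the column names in the question.
labels = ["num_days_2", "num_days_3", "num_days_5", "num_days_20"]

# Bins: [1, 2], (2, 5], (5, 10], (10, 1000]
binned = pd.cut(offsets, bins=[1, 2, 5, 10, 1000], labels=labels, include_lowest=True)

print(pd.DataFrame({"offset": offsets, "bin": binned}))
```

Because the result is an ordered categorical, the bins keep the order of `labels` when they are later used as group keys.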
import pandas as pd

# Generate example data.
# This is just a way to generate data that can be used to simulate your data.
df = pd.DataFrame(
    data=dict(
        Date_1=pd.Timestamp('today'),  # This is Date_1
        Date_2=pd.date_range(end=pd.Timestamp('today'), periods=25),  # This is Date_2
        individual_count=range(25)  # This is individual_count
    )
)
# Calculate an offset as integer days:
# For each row, calculate the difference in days between Date_1 and Date_2.
df['offset_timedelta'] = (df.Date_1 - df.Date_2)
# To make binning easier, convert the timedelta to integer days.
df['offset'] = df['offset_timedelta'].dt.days.astype('int16')
# Create bins for each offset:
# Each row is assigned to an interval based on the edges [1, 2, 5, 10, 1000].
# 1000 is just an upper bound to get "the rest".
# Rows with an offset of 0 (Date_2 equal to Date_1) fall outside the bins and are dropped by the groupby.
df['bins'] = pd.cut(df['offset'], [1, 2, 5, 10, 1000], include_lowest=True)
# This groups on Date_1 and the bin, so that we can sum for each.
grouped = df.groupby(['Date_1', 'bins'])[['individual_count']].sum()
# The groupby gives an index of ('Date_1', 'bins'). This converts bins to columns instead of an index level.
final = grouped.unstack()
Edit: renamed the columns to make them more like the original question.
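If you want the output columns to be named as in the question, one option is to pass labels to cut and unstack the bins. The following is only a sketch under my own assumptions: the data is synthetic (two Date_1 values, each paired with the 30 preceding days and a count of 1 per row), and the label names are taken from the question.

```python
import pandas as pd

# Synthetic data: two Date_1 values, each with the 30 preceding days as Date_2.
rows = []
for date_1 in pd.to_datetime(["2019-09-01", "2019-09-02"]):
    for offset in range(1, 31):
        rows.append({
            "Date_1": date_1,
            "Date_2": date_1 - pd.Timedelta(days=offset),
            "individual_count": 1,  # one per row, so each sum equals the window length
        })
df = pd.DataFrame(rows)

# Difference between the two dates in whole days.
df["offset"] = (df["Date_1"] - df["Date_2"]).dt.days

# Label the bins with the column names from the question.
labels = ["num_days_2", "num_days_3", "num_days_5", "num_days_20"]
df["bins"] = pd.cut(df["offset"], bins=[1, 2, 5, 10, 1000], labels=labels, include_lowest=True)

# Sum per (Date_1, bin), then move the bins into columns.
result = (
    df.groupby(["Date_1", "bins"], observed=False)["individual_count"]
    .sum()
    .unstack("bins")
)
# Replace the categorical column index with plain strings.
result.columns = list(result.columns)

print(result)
```

With one count per row, each Date_1 ends up with 2, 3, 5 and 20 in the four columns, which makes it easy to check that the bin edges match the windows you want.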
Regarding "python - Generate 3 different columns based on a data frame", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/58796668/