问题描述
我有一个如下所示的数据框
I have a data frame as shown below
B_ID No_Show Session slot_num Patient_count
1 0.2 S1 1 1
2 0.3 S1 2 1
3 0.8 S1 3 1
4 0.3 S1 3 2
5 0.6 S1 4 1
6 0.8 S1 5 1
7 0.9 S1 5 2
8 0.4 S1 5 3
9 0.6 S1 5 4
12 0.9 S2 1 1
13 0.5 S2 1 2
14 0.3 S2 2 1
15 0.7 S2 3 1
20 0.7 S2 4 1
16 0.6 S2 5 1
17 0.8 S2 5 2
19 0.3 S2 5 3
哪里
No_Show = 没有出现的概率
No_Show = Probability of no show
假设
阈值概率 = 0.2每个时段的持续时间 = 30(分钟)
threshold probability = 0.2Duration for each slot = 30 (minutes)
从上面我想计算下面的数据框
From the above I would like calculate below data frame
第一步
根据 Session、slot_number 和 Patient_count 对数据帧进行排序
sort the dataframe based on Session, slot_number and Patient_count
df = df.sort_values(['Session', 'slot_num', 'Patient_count'], ascending=False)
步骤 2 使用以下条件计算截止
step 2 Calculate the cut off by using below conditions
如果患者计数 = 1如果患者计数 = 1,则将 No_show 除以阈值概率
if patient_count = 1Divide No_show by threshold probability if patient_count = 1
Example for B_ID = 3, Patient_count = 1, cut_off = 0.8/0.2 = 4
否则如果患者计数 = 2将先前的 1 No_Show 与当前的 No_show 相乘并除以阈值)
else if patient_count = 2multiply previously 1 No_Show with current No_show and divide with threshold)
Example for B_ID = 4, Patient_count = 2, cut_off = (0.3*0.8)/0.2 = 1.2
否则如果患者计数 = 3将先前的 2 No_Show 与当前的 No_show 相乘并除以阈值
else if patient_count = 3multiply previously 2 No_Show with current No_show and divide with threshold
Example for B_ID = 8, Patient_count = 3, cut_off = (0.4*0.9*0.8)/0.2 = 1.44
等等
预期输出:
B_ID No_Show Session slot_num Patient_count Cut_off
1 0.2 S1 1 1 1
2 0.3 S1 2 1 1.5
3 0.8 S1 3 1 4
4 0.3 S1 3 2 1.2
5 0.6 S1 4 1 3
6 0.8 S1 5 1 4
7 0.9 S1 5 2 3.6
8 0.4 S1 5 3 1.44
9 0.6 S1 5 4 0.864
12 0.9 S2 1 1 4.5
13 0.5 S2 1 2 2.25
14 0.3 S2 2 1 1.5
15 0.7 S2 3 1 3.5
20 0.7 S2 4 1 3.5
16 0.6 S2 5 1 3
17 0.8 S2 5 2 2.4
19 0.3 S2 5 3 0.72
推荐答案
使用 GroupBy.cumprod
并除以 probability
和 Series.div
:
Use GroupBy.cumprod
and divide by probability
by Series.div
:
probability = 0.2
df['new'] = df.groupby(['Session','slot_num'])['No_Show'].cumprod().div(probability)
print (df)
B_ID No_Show Session slot_num Patient_count new
0 1 0.2 S1 1 1 1.000
1 2 0.3 S1 2 1 1.500
2 3 0.8 S1 3 1 4.000
3 4 0.3 S1 3 2 1.200
4 5 0.6 S1 4 1 3.000
5 6 0.8 S1 5 1 4.000
6 7 0.9 S1 5 2 3.600
7 8 0.4 S1 5 3 1.440
8 9 0.6 S1 5 4 0.864
9 12 0.9 S2 1 1 4.500
10 13 0.5 S2 1 2 2.250
11 14 0.3 S2 2 1 1.500
12 15 0.7 S2 3 1 3.500
13 20 0.7 S2 4 1 3.500
14 16 0.6 S2 5 1 3.000
15 17 0.8 S2 5 2 2.400
16 19 0.3 S2 5 3 0.720
这篇关于基于pandas中的groupby或循环条件划分列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!