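I have a df that contains numerous Places at recurring time periods. These Places begin and finish in a random fashion. For each time period, I want to assign each unique Place to a Group. The central rules for doing so are:
1) Each Group can hold no more than 3 unique Places at any time
2) Unique Places should be evenly distributed across the Groups
I've taken a small subset of the df. There are 7 unique values (but no more than 5 occurring at any one time) and 2 Groups available to choose from. In reality, the df can contain up to 50 unique values in total, which begin and finish at different time periods and are distributed across up to 6 Groups.
To give an idea of how many Places are currently occurring, I've included a Total column, which is based on whether the Place appears again. The df contains every available Period for each unique Place in each Group. Places GOLF and CLUB will finish, but let's assume all other Places continue, as they appear later on in the df.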
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Period' : [1,2,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,4,4,5,5,5,5,5,5,6,6],
    'Place' : ['CLUB','CLUB','CLUB','HOME','HOME','AWAY','AWAY','WORK','WORK','AWAY','AWAY','GOLF','GOLF','CLUB','CLUB','POOL','POOL','HOME','HOME','WORK','WORK','AWAY','AWAY','POOL','POOL','TENNIS','TENNIS'],
    'Total' : [1,1,1,2,2,3,3,4,4,4,4,5,5,4,4,4,4,4,4,4,4,4,4,4,4,5,5],
    'Available Group' : ['1','2','1','2','1','2','1','2','1','1','2','1','2','2','1','2','1','2','1','2','1','1','2','1','2','2','1'],
})
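To make the idea of "currently occurring" Places concrete, here is a minimal sketch that counts how many Places are active in each Period. It assumes a Place is active from its first to its last Period. The toy data and the names spans and active are made up for illustration, and this is not how the Total column was produced.

```python
import pandas as pd

# Hypothetical toy data, much smaller than the df in the question.
toy = pd.DataFrame({
    "Period": [1, 2, 2, 3, 3, 4, 4, 5],
    "Place": ["CLUB", "CLUB", "HOME", "HOME", "GOLF", "HOME", "POOL", "POOL"],
})

# First and last Period in which each Place appears.
spans = toy.groupby("Place")["Period"].agg(first="min", last="max")

# A Place counts as active in period p if first <= p <= last.
active = {
    int(p): int(((spans["first"] <= p) & (spans["last"] >= p)).sum())
    for p in sorted(toy["Period"].unique())
}
print(active)
```

A count like this tells you how many slots are needed at any moment, which is what the 3-Places-per-Group rule has to be checked against.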
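The main issue that's giving me trouble is that Places occur dynamically: they finish and new ones begin in a random fashion. So assigning and distributing the currently unique Places needs to account for this. My attempt: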
def AssignPlace(df):
    uniquePlaces = df['Place'].unique()
    G3 = dict(zip(uniquePlaces, np.arange(len(uniquePlaces)) // 3 + 1))
    df['Assigned Group'] = df['Place'].map(G3)
    return df

df = df.groupby('Available Group', sort=False).apply(AssignPlace)
df = df.drop_duplicates(subset = ['Period','Place'])
Out:
Period Place Total Available Group Assigned Group
0 1 CLUB 1 1 1
1 2 CLUB 1 2 1
3 2 HOME 2 2 1
5 2 AWAY 3 2 1
7 3 WORK 4 2 2
9 3 AWAY 4 1 1
11 3 GOLF 5 1 2 #GOLF FINISHES SO 4 OCCURRING FROM NEXT ROW
13 4 CLUB 4 2 1 #CLUB FINISHES BUT POOL STARTS SO STILL 4 OCCURRING FROM NEXT ROW
15 4 POOL 4 2 2
17 4 HOME 4 2 1
19 5 WORK 4 2 2
21 5 AWAY 4 1 1
23 5 POOL 4 1 2
25 6 TENNIS 5 2 3 #Signifies issue
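The last row shows where the problem starts. The assigned Group correctly counts TENNIS as the 7th unique value, but it doesn't take the currently occurring unique values into account. Because CLUB and GOLF have finished, there are only 5 current unique values and 2 available Groups, yet it returns Group 3. It keeps counting every new unique value instead of only those that are currently occurring. Expected output, where TENNIS is now assigned Group 1 rather than 3:
Period Place Total Available Group Assigned Group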
0 1 CLUB 1 1 1
1 2 CLUB 1 2 1
3 2 HOME 2 2 1
5 2 AWAY 3 2 1
7 3 WORK 4 2 2
9 3 AWAY 4 1 1
11 3 GOLF 5 1 2
13 4 CLUB 4 2 1
15 4 POOL 4 2 2
17 4 HOME 4 2 1
19 5 WORK 4 2 2
21 5 AWAY 4 1 1
23 5 POOL 4 1 2
25 6 TENNIS 5 2 1
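The reason the attempt ends up at Group 3 can be seen in isolation. The sketch below applies the same integer-division arithmetic to the 7 Place names in order of first appearance. The list places is written out by hand here as an assumption, rather than taken from the df.

```python
import numpy as np

# The 7 unique Places in order of first appearance (typed by hand for illustration).
places = ["CLUB", "HOME", "AWAY", "WORK", "GOLF", "POOL", "TENNIS"]

# Positions 0-2 -> Group 1, 3-5 -> Group 2, 6 -> Group 3.
# Nothing here knows that CLUB and GOLF have already finished.
groups = dict(zip(places, (np.arange(len(places)) // 3 + 1).tolist()))
print(groups)
```

Because the position only ever grows, a finished Place never gives its position back, so the 7th name lands in a third Group even when only 5 Places are active.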
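Best answer
Here is my attempt.
The explanation is in the code comments; if that isn't enough, leave me a comment here.
Note: I added 5 dummy rows at the bottom to simulate that those Places appear later in the df, so please ignore the rows with Period = 0.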
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Period' : [1,2,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,4,4,5,5,5,5,5,5,6,6,0,0,0,0,0],
    'Place' : ['CLUB','CLUB','CLUB','HOME','HOME','AWAY','AWAY','WORK','WORK','AWAY','AWAY','GOLF','GOLF','CLUB','CLUB','POOL','POOL','HOME','HOME','WORK','WORK','AWAY','AWAY','POOL','POOL','TENNIS','TENNIS', "AWAY","HOME","POOL","WORK", "TENNIS"],
    # 'Total' : [1,1,1,2,2,3,3,4,4,4,4,5,5,4,4,4,4,4,4,4,4,4,4,4,4,5,5,0,0,0,0,0],
    # 'Available Group' : ['1','2','1','2','1','2','1','2','1','1','2','1','2','2','1','2','1','2','1','2','1','1','2','1','2','2','1',0,0,0,0,0],
})

# df to store all unique places
uniquePlaces = pd.DataFrame(df["Place"].unique(), columns=["Place"])
# Start stores the index of df where the place first appears
uniquePlaces["Start"] = -1
# End stores the index of df where the place last appears
uniquePlaces["End"] = -1

## adds a new column "Place Label", which is the label-encoded value for a place
## "Place Label" may not be necessary but it may improve performance when looking up and merging
## this function also updates Start and End of the current label in the group
def assign_place_label(group):
    label = uniquePlaces[uniquePlaces["Place"]==group.name].index[0]
    group["Place Label"] = label
    uniquePlaces.loc[label, "Start"] = group.index.min()
    uniquePlaces.loc[label, "End"] = group.index.max()
    return group

## based on Start and End of each place, assign an index to each place
## when a place is freed, its index is reused by a new place appearing after that
def get_dynamic_group(up):
    up["Index"] = 0
    up["Freed"] = False
    max_ind = 0
    free_indx = []
    for i in range(len(up)):
        # places that ended before the current place starts and are not yet freed
        ind_freed = up.index[(up["End"]<up.iloc[i]["Start"]) & (~up["Freed"])]
        free = list(up.loc[ind_freed, "Index"])
        free_indx += free
        up.loc[ind_freed, "Freed"] = True
        if len(free_indx)>0:
            # reuse the smallest freed index
            m = min(free_indx)
            up.loc[i, "Index"] = m
            free_indx.remove(m)
        else:
            # nothing to reuse, so open a new index
            up.loc[i, "Index"] = max_ind
            max_ind += 1
    # 3 places per group
    up["Group"] = up["Index"]//3+1
    return up

# group_keys=False keeps the original row index on recent pandas versions
df2 = df.groupby("Place", group_keys=False).apply(assign_place_label)
uniquePlaces = get_dynamic_group(uniquePlaces)
print(uniquePlaces)  # use display(uniquePlaces) in a Jupyter notebook

df3 = df2[df2.Period!=0].drop_duplicates(subset = ['Period','Place'])
result = df3.merge(uniquePlaces[["Group"]], how="left", left_on="Place Label",
                   right_index=True, sort=False)
print(result)  # use display(result) in a Jupyter notebook
Output
Period Place Place Label Group
0 1 CLUB 0 1
1 2 CLUB 0 1
3 2 HOME 1 1
5 2 AWAY 2 1
7 3 WORK 3 2
9 3 AWAY 2 1
11 3 GOLF 4 2
13 4 CLUB 0 1
15 4 POOL 5 2
17 4 HOME 1 1
19 5 WORK 3 2
21 5 AWAY 2 1
23 5 POOL 5 2
25 6 TENNIS 6 1
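The slot-reuse idea can also be sketched without pandas. This is a variant for illustration, not the code from the answer: it assumes each Place is described by its first and last Period, with END as a hypothetical stand-in for Places that continue, and it frees a slot only when a Place ended strictly before the new one starts. The names assign_groups, intervals and group_size are illustrative.

```python
import heapq

def assign_groups(intervals, group_size=3):
    """intervals: (place, first_period, last_period) tuples in order of first appearance."""
    free_slots = []  # min-heap of slot numbers released by finished places
    active = []      # min-heap of (last_period, slot) for places still running
    next_slot = 0
    groups = {}
    # sorted() is stable, so places starting in the same period keep their order
    for place, first, last in sorted(intervals, key=lambda t: t[1]):
        # release the slot of every place that ended strictly before this one starts
        while active and active[0][0] < first:
            _, slot = heapq.heappop(active)
            heapq.heappush(free_slots, slot)
        if free_slots:
            slot = heapq.heappop(free_slots)  # reuse the smallest freed slot
        else:
            slot = next_slot                  # nothing to reuse, open a new slot
            next_slot += 1
        heapq.heappush(active, (last, slot))
        groups[place] = slot // group_size + 1
    return groups

END = float("inf")  # assumption: the place continues past the sample
intervals = [
    ("CLUB", 1, 4), ("HOME", 2, END), ("AWAY", 2, END), ("WORK", 3, END),
    ("GOLF", 3, 3), ("POOL", 4, END), ("TENNIS", 6, END),
]
groups = assign_groups(intervals)
print(groups)
```

With these periods, TENNIS reuses the slot that CLUB gave up and lands in Group 1. The sketch only caps each Group at group_size Places; it makes no attempt to balance the Groups evenly.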
Regarding python - Assigning values from different options - Pandas, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/58021749/