Problem description
I have two dataframes: one lists customers and the songs they like, and the other lists users and the cluster each one belongs to.
Data 1:
user song
A 11
A 22
B 99
B 11
C 11
D 44
C 66
E 66
D 33
E 55
F 11
F 77
Data 2:
user cluster
A 1
B 2
C 3
D 1
E 2
F 3
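To reproduce the question, the two tables above could be built like this. This is only a sketch: the names `df1` and `df2` are taken from the answer's code, and the column types (strings for users, integers for songs and clusters) are an assumption.

```python
import pandas as pd

# Data 1: one row per (user, song) pair, in the order listed above.
df1 = pd.DataFrame({
    "user": list("AABBCDCEDEFF"),
    "song": [11, 22, 99, 11, 11, 44, 66, 66, 33, 55, 11, 77],
})

# Data 2: one row per user with the cluster that user belongs to.
df2 = pd.DataFrame({
    "user": list("ABCDEF"),
    "cluster": [1, 2, 3, 1, 2, 3],
})

print(df1)
print(df2)
```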
Using the above data sets, I was able to get all the songs listened to by the users of each cluster.
cluster songs
1 [11, 22, 33, 44]
2 [11, 99, 66, 55]
3 [11, 66, 77]
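The question does not show how this per-cluster table was produced. One possible way, sketched under the assumption that the data lives in two frames named `df1` and `df2` (the name `cluster_songs` is made up here), is to merge the frames and collect the distinct songs per cluster:

```python
import pandas as pd

df1 = pd.DataFrame({
    "user": list("AABBCDCEDEFF"),
    "song": [11, 22, 99, 11, 11, 44, 66, 66, 33, 55, 11, 77],
})
df2 = pd.DataFrame({"user": list("ABCDEF"), "cluster": [1, 2, 3, 1, 2, 3]})

# Attach each user's cluster to every (user, song) row.
merged = df1.merge(df2, on="user", how="left")

# Collect the distinct songs heard in each cluster as a sorted list.
cluster_songs = merged.groupby("cluster")["song"].apply(lambda s: sorted(set(s)))
print(cluster_songs)
```

Sorting is optional; it only makes the lists easier to compare by eye.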
I need to assign each cluster's songs to the users in that cluster who have not listened to them yet. In my expected output, A belongs to cluster 1 and has not yet listened to songs 33 and 44, so those two songs should be A's output. Likewise, B belongs to cluster 2 and has not listened to songs 66 and 55, so B's output looks like the following.
Expected output:
user song
A [33, 44]
B [66,55]
C [77]
D [11,22]
E [11,99]
F [66]
Recommended answer
This is not easy:
import pandas as pd

# add the cluster column and remove duplicate (user, song) pairs
df = pd.merge(df1, df2, on='user', how='left').drop_duplicates(['user', 'song'])

def f(x):
    # for each cluster, reshape to a users x songs table
    x = x.pivot(index='user', columns='song', values='cluster')
    # for each user, collect the songs (columns) whose cell is NaN
    x = x.apply(lambda row: row.index[row.isnull()].tolist(), axis=1)
    return x

df1 = df.groupby(['cluster']).apply(f).reset_index(level=0, drop=True).sort_index()
print(df1)
user
A [33, 44]
B [55, 66]
C [77]
D [11, 22]
E [11, 99]
F [66]
dtype: object
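The same result can also be reached with plain set differences, which avoids the pivot step. This is a sketch of an alternative technique, not the answerer's method. It rebuilds the sample data so it runs on its own, and the names `cluster_songs`, `user_songs` and `unheard` are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    "user": list("AABBCDCEDEFF"),
    "song": [11, 22, 99, 11, 11, 44, 66, 66, 33, 55, 11, 77],
})
df2 = pd.DataFrame({"user": list("ABCDEF"), "cluster": [1, 2, 3, 1, 2, 3]})

merged = df1.merge(df2, on="user", how="left")

# Songs heard anywhere in each cluster, and songs heard by each user.
cluster_songs = merged.groupby("cluster")["song"].apply(set)
user_songs = merged.groupby("user")["song"].apply(set)

# For every user that appears in df1: cluster songs minus the user's own songs.
user_cluster = df2.set_index("user")["cluster"]
unheard = pd.Series({
    user: sorted(cluster_songs.loc[user_cluster.loc[user]] - songs)
    for user, songs in user_songs.items()
})
print(unheard)
```

Users that appear in `df2` but have no rows in `df1` are skipped here; whether they should instead receive the whole cluster list is a design choice the question leaves open.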
A similar solution:
import numpy as np
import pandas as pd

df = pd.merge(df1, df2, on='user', how='left').drop_duplicates(['user', 'song'])
df1 = (df.groupby(['cluster'])
         .apply(lambda x: x.pivot(index='user', columns='song', values='cluster').isnull())
         .fillna(False)
         .reset_index(level=0, drop=True)
         .sort_index())
# replace each True by the name of its column (the song), followed by ', '
s = np.where(df1, ['{}, '.format(x) for x in df1.columns.astype(str)], '')
# join each row's pieces and strip the trailing separator
s1 = pd.Series([''.join(x).strip(', ') for x in s], index=df1.index)
print(s1)
user
A 33, 44
B 55, 66
C 77
D 11, 22
E 11, 99
F 66
dtype: object
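If a list-valued result is already available, the comma-separated strings shown above can also be produced by joining each list directly. A minimal sketch, assuming a Series shaped like the first solution's output (the name `unheard` and the hard-coded values are for illustration only):

```python
import pandas as pd

# Illustrative input: one list of unheard songs per user,
# copied from the first solution's printed output.
unheard = pd.Series({
    "A": [33, 44], "B": [55, 66], "C": [77],
    "D": [11, 22], "E": [11, 99], "F": [66],
})

# Turn each list into a single comma-separated string.
joined = unheard.map(lambda songs: ", ".join(str(song) for song in songs))
print(joined)
```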