
Problem description


I have two dataframes: one lists users and the songs they like, and the other lists users and the cluster each one belongs to.

Data 1 (df1):

user    song
A   11
A   22
B   99
B   11
C   11
D   44
C   66
E   66
D   33
E   55
F   11
F   77

Data 2 (df2):

user    cluster
A   1
B   2
C   3
D   1
E   2
F   3
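To make the answer's code runnable, the two tables can be built as pandas DataFrames. This is a minimal sketch; only the names `df1` and `df2` come from the answer, and the construction itself is not part of the original question.

```python
import pandas as pd

# Data 1: one row per (user, song) pair the user likes
df1 = pd.DataFrame({
    'user': list('AABBCDCEDEFF'),
    'song': [11, 22, 99, 11, 11, 44, 66, 66, 33, 55, 11, 77],
})

# Data 2: the cluster each user belongs to
df2 = pd.DataFrame({
    'user': list('ABCDEF'),
    'cluster': [1, 2, 3, 1, 2, 3],
})

print(df1.shape, df2.shape)
```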


Using the data sets above, I was able to get all the songs listened to by the users of each cluster:

cluster songs
    1   [11, 22, 33, 44]
    2   [11, 99, 66, 55]
    3   [11, 66, 77]
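The question does not show how the per-cluster song lists were produced. One plausible way, sketched here as an assumption rather than the asker's actual code, is to attach each user's cluster and collect the distinct songs per cluster. The lists are sorted, so their order may differ from the table above.

```python
import pandas as pd

df1 = pd.DataFrame({'user': list('AABBCDCEDEFF'),
                    'song': [11, 22, 99, 11, 11, 44, 66, 66, 33, 55, 11, 77]})
df2 = pd.DataFrame({'user': list('ABCDEF'), 'cluster': [1, 2, 3, 1, 2, 3]})

# attach each user's cluster to every (user, song) row
merged = df1.merge(df2, on='user', how='left')

# distinct songs liked anywhere in each cluster, sorted for a stable display
cluster_songs = merged.groupby('cluster')['song'].unique().apply(sorted)
print(cluster_songs)
```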


I need to assign each cluster's songs to the users of that cluster who have not listened to them yet. In my expected output, A belongs to cluster 1 and has not yet listened to songs 33 and 44, so A's output should be as shown below. The same goes for B, which belongs to cluster 2: B has not listened to songs 66 and 55, so B's output looks like the one below.

Expected output:

  user  song
    A   [33, 44]
    B   [66, 55]
    C   [77]
    D   [11, 22]
    E   [11, 99]
    F   [66]
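The expected output is a per-user set difference: the songs of the user's cluster minus the songs the user already likes. A minimal sketch of that idea with plain Python sets follows; it is an alternative to the pivot-based answer, not the answer author's method. The lists are sorted, so B comes out as [55, 66].

```python
import pandas as pd

df1 = pd.DataFrame({'user': list('AABBCDCEDEFF'),
                    'song': [11, 22, 99, 11, 11, 44, 66, 66, 33, 55, 11, 77]})
df2 = pd.DataFrame({'user': list('ABCDEF'), 'cluster': [1, 2, 3, 1, 2, 3]})

# attach each user's cluster to every (user, song) row
merged = df1.merge(df2, on='user', how='left')

# cluster -> set of all songs liked by anyone in that cluster
cluster_songs = {c: set(g) for c, g in merged.groupby('cluster')['song']}
# user -> set of songs that user already likes
user_songs = {u: set(g) for u, g in merged.groupby('user')['song']}

# sketch: plain set difference, songs of the user's cluster not yet liked
result = pd.Series(
    {u: sorted(cluster_songs[c] - user_songs[u])
     for u, c in df2.set_index('user')['cluster'].items()}
).rename_axis('user')
print(result)
```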

Recommended answer

It is not straightforward:

import pandas as pd

#add the cluster column and remove duplicate (user, song) pairs
df = pd.merge(df1, df2, on='user', how='left').drop_duplicates(['user','song'])

def f(x):
    #for each cluster, reshape to a user x song table (NaN = song not liked by that user)
    x = x.assign(heard=1).pivot(index='user', columns='song', values='heard')
    #for each user, collect the songs whose cell is NaN
    x = x.apply(lambda r: r.index[r.isnull()].tolist(), axis=1)
    return x

res = df.groupby('cluster').apply(f).reset_index(level=0, drop=True).sort_index()
print(res)
user
A    [33, 44]
B    [55, 66]
C        [77]
D    [11, 22]
E    [11, 99]
F        [66]
dtype: object
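To see what the per-group function does, here is a sketch for cluster 1 alone (users A and D). Any non-null value can fill the pivoted cells, because only the NaN cells matter; the constant column `heard` is a hypothetical marker used for this illustration.

```python
import pandas as pd

# the rows of the merged table that belong to cluster 1 (users A and D)
g = pd.DataFrame({'user': ['A', 'A', 'D', 'D'],
                  'song': [11, 22, 44, 33],
                  'cluster': [1, 1, 1, 1]})

# user x song table; 'heard' is a hypothetical marker column, NaN = not liked
wide = g.assign(heard=1).pivot(index='user', columns='song', values='heard')
print(wide)

# for each row, the column labels (songs) whose cell is NaN
missing = wide.apply(lambda r: r.index[r.isnull()].tolist(), axis=1)
print(missing)
```

The pivot's columns are the cluster's songs (11, 22, 33, 44), so each user's NaN cells are exactly the songs of the cluster that user has not liked.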

A similar solution:

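import numpy as np
import pandas as pd

df = pd.merge(df1, df2, on='user', how='left').drop_duplicates(['user','song'])
#boolean user x song mask: True = song of the user's cluster the user has not liked
mask = (df.groupby('cluster')
          .apply(lambda x: x.assign(heard=1).pivot(index='user', columns='song', values='heard').isnull())
          .fillna(False)
          .astype(bool)
          .reset_index(level=0, drop=True)
          .sort_index())

#replace each True by its column (song) label followed by ', '
s = np.where(mask, ['{}, '.format(x) for x in mask.columns.astype(str)], '')
#join each row and strip the trailing separator
s1 = pd.Series([''.join(x).strip(', ') for x in s], index=mask.index)
print(s1)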
user
A    33, 44
B    55, 66
C        77
D    11, 22
E    11, 99
F        66
dtype: object
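The string-building step of the second solution relies on NumPy broadcasting: a 1-D list of labels, one per column, is stretched across every row of the boolean mask. A toy sketch with a hypothetical 2 x 3 mask:

```python
import numpy as np

# hypothetical mask: rows are users, columns are songs 33, 44, 55
mask = np.array([[True, False, True],
                 [False, True, False]])
labels = ['{}, '.format(x) for x in ['33', '44', '55']]

# each True cell takes its column's label, each False cell becomes ''
s = np.where(mask, labels, '')
print(s)

# join each row and trim the trailing ', '
rows = [''.join(r).strip(', ') for r in s]
print(rows)  # ['33, 55', '44']
```

Note that `strip(', ')` removes any commas and spaces at both ends, which is safe here because the labels are plain song numbers.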


09-01 21:04