Currently, I have data like this:

Item    Properties
A   C001
A   C002
A   C003
B   C001
B   C003
C   C001


I want to group these items as:

A   C001, C002, C003
B   C001, C003
C   C001


Then I want to match the items by property similarity, i.e. the number of properties each pair shares:

A   B   2
A   C   1
B   C   1


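How can I reshape this DataFrame with pandas? I did try the groupby method, but it gives the number of properties rather than an array of property names.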

Best answer

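import pandas as pd

# Self-join on the property column (named 'Properties' in the question's header).
selfjoin = pd.merge(df, df, on='Properties')
# Keep each unordered pair once and drop self-pairs such as (A, A).
selfjoin = selfjoin[selfjoin['Item_x'] < selfjoin['Item_y']]
# Count shared properties per pair; groupby keys must be a list, not a tuple.
similarity = selfjoin.groupby(['Item_x', 'Item_y']).size().reset_index(name='count')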
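
The answer above only covers the pair counts. For the first step in the question (listing the property names per item instead of counting them), here is a minimal sketch. It assumes the column names Item and Properties from the question's header row and rebuilds the sample data by hand, so adjust the names to match your real frame:

```python
import pandas as pd

# Sample data copied from the question; the column names are assumed
# from its header row.
df = pd.DataFrame({
    "Item": ["A", "A", "A", "B", "B", "C"],
    "Properties": ["C001", "C002", "C003", "C001", "C003", "C001"],
})

# Collect each item's property names into a Python list (not a count).
grouped = df.groupby("Item")["Properties"].agg(list).reset_index()

# Alternatively, join the names into one comma-separated string per item.
joined = df.groupby("Item")["Properties"].agg(", ".join).reset_index()

print(grouped)
print(joined)
```

Use the list form if you need to process the properties further, and the joined string if you only want to display them.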
