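I have a very large DataFrame. I want to group by the "NAME" column and concatenate the unique string values from the ID, ID2, ID3 and ID4 columns into a single new column.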
I tried:
df.groupby('NAME').apply(lambda x: x['ID'] + x['ID2'] + x['ID3'] + x['ID4'])
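For reference, here is a minimal sketch of why this attempt does not yield unique IDs. It assumes the ID columns hold strings and uses a hypothetical two-row frame. `+` on string Series concatenates the values within each row, so every row turns into one glued string instead of a collection of distinct IDs for the group. If the columns were read as integers, `+` would add the numbers instead.

```python
import pandas as pd

# Hypothetical two-row frame for one NAME, with the IDs stored as strings
df = pd.DataFrame({
    "NAME": ["Capg", "Capg"],
    "ID": ["778", "778"],
    "ID2": ["535", "835"],
    "ID3": ["667", "100"],
    "ID4": ["898", "444"],
})

# `+` on string Series works element-wise, row by row:
# each row becomes one concatenated string, and duplicates are not removed
glued = df["ID"] + df["ID2"] + df["ID3"] + df["ID4"]
print(glued.tolist())  # ['778535667898', '778835100444']
```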
I have also tried several df.groupby.agg calls with lambda functions, but I still need a way to combine pd.unique with string values.
Sample data:
NAME Type ID ID2 ID3 ID4 MEMBERSHIP
Capg Active 778 535 667 898 Global
Capg Active 778 835 100 444 Blue
Capg Active 778 535 667 898 Black
Doy Active 246 8989 667 777 Silver
Doy Active 266 8989 900 777 Silver
Doy Active 266 8989 900 777 Silver
Art Active 778 135 888 007 White
Art Active 778 135 888 007 Silver
Art Active 778 135 888 008 White
Art Active 778 135 888 007 White
Desired output:
NAME Type ID ID2 ID3 ID4 MEMBERSHIP MERGED_IDS
Capg Active 778 535 667 898 Global 778, 535, 667, 898, 835, 100
Capg Active 778 835 100 444 Blue 778, 535, 667, 898, 835, 100
Capg Active 778 535 667 898 Black 778, 535, 667, 898, 835, 100
Doy Active 246 8989 667 777 Silver 246, 8989, 667, 777, 266, 900
Doy Active 266 8989 900 777 Silver 246, 8989, 667, 777, 266, 900
Doy Active 266 8989 900 777 Silver 246, 8989, 667, 777, 266, 900
Art Active 778 135 888 007 White 778, 135, 888, 007, 008
Art Active 778 135 888 007 Silver 778, 135, 888, 007, 008
Art Active 778 135 888 008 White 778, 135, 888, 007, 008
Art Active 778 135 888 007 White 778, 135, 888, 007, 008
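Below is a sketch of the pd.unique route that the question asks about. It assumes the sample data above, read with dtype=str so that values like 007 keep their leading zeros. The names `id_cols` and `merged_ids` are hypothetical. Each group's ID columns are flattened row by row, and only first occurrences are kept, so the order follows the desired output. One difference: it also keeps 444 from the second Capg row, which the desired output above leaves out.

```python
import io
import pandas as pd

data = """NAME Type ID ID2 ID3 ID4 MEMBERSHIP
Capg Active 778 535 667 898 Global
Capg Active 778 835 100 444 Blue
Capg Active 778 535 667 898 Black
Doy Active 246 8989 667 777 Silver
Doy Active 266 8989 900 777 Silver
Doy Active 266 8989 900 777 Silver
Art Active 778 135 888 007 White
Art Active 778 135 888 007 Silver
Art Active 778 135 888 008 White
Art Active 778 135 888 007 White"""

# dtype=str keeps IDs such as 007 as strings
df = pd.read_csv(io.StringIO(data), sep=" ", dtype=str)

id_cols = ["ID", "ID2", "ID3", "ID4"]  # hypothetical helper name

def merged_ids(group: pd.DataFrame) -> str:
    # Flatten the group's ID cells row by row (row-major order),
    # then keep each value's first occurrence via pd.unique
    values = group.to_numpy().ravel()
    return ", ".join(pd.unique(values))

# One comma-separated string per NAME, indexed by NAME
per_name = df.groupby("NAME")[id_cols].apply(merged_ids)

# Broadcast the per-NAME string back to every row of that NAME
df["MERGED_IDS"] = df["NAME"].map(per_name)
print(df)
```

Mapping by NAME instead of merging leaves the original row order and index untouched.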
Best answer
You can convert each ID column to a set and then take the union of the sets:
import io
import pandas as pd

data = """NAME Type ID ID2 ID3 ID4 MEMBERSHIP
Capg Active 778 535 667 898 Global
Capg Active 778 835 100 444 Blue
Capg Active 778 535 667 898 Black
Doy Active 246 8989 667 777 Silver
Doy Active 266 8989 900 777 Silver
Doy Active 266 8989 900 777 Silver
Art Active 778 135 888 007 White
Art Active 778 135 888 007 Silver
Art Active 778 135 888 008 White
Art Active 778 135 888 007 White"""

# dtype=str keeps the IDs as strings, so values such as 007 keep their leading zeros
df = pd.read_csv(io.StringIO(data), sep=' ', skipinitialspace=True, dtype=str)

def group_IDs(x):
    # Union of the distinct values found in the four ID columns of one group
    return set(x['ID']) | set(x['ID2']) | set(x['ID3']) | set(x['ID4'])

# One set per NAME; selecting the ID columns first keeps the grouping column out of apply
grouped = df.groupby("NAME")[['ID', 'ID2', 'ID3', 'ID4']].apply(group_IDs)
grouped.name = "MERGED_IDS"

# Attach each group's set to every row of that group
print(df.merge(grouped, left_on='NAME', right_index=True))
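This adds a MERGED_IDS column to every row, holding the set of unique IDs for that row's NAME. Because the values are Python sets, the order of the IDs inside each set is not guaranteed.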
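If you need a string like the one in the question instead of a set, you can sort each set and join it. The following is a minimal sketch, assuming a hypothetical two-row stand-in called `merged` for the merged frame. Sorting makes the output repeatable, but it uses string order, not first-appearance order.

```python
import pandas as pd

# Hypothetical stand-in for the merged frame: one set of IDs per row
merged = pd.DataFrame({
    "NAME": ["Art", "Art"],
    "MERGED_IDS": [{"778", "135", "888", "007", "008"}] * 2,
})

# Sets have no stable order, so sort before joining to get a repeatable string
merged["MERGED_IDS"] = merged["MERGED_IDS"].map(lambda ids: ", ".join(sorted(ids)))
print(merged["MERGED_IDS"].iloc[0])  # 007, 008, 135, 778, 888
```

String sorting puts "8989" before "900". Passing `key=int` to `sorted` orders the IDs numerically and leaves the strings themselves, leading zeros included, unchanged.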
Source: the Stack Overflow question "python - Group by column and concatenate unique string values of multiple columns to create one column": https://stackoverflow.com/questions/57522394/