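I have a very large DataFrame. I want to group by the "NAME" column and concatenate the unique string values from the ID, ID2, ID3 and ID4 columns into a single new column.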

I tried:

df.groupby('NAME').apply(lambda x: x['ID'] + x['ID2'] + x['ID3'] + x['ID4'])
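A quick check shows why this attempt does not produce unique values. It uses a hypothetical two-row excerpt of the Capg rows, with the IDs read as strings. On string columns, + concatenates the values row by row. Each row therefore collapses into one long string, and nothing is de-duplicated across the group.

```python
import pandas as pd

# Hypothetical excerpt: the first two "Capg" rows, with IDs kept as strings
df = pd.DataFrame({
    "NAME": ["Capg", "Capg"],
    "ID": ["778", "778"],
    "ID2": ["535", "835"],
    "ID3": ["667", "100"],
    "ID4": ["898", "444"],
})

# `+` on string columns concatenates element-wise, one row at a time,
# so each row becomes a single long string and duplicates are not removed
row_concat = df["ID"] + df["ID2"] + df["ID3"] + df["ID4"]
print(row_concat.tolist())  # ['778535667898', '778835100444']
```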


I have also tried several df.groupby(...).agg calls with lambda functions, but I still need a way to use pd.unique with string values.
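The following sketch shows one way to combine pd.unique with string values. It uses only the three Doy rows as a hypothetical excerpt. The ID columns are flattened row by row into a single array, and pd.unique then drops the duplicates. pd.unique keeps values in order of first appearance, and the values remain strings, so IDs such as 007 would keep their leading zeros.

```python
import pandas as pd

# Hypothetical excerpt: the three "Doy" rows, IDs kept as strings
ids = pd.DataFrame({
    "ID": ["246", "266", "266"],
    "ID2": ["8989", "8989", "8989"],
    "ID3": ["667", "900", "900"],
    "ID4": ["777", "777", "777"],
})

# to_numpy().ravel() flattens row by row (C order);
# pd.unique then removes duplicates while keeping first-appearance order
uniq = pd.unique(ids.to_numpy().ravel())
merged = ", ".join(uniq)
print(merged)  # 246, 8989, 667, 777, 266, 900
```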

NAME   Type    ID     ID2     ID3     ID4    MEMBERSHIP
Capg   Active  778    535     667     898     Global
Capg   Active  778    835     100     444     Blue
Capg   Active  778    535     667     898     Black
Doy   Active  246     8989    667     777     Silver
Doy   Active  266     8989    900     777     Silver
Doy   Active  266     8989    900     777     Silver
Art   Active  778     135     888     007     White
Art   Active  778     135     888     007     Silver
Art   Active  778     135     888     008     White
Art   Active  778     135     888     007     White


Desired output:

NAME   Type    ID     ID2     ID3     ID4    MEMBERSHIP  MERGED_IDS
Capg   Active  778    535     667     898     Global     778, 535, 667, 898, 835, 100, 444
Capg   Active  778    835     100     444     Blue       778, 535, 667, 898, 835, 100, 444
Capg   Active  778    535     667     898     Black      778, 535, 667, 898, 835, 100, 444
Doy   Active  246     8989    667     777     Silver     246, 8989, 667, 777, 266, 900
Doy   Active  266     8989    900     777     Silver     246, 8989, 667, 777, 266, 900
Doy   Active  266     8989    900     777     Silver     246, 8989, 667, 777, 266, 900
Art   Active  778     135     888     007     White      778, 135, 888, 007, 008
Art   Active  778     135     888     007     Silver     778, 135, 888, 007, 008
Art   Active  778     135     888     008     White      778, 135, 888, 007, 008
Art   Active  778     135     888     007     White      778, 135, 888, 007, 008
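Here is a minimal sketch that builds the MERGED_IDS strings shown above. It assumes every column is read as a string (dtype=str), so 007 keeps its leading zeros. For each NAME group, it applies pd.unique to the flattened ID values and then maps the per-group string back onto every row. The helper name merge_ids is illustrative only.

```python
import io
import pandas as pd

data = """NAME Type   ID  ID2  ID3 ID4 MEMBERSHIP
Capg Active 778 535  667 898 Global
Capg Active 778 835  100 444 Blue
Capg Active 778 535  667 898 Black
Doy  Active 246 8989 667 777 Silver
Doy  Active 266 8989 900 777 Silver
Doy  Active 266 8989 900 777 Silver
Art  Active 778 135  888 007 White
Art  Active 778 135  888 007 Silver
Art  Active 778 135  888 008 White
Art  Active 778 135  888 007 White"""

# dtype=str keeps IDs such as "007" as text instead of turning them into 7
df = pd.read_csv(io.StringIO(data), sep=r"\s+", dtype=str)

ID_COLS = ["ID", "ID2", "ID3", "ID4"]

def merge_ids(group):
    # Flatten the group's ID values row by row, drop duplicates
    # (first-appearance order), and join them into one string
    return ", ".join(pd.unique(group.to_numpy().ravel()))

# One string per NAME; only the ID columns are passed to apply()
merged = df.groupby("NAME")[ID_COLS].apply(merge_ids)

# Broadcast each group's string back to all rows of that group
df["MERGED_IDS"] = df["NAME"].map(merged)
print(df)
```

The flattening runs row by row, so within each group the IDs appear in the order they first occur. This matches the order used in the desired output.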

Best answer

You can convert your IDs to sets and then take their union:

import io
import pandas as pd

data = """NAME   Type    ID     ID2     ID3     ID4    MEMBERSHIP
Capg   Active  778    535     667     898     Global
Capg   Active  778    835     100     444     Blue
Capg   Active  778    535     667     898     Black
Doy   Active  246     8989    667     777     Silver
Doy   Active  266     8989    900     777     Silver
Doy   Active  266     8989    900     777     Silver
Art   Active  778     135     888     007     White
Art   Active  778     135     888     007     Silver
Art   Active  778     135     888     008     White
Art   Active  778     135     888     007     White"""

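# dtype=str keeps IDs such as "007" as text instead of turning them into 7
df = pd.read_csv(io.StringIO(data), sep=' ', skipinitialspace=True, dtype=str)

def group_IDs(x):
    # Union of the values found in the four ID columns of one NAME group
    return set(x['ID']) | set(x['ID2']) | set(x['ID3']) | set(x['ID4'])

# Select only the ID columns so apply() does not also receive the NAME grouping column
grouped = df.groupby("NAME")[['ID', 'ID2', 'ID3', 'ID4']].apply(group_IDs)
grouped.name = "MERGED_IDS"
# Attach each group's set of IDs to every row of that group
result = df.merge(grouped, left_on='NAME', right_index=True)
print(result)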
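With this approach, the MERGED_IDS column holds Python sets rather than the comma-separated text shown in the question. Sets have no defined iteration order, so one option is to sort each set before joining it. The sketch below applies the same set-union idea as group_IDs to the four Art rows only (a hypothetical excerpt) and converts the result to text. Note that sorting gives a stable order, but not the first-appearance order from the desired output.

```python
import pandas as pd

# Hypothetical excerpt: the four "Art" rows, IDs kept as strings
art = pd.DataFrame({
    "NAME": ["Art"] * 4,
    "ID": ["778"] * 4,
    "ID2": ["135"] * 4,
    "ID3": ["888"] * 4,
    "ID4": ["007", "007", "008", "007"],
})

# Same set-union idea as group_IDs: one set of IDs per NAME
grouped = art.groupby("NAME")[["ID", "ID2", "ID3", "ID4"]].apply(
    lambda x: set(x["ID"]) | set(x["ID2"]) | set(x["ID3"]) | set(x["ID4"])
)

# Sets are unordered, so sort before joining to get a deterministic string
as_text = grouped.map(lambda s: ", ".join(sorted(s)))
print(as_text["Art"])  # 007, 008, 135, 778, 888
```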


Result:
[Image: the resulting DataFrame with the added MERGED_IDS column]

A similar question, "python - Group by a column and concatenate the unique string values of multiple columns into one column", can be found on Stack Overflow: https://stackoverflow.com/questions/57522394/

10-12 21:10