python - Merge multiple tables and combine one column into comma-separated values

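I have about 15 CSV files, each with the same number of unique IDs. In every file, col1 contains different text. How can I join them to create one new table that holds all the information from these 15 files? I tried pd.merge, then built a new col1 that comma-separates the texts and dropped the duplicated col1 columns, but that leaves columns named col1_x, col1_y, etc. Is there a better way to do this?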

My input is

df1:
ID   col1    location    gender
1    Airplane   NY         F
2    Bus        CA         M
3    NaN        FL         M
4    Bus        WA         F

df2:
ID   col1    location    gender
1    Apple      NY         F
2    Peach      CA         M
3    Melon      FL         M
4    Banana     WA         F

df3:
ID   col1    location    gender
1    NaN        NY         F
2    Football   CA         M
3    Boxing     FL         M
4    Running    WA         F

The expected output is

ID   col1                location    gender
1    Airplane,Apple         NY         F
2    Bus,Peach,Football     CA         M
3    Melon,Boxing           FL         M
4    Bus,Banana,Running     WA         F

Best answer

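You can use concat + groupby: stack the frames vertically, drop the rows with missing values, and join the col1 texts within each group: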

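import pandas as pd
merged = pd.concat([df1, df2, df3], sort=False)
# dropna() removes rows containing NaN; ','.join then combines col1 within each (location, gender) group
result = merged.dropna().groupby(['location', 'gender'], as_index=False).agg({'col1' : ','.join}).reset_index(drop=True)
print(result)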

Output (the ID column is absent because it is not one of the grouping keys):

  location gender                col1
0       CA      M  Bus,Peach,Football
1       FL      M        Melon,Boxing
2       NY      F      Airplane,Apple
3       WA      F  Bus,Banana,Running
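
The expected output in the question also keeps the ID column. A minimal sketch of one way to get that, assuming ID, location and gender agree across all files: add ID to the grouping keys and restrict dropna to col1. The helper name combine_col1 is hypothetical, and the sample frames are rebuilt here only so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Sample frames mirroring df1, df2 and df3 from the question.
base = {
    "ID": [1, 2, 3, 4],
    "location": ["NY", "CA", "FL", "WA"],
    "gender": ["F", "M", "M", "F"],
}
df1 = pd.DataFrame({**base, "col1": ["Airplane", "Bus", np.nan, "Bus"]})
df2 = pd.DataFrame({**base, "col1": ["Apple", "Peach", "Melon", "Banana"]})
df3 = pd.DataFrame({**base, "col1": [np.nan, "Football", "Boxing", "Running"]})


def combine_col1(frames):
    """Hypothetical helper: stack the frames and join col1 per ID."""
    merged = pd.concat(frames, ignore_index=True)
    # Drop only rows whose col1 is missing, so a NaN in another
    # column does not silently remove a row.
    merged = merged.dropna(subset=["col1"])
    joined = merged.groupby(
        ["ID", "location", "gender"], as_index=False
    ).agg({"col1": ",".join})
    # Restore the column order used in the question.
    return joined[["ID", "col1", "location", "gender"]]


result = combine_col1([df1, df2, df3])
print(result)
```

For the 15 files, the list passed to combine_col1 could be built with pd.read_csv over the file paths (for example, paths collected with glob.glob). Note that an ID whose col1 is NaN in every file would disappear from the result, since all of its rows are dropped before grouping.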