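I have a DataFrame like df1 below. I want to remove the items that are duplicated inside an item containing "-". For example, rows 1 and 3 should drop "1A" and "1A, 2B" respectively, giving df2.
How can I remove these duplicates?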
DataFrame:

from pandas import DataFrame

df1 = DataFrame({'Condition': ['1A', '1A, 1A-1A', '1A, 2B', '1A, 2B, 1A-2B', '3C, 1A-2B']})

df1
    Condition
0   1A
1   1A, 1A-1A
2   1A, 2B
3   1A, 2B, 1A-2B
4   3C, 1A-2B

Target output:
df2 = DataFrame({'Condition': ['1A', '1A-1A', '1A, 2B', '1A-2B', '3C, 1A-2B']})

df2
    Condition
0   1A
1   1A-1A
2   1A, 2B
3   1A-2B
4   3C, 1A-2B

Best answer

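You can build a set of the parts of every value that contains "-", test whether each split value is absent from that set, and finally join the remaining values with ", ":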

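L = []
for x in df1['Condition']:
    a = x.split(', ')
    # parts of every hyphenated item, e.g. '1A-2B' -> {'1A', '2B'}
    s = {z for y in a if '-' in y for z in y.split('-')}
    # keep only the items not already covered by a hyphenated item
    L.append(', '.join(z for z in a if z not in s))

df1['new'] = L
print(df1)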
       Condition        new
0             1A         1A
1      1A, 1A-1A      1A-1A
2         1A, 2B     1A, 2B
3  1A, 2B, 1A-2B      1A-2B
4      3C, 1A-2B  3C, 1A-2B
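The same logic can also be wrapped in a helper and applied per cell with Series.apply. This is a sketch, not part of the original answer: drop_covered is a hypothetical name, and it assumes items are always separated by exactly ", " as in the sample data. The last line checks the result against the target df2.

```python
import pandas as pd


def drop_covered(cell: str) -> str:
    """Drop comma-separated items that also appear as parts of a hyphenated item."""
    items = cell.split(', ')
    # parts of every hyphenated item, e.g. '1A-2B' -> {'1A', '2B'}
    covered = {part for item in items if '-' in item for part in item.split('-')}
    # keep the original order of the surviving items
    return ', '.join(item for item in items if item not in covered)


df1 = pd.DataFrame({'Condition': ['1A', '1A, 1A-1A', '1A, 2B', '1A, 2B, 1A-2B', '3C, 1A-2B']})
df2 = pd.DataFrame({'Condition': ['1A', '1A-1A', '1A, 2B', '1A-2B', '3C, 1A-2B']})

df1['new'] = df1['Condition'].apply(drop_covered)
print(df1['new'].tolist() == df2['Condition'].tolist())  # True
```

Keeping the rule in a named function makes it easy to test on single strings, for example drop_covered('1A, 2B, 1A-2B') returns '1A-2B'.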

07-28 08:56