I have this DataFrame df:

AA_0    AA_1     AA_2     AA_3
store   cake     mass     visit
store   mass     visit
mass    store
store   cake     mass     visit


I want to count how many times each sequence of values across columns AA_0 to AA_3 appears in df, and present the result like this:

result =

    count   data
    2       store/cake/mass/visit
    1       store/mass/visit
    1       mass/store


How can I do this?

Best answer

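You can join the non-missing values of each row into one string, then count the distinct strings with value_counts: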

# Join the non-NaN values of each row with '/'
df['data'] = df.apply(lambda x: '/'.join(x.dropna()), axis=1)
print(df)
    AA_0   AA_1   AA_2   AA_3                   data
0  store   cake   mass  visit  store/cake/mass/visit
1  store   mass  visit    NaN       store/mass/visit
2   mass  store    NaN    NaN             mass/store
3  store   cake   mass  visit  store/cake/mass/visit

# Name the index 'data' and the counts 'count' so the labels match their contents
result = df['data'].value_counts().rename_axis('data').reset_index(name='count')
print(result)
                    data  count
0  store/cake/mass/visit      2
1       store/mass/visit      1
2             mass/store      1
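
For reference, here is a minimal self-contained sketch of the same approach. The DataFrame is rebuilt from the table in the question, assuming the empty cells are NaN; the helper names cols and data are only illustrative. It also reorders the columns to match the layout requested in the question.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; empty cells are assumed to be NaN.
df = pd.DataFrame({
    'AA_0': ['store', 'store', 'mass', 'store'],
    'AA_1': ['cake', 'mass', 'store', 'cake'],
    'AA_2': ['mass', 'visit', np.nan, 'mass'],
    'AA_3': ['visit', np.nan, np.nan, 'visit'],
})

cols = ['AA_0', 'AA_1', 'AA_2', 'AA_3']

# Join the non-missing values of each row into one '/'-separated string.
data = df[cols].apply(lambda row: '/'.join(row.dropna()), axis=1)

# Count each distinct sequence. Naming the index and the counts explicitly
# avoids relying on default names, which differ between pandas versions.
result = (
    data.value_counts()
        .rename_axis('data')
        .reset_index(name='count')[['count', 'data']]
)
print(result)
```

Selecting [['count', 'data']] at the end only changes the column order; drop it if the order does not matter.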


If the missing values are blank strings (spaces) instead of NaN:

# Join every value, then strip the trailing '/' and spaces left by the blank cells
df['data'] = df.apply(lambda x: '/'.join(x), axis=1).str.strip('/ ')
print(df)
    AA_0   AA_1   AA_2   AA_3                   data
0  store   cake   mass  visit  store/cake/mass/visit
1  store   mass  visit              store/mass/visit
2   mass  store                           mass/store
3  store   cake   mass  visit  store/cake/mass/visit

# Name the index 'data' and the counts 'count' so the labels match their contents
result = df['data'].value_counts().rename_axis('data').reset_index(name='count')
print(result)
                    data  count
0  store/cake/mass/visit      2
1       store/mass/visit      1
2             mass/store      1
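
Note that str.strip only trims characters from the two ends of the joined string, so a blank cell in a middle column would leave a doubled separator. As a hedged alternative, the sketch below filters out blanks before joining. The function name join_non_blank is hypothetical, and the sample data assumes the blanks are empty or whitespace-only strings.

```python
import pandas as pd

# Sample data rebuilt from the question; blanks are assumed to be
# empty or whitespace-only strings.
df = pd.DataFrame({
    'AA_0': ['store', 'store', 'mass', 'store'],
    'AA_1': ['cake', 'mass', 'store', 'cake'],
    'AA_2': ['mass', 'visit', ' ', 'mass'],
    'AA_3': ['visit', '', ' ', 'visit'],
})


def join_non_blank(row):
    # Keep only real strings that contain something other than whitespace;
    # this skips NaN as well, because NaN is a float rather than a str.
    return '/'.join(v for v in row if isinstance(v, str) and v.strip())


data = df.apply(join_non_blank, axis=1)
result = data.value_counts().rename_axis('data').reset_index(name='count')
print(result)
```

Because the filter runs per cell, this version handles NaN and blank strings in any column position, at the cost of a slower Python-level loop on large frames.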

07-25 21:15