I have a DataFrame like this:

import pandas as pd

name = ['fred','fred','fred','james','james','rick','rick','jeff']
actionfigures = ['superman','batman','flash','greenlantern','flash','batman','joker','superman']
cars = ['lamborghini', 'ferrari','bugatti','ferrari','corvette','bugatti','bmw','bmw']
pets = ['cat','dog','bird','cat','dog','dog','fish','marmet']

test = pd.DataFrame({'name':name,'actfig':actionfigures,'car':cars,'pet':pets})

    actfig       car                name    pet
0   superman     lamborghini        fred    cat
1   batman       ferrari            fred    dog
2   flash        bugatti            fred    bird
3   greenlantern ferrari            james   cat
4   flash        corvette           james   dog
5   batman       bugatti            rick    dog
6   joker        bmw                rick    fish
7   superman     bmw                jeff    marmet


Forgive me if my terminology is off, but I want to pivot the data so that, for each name, I get a count of every value that appears in the ['actfig','car','pet'] columns.

       batman  flash  greenlantern  joker  superman  bmw  bugatti  corvette  ferrari  lamborghini  bird  cat  dog  fish  marmet
name
fred        1      1             0      0         1    0        1         0        1            1     1    1    1     0       0
james       0      1             1      0         0    0        0         1        1            0     0    1    1     0       0
jeff        0      0             0      0         1    1        0         0        0            0     0    0    0     0       1
rick        1      0             0      1         0    1        1         0        0            0     0    0    1     1       0


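I thought test.pivot_table(index='name', columns=['actfig','car','pet'], aggfunc='size') would do it, but it gives me some strange multi-level columns.

I figured I could call get_dummies on each column and then combine that with a groupby on name and a sum, but it feels like pandas probably has a better way.

How can I do this?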

Best answer

melt + pivot

# pivot takes keyword-only arguments in pandas 2.0 and later
test.melt('name').assign(new=1).pivot(index='name', columns='value', values='new').fillna(0)
Out[239]:
value  batman  bird  bmw  bugatti  cat  corvette  dog  ferrari  fish  flash  \
name
fred      1.0   1.0  0.0      1.0  1.0       0.0  1.0      1.0   0.0    1.0
james     0.0   0.0  0.0      0.0  1.0       1.0  1.0      1.0   0.0    1.0
jeff      0.0   0.0  1.0      0.0  0.0       0.0  0.0      0.0   0.0    0.0
rick      1.0   0.0  1.0      1.0  0.0       0.0  1.0      0.0   1.0    0.0

value  greenlantern  joker  lamborghini  marmet  superman
name
fred            0.0    0.0          1.0     0.0       1.0
james           1.0    0.0          0.0     0.0       0.0
jeff            0.0    0.0          0.0     1.0       1.0
rick            0.0    1.0          0.0     0.0       0.0
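
The melt + pivot one-liner relies on every (name, value) pair occurring at most once, because pivot raises a ValueError when it meets duplicate entries. As a hedged alternative that is not part of the original answer, the same long-format frame can be passed to pd.crosstab, which counts occurrences and returns integers instead of floats. A minimal sketch, assuming the test frame from the question; the names long_df and counts are illustrative only:

```python
import pandas as pd

name = ['fred', 'fred', 'fred', 'james', 'james', 'rick', 'rick', 'jeff']
actionfigures = ['superman', 'batman', 'flash', 'greenlantern', 'flash', 'batman', 'joker', 'superman']
cars = ['lamborghini', 'ferrari', 'bugatti', 'ferrari', 'corvette', 'bugatti', 'bmw', 'bmw']
pets = ['cat', 'dog', 'bird', 'cat', 'dog', 'dog', 'fish', 'marmet']

test = pd.DataFrame({'name': name, 'actfig': actionfigures, 'car': cars, 'pet': pets})

# Reshape to long format: one row per (name, source column, value).
long_df = test.melt(id_vars='name')

# Count how often each value occurs for each name (illustrative variable names).
counts = pd.crosstab(long_df['name'], long_df['value'])
print(counts)
```

Like the melt + pivot version, this treats identical strings from different source columns as the same output column, so it only fits data where that merging is acceptable.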


get_dummies

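# DataFrame.sum(level=0) was removed in pandas 2.0; groupby(level=0).sum() is the equivalent
pd.get_dummies(test.set_index('name')).groupby(level=0).sum()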
Out[248]:
       actfig_batman  actfig_flash  actfig_greenlantern  actfig_joker  \
name
fred               1             1                    0             0
james              0             1                    1             0
jeff               0             0                    0             0
rick               1             0                    0             1

       actfig_superman  car_bmw  car_bugatti  car_corvette  car_ferrari  \
name
fred                 1        0            1             0            1
james                0        0            0             1            1
jeff                 1        1            0             0            0
rick                 0        1            1             0            0

       car_lamborghini  pet_bird  pet_cat  pet_dog  pet_fish  pet_marmet
name
fred                 1         1        1        1         0           0
james                0         0        1        1         0           0
jeff                 0         0        0        0         0           1
rick                 0         0        0        1         1           0


Edit: following PiR's suggestion

pd.get_dummies(test.set_index('name'), prefix_sep='|').groupby(level=0).sum().rename(columns=lambda c: c.rsplit('|', 1)[1])
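
For readers who want to run the get_dummies approach end to end, here is a self-contained sketch. It assumes the test frame from the question and makes two choices that are not in the original answer: it passes dtype=int so the indicator columns are integers on every pandas version, and it uses groupby(level=0).sum(), since DataFrame.sum(level=0) was removed in pandas 2.0. The names dummies and result are illustrative.

```python
import pandas as pd

name = ['fred', 'fred', 'fred', 'james', 'james', 'rick', 'rick', 'jeff']
actionfigures = ['superman', 'batman', 'flash', 'greenlantern', 'flash', 'batman', 'joker', 'superman']
cars = ['lamborghini', 'ferrari', 'bugatti', 'ferrari', 'corvette', 'bugatti', 'bmw', 'bmw']
pets = ['cat', 'dog', 'bird', 'cat', 'dog', 'dog', 'fish', 'marmet']

test = pd.DataFrame({'name': name, 'actfig': actionfigures, 'car': cars, 'pet': pets})

# One indicator column per (source column, value), labelled like 'actfig|batman'.
dummies = pd.get_dummies(test.set_index('name'), prefix_sep='|', dtype=int)

# Sum the indicators per name, then drop the 'column|' prefix from each label.
result = (
    dummies.groupby(level=0).sum()
    .rename(columns=lambda c: c.split('|', 1)[1])
)
print(result)
```

Because get_dummies keeps the source columns in order, the result lists the action figures first, then the cars, then the pets, which matches the layout requested in the question. If the same string appeared in two source columns, stripping the prefix would leave duplicate column labels, so the prefixed names may be worth keeping in that case.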

This question, "python - Pandas: pivoting multiple categorical columns", comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/46733674/
