Thanks in advance for your help. I have a pandas DataFrame that looks like this:

     index   source    timestamp    value
      1        car       1         ['98']
      2        bike      2         ['98', '100']
      3        car       3         ['65']
      4        bike      4         ['100', '120']
      5        plane     5         ['20' , '12', '30']


What I need is to turn each element of the lists in the `value` Series into its own new column, so the output would look like this:

    index  source  timestamp  car  bike1  bike2  plane1  plane2  plane3
        1     car          1   98    NaN    NaN     NaN     NaN     NaN
        2    bike          2  NaN     98    100     NaN     NaN     NaN
        3     car          3   65    NaN    NaN     NaN     NaN     NaN
        4    bike          4  NaN    100    120     NaN     NaN     NaN
        5   plane          5  NaN    NaN    NaN      20      12      30


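For `car` the list always has 1 element, for `bike` always 2, and for `plane` always 3. That determines how many new columns I need in the new DataFrame. What is the best way to achieve this?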
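To experiment with this, the frame above can be rebuilt as follows. This is a sketch under two assumptions: that `index` is an ordinary column, and that each `value` cell is a *string* that looks like a Python list (the quoted `'98'` display suggests this, and it is why the answer below starts with `ast.literal_eval`).

```python
import pandas as pd

# Assumption: each 'value' cell is a string such as "['98', '100']", not a real list.
df = pd.DataFrame({
    'index': [1, 2, 3, 4, 5],
    'source': ['car', 'bike', 'car', 'bike', 'plane'],
    'timestamp': [1, 2, 3, 4, 5],
    'value': ["['98']", "['98', '100']", "['65']",
              "['100', '120']", "['20', '12', '30']"],
})
print(df)
```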

Best answer

First, convert the string values into real lists:

import ast
import pandas as pd
# parse strings such as "['98', '100']" into Python lists
df['value'] = df['value'].apply(ast.literal_eval)
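`ast.literal_eval` only accepts a string (or an AST node); if a cell is already a list, for example because the conversion line was run twice, it raises `ValueError`. A small guard makes the step safe to rerun. This is a sketch; `to_list` is a hypothetical helper name, not part of pandas or the original answer.

```python
import ast
import pandas as pd

def to_list(cell):
    """Hypothetical helper: parse "['98', '100']"-style strings, pass real lists through."""
    # literal_eval only evaluates Python literals, so it is safer than eval()
    return ast.literal_eval(cell) if isinstance(cell, str) else cell

mixed = pd.Series(["['98']", ['98', '100']])
print(mixed.apply(to_list).tolist())  # [['98'], ['98', '100']]
```

With the question's frame this would be used as `df['value'] = df['value'].apply(to_list)`.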


Then build a dictionary for each row, with keys made from the source name plus the element's position:

# keys are the source name plus a 1-based position, e.g. 'bike1', 'bike2'
L = [{f'{i}{x+1}':y for x, y in enumerate(j)} for i, j in zip(df['source'], df['value'])]
print(L)
[{'car1': '98'},
 {'bike1': '98', 'bike2': '100'},
 {'car1': '65'},
 {'bike1': '100', 'bike2': '120'},
 {'plane1': '20', 'plane2': '12', 'plane3': '30'}]
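Because every key gets a position suffix, the car column comes out as `car1` rather than the `car` shown in the desired output. If the exact names matter, one option is to drop the suffix for single-element lists. A sketch with a hypothetical helper `row_to_dict`, using the parsed lists directly:

```python
def row_to_dict(source, items):
    # Hypothetical helper: single-element lists keep the bare source name ('car');
    # longer lists get 1-based suffixes ('bike1', 'bike2').
    if len(items) == 1:
        return {source: items[0]}
    return {f'{source}{pos}': item for pos, item in enumerate(items, start=1)}

sources = ['car', 'bike', 'car', 'bike', 'plane']
values = [['98'], ['98', '100'], ['65'], ['100', '120'], ['20', '12', '30']]
L = [row_to_dict(s, v) for s, v in zip(sources, values)]
print(L[0], L[1])  # {'car': '98'} {'bike1': '98', 'bike2': '100'}
```

With the DataFrame itself, the call would be `[row_to_dict(s, v) for s, v in zip(df['source'], df['value'])]`.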


Create a DataFrame from `L` and join it to the original df:

# index=df.index keeps the new rows aligned with the original ones
df = df.join(pd.DataFrame(L, index=df.index))
print(df)
   index source  timestamp         value bike1 bike2 car1 plane1 plane2 plane3
0      1    car          1          [98]   NaN   NaN   98    NaN    NaN    NaN
1      2   bike          2     [98, 100]    98   100  NaN    NaN    NaN    NaN
2      3    car          3          [65]   NaN   NaN   65    NaN    NaN    NaN
3      4   bike          4    [100, 120]   100   120  NaN    NaN    NaN    NaN
4      5  plane          5  [20, 12, 30]   NaN   NaN  NaN     20     12     30
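Two details of this result may matter in practice: the new cells are still strings (`'98'`, not `98`), and the order of the new columns can depend on the pandas version (older releases sorted dict keys alphabetically, as in the output above, while newer ones generally keep first-appearance order). The end-to-end sketch below assumes the string-valued frame from the question, reuses the answer's two steps, converts the new columns with `pd.to_numeric`, and sets the column order explicitly; `wide`, `sizes`, `order` and `out` are illustrative names.

```python
import ast
import pandas as pd

df = pd.DataFrame({
    'index': [1, 2, 3, 4, 5],
    'source': ['car', 'bike', 'car', 'bike', 'plane'],
    'timestamp': [1, 2, 3, 4, 5],
    'value': ["['98']", "['98', '100']", "['65']",
              "['100', '120']", "['20', '12', '30']"],
})

# Steps from the answer: parse the strings, then build one dict per row.
df['value'] = df['value'].apply(ast.literal_eval)
L = [{f'{i}{x+1}': y for x, y in enumerate(j)} for i, j in zip(df['source'], df['value'])]

# Convert '98'-style strings to numbers; cells with no value stay NaN.
wide = pd.DataFrame(L, index=df.index).apply(pd.to_numeric)

# Column order: sources by first appearance, positions 1..n within each source
# (n = longest list seen for that source).
sizes = {}
for src, items in zip(df['source'], df['value']):
    sizes[src] = max(sizes.get(src, 0), len(items))
order = [f'{src}{k}' for src, n in sizes.items() for k in range(1, n + 1)]

# Drop the raw list column, matching the desired output in the question.
out = df.drop(columns='value').join(wide[order])
print(out)
```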

Regarding "python - Python split the values of an array into different columns", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/52759380/

10-12 07:02