I have the following dataset:

name    status    number   message
matt    active    12345    [job:  , money: none, wife: none]
james   active    23456    [group: band, wife: yes, money: 10000]
adam    inactive  34567    [job: none, money: none, wife:  , kids: one, group: jail]

How can I extract the key-value pairs and turn them into a fully expanded dataframe, with one column per key?

Expected output:
name    status   number    job    money    wife    group   kids
matt    active   12345     none   none     none    none    none
james   active   23456     none   10000    none    band    none
adam    inactive 34567     none   none     none    none    one

The message column contains several different key types.

Any help would be greatly appreciated.

Best answer

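This is not straightforward.

First rewrite each message into a string that looks like a dict literal using replace (\s+ matches one or more whitespace characters), then parse it with ast.literal_eval.

After that, the DataFrame constructor can be used together with concat, with pop removing the message column: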

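import ast
import pandas as pd

# df is the question's DataFrame; message holds strings such as '[job:  , money: none, wife: none]'.
# Rewrite each message into dict-literal syntax; an empty value becomes "none".
df.message = df.message.replace([r':\s+,', r'\[', r'\]', r':\s+', r',\s+'],
                                ['":"none","', '{"', '"}', '":"', '","'], regex=True)
# Parse the rewritten strings into real dicts.
df.message = df.message.apply(ast.literal_eval)

# Note: a key that follows an empty value keeps a leading space (' money', ' kids'),
# so ' money' appears below as a separate column from 'money'.
df1 = pd.DataFrame(df.pop('message').values.tolist(), index=df.index)
print(df1)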
   kids  money group   job  money  wife
0   NaN   none   NaN  none    NaN  none
1   NaN    NaN  band   NaN  10000   yes
2   one    NaN  jail  none   none  none

# Join the expanded columns back onto name, status and number.
df = pd.concat([df, df1], axis=1)
print(df)
    name    status  number  kids  money group   job  money  wife
0   matt    active   12345   NaN   none   NaN  none    NaN  none
1  james    active   23456   NaN    NaN  band   NaN  10000   yes
2   adam  inactive   34567   one    NaN  jail  none   none  none
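
The two money columns in the output above come from the first pattern: ':\s+,' stops at the comma, so the space after it ends up inside the next key (' money', ' kids'). As a rough sketch, the same replace chain can be traced on a single string with the standard re module. The helper name message_to_dict and the trailing \s* on the first pattern are assumptions added here for illustration; they are not part of the original answer.

```python
import ast
import re

# Same patterns as the answer, except that the first one also swallows the
# whitespace after the comma (the trailing \s* is the assumed tweak).
PATTERNS = [
    (r':\s+,\s*', '":"none","'),  # empty value -> "none"
    (r'\[', '{"'),                # opening bracket -> start of dict literal
    (r'\]', '"}'),                # closing bracket -> end of dict literal
    (r':\s+', '":"'),             # key/value separator
    (r',\s+', '","'),             # pair separator
]


def message_to_dict(message):
    """Rewrite one message string into a dict literal and parse it."""
    for pattern, replacement in PATTERNS:
        message = re.sub(pattern, replacement, message)
    return ast.literal_eval(message)


parsed = message_to_dict('[job:  , money: none, wife: none]')
print(parsed)
```

With this tweak the keys should come out without leading spaces, so rows that have money after an empty value land in the same column as the others. In the pandas version, the equivalent change would be to use r':\s+,\s*' as the first pattern passed to replace.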

Edit:

Another solution, using yaml:
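import yaml

# After swapping [ ] for { }, each message is a YAML flow mapping.
# safe_load is used because yaml.load needs an explicit Loader in current PyYAML.
df.message = df.message.replace([r'\[', r'\]'], ['{', '}'], regex=True).apply(yaml.safe_load)

# YAML typing applies: an empty value becomes None, yes becomes True, 10000 becomes an int.
df1 = pd.DataFrame(df.pop('message').values.tolist(), index=df.index)
print(df1)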
  group   job kids  money  wife
0   NaN  None  NaN   none  none
1  band   NaN  NaN  10000  True
2  jail  none  one   none  None

# Join the expanded columns back onto name, status and number.
df = pd.concat([df, df1], axis=1)
print(df)
    name    status  number group   job kids  money  wife
0   matt    active   12345   NaN  None  NaN   none  none
1  james    active   23456  band   NaN  NaN  10000  True
2   adam  inactive   34567  jail  none  one   none  None
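
Both outputs above leave NaN where a row lacks a key, while the question's expected output shows none in those cells. A minimal sketch of one way to fill them, using plain string splitting instead of regex or YAML, is given below. The function name parse_message is hypothetical, and the sketch assumes that keys and values never contain commas or colons themselves.

```python
def parse_message(message):
    """Split '[k: v, k: v]' into a dict, treating an empty value as 'none'."""
    pairs = {}
    for item in message.strip().strip('[]').split(','):
        if not item.strip():
            continue  # skip empty fragments such as those from '[]'
        key, _, value = item.partition(':')
        pairs[key.strip()] = value.strip() or 'none'
    return pairs


messages = [
    '[job:  , money: none, wife: none]',
    '[group: band, wife: yes, money: 10000]',
    '[job: none, money: none, wife:  , kids: one, group: jail]',
]
parsed = [parse_message(m) for m in messages]

# Collect every key in first-seen order, then give each row all of them,
# using 'none' for keys the row does not have.
all_keys = list(dict.fromkeys(key for row in parsed for key in row))
rows = [{key: row.get(key, 'none') for key in all_keys} for row in parsed]

for row in rows:
    print(row)
```

All values stay as strings here, so yes is not converted to True. Passing rows to pd.DataFrame(rows, index=df.index) should give a frame that can be joined back with concat in the same way as above.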

Regarding "python - Pandas list of dictionaries to separate columns", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/43032182/
