I have a dataset like this:
name   status    number  message
matt   active    12345   [job: , money: none, wife: none]
james  active    23456   [group: band, wife: yes, money: 10000]
adam   inactive  34567   [job: none, money: none, wife: , kids: one, group: jail]
How do I extract the key-value pairs and turn them into a dataframe expanded all the way out?
Expected output:
name   status    number  job   money  wife  group  kids
matt   active    12345   none  none   none  none   none
james  active    23456   none  10000  none  band   none
adam   inactive  34567   none  none   none  none   one
The message column contains many different key types.
Any help would be greatly appreciated.
Best answer
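This is not easy.
First use replace to rewrite each value as a dict literal (\s+ matches one or more whitespace characters), then parse the result with ast.literal_eval.
After that, the DataFrame constructor can be combined with concat; pop removes the message column from the original frame: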
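import ast
import pandas as pd

# Rewrite each message as a dict literal. The trailing \s* on the first pattern
# keeps the key after an empty value from retaining a leading space.
df.message = df.message.replace([r':\s+,\s*', r'\[', r'\]', r':\s+', r',\s+'],
                                ['":"none","', '{"', '"}', '":"', '","'], regex=True)
# Parse the dict literals without executing arbitrary code.
df.message = df.message.apply(ast.literal_eval)
# pop drops the message column; the list of dicts becomes a new frame.
df1 = pd.DataFrame(df.pop('message').values.tolist(), index=df.index)
# (column order may vary with the pandas version)
print (df1)
  group   job kids  money  wife
0   NaN  none  NaN   none  none
1  band   NaN  NaN  10000   yes
2  jail  none  one   none  none
df = pd.concat([df, df1], axis=1)
print (df)
    name    status  number group   job kids  money  wife
0   matt    active   12345   NaN  none  NaN   none  none
1  james    active   23456  band   NaN  NaN  10000   yes
2   adam  inactive   34567  jail  none  one   none  none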
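To see what the replacement chain does to a single message, here is a minimal sketch that applies the same kind of patterns one after another with re.sub and then parses the result with ast.literal_eval. The names RULES and message_to_dict are made up for illustration, and the first pattern here ends in \s* so that the key following an empty value does not keep a leading space.

```python
import ast
import re

# (pattern, replacement) pairs, applied in this order.
RULES = [
    (r":\s+,\s*", '":"none","'),  # empty value, e.g. "job: ," -> "none"
    (r"\[", '{"'),                # opening bracket -> start of dict literal
    (r"\]", '"}'),                # closing bracket -> end of dict literal
    (r":\s+", '":"'),             # key/value separator
    (r",\s+", '","'),             # item separator
]


def message_to_dict(message):
    """Hypothetical helper: rewrite '[k: v, ...]' as a dict literal and parse it."""
    for pattern, replacement in RULES:
        message = re.sub(pattern, replacement, message)
    return ast.literal_eval(message)


print(message_to_dict("[job: , money: none, wife: none]"))
# {'job': 'none', 'money': 'none', 'wife': 'none'}
```

Every value comes back as a string, including 10000, because the rewritten text quotes everything.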
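Edit:
Another solution, using yaml:
import yaml

# Turn the brackets into braces so each message reads as a YAML flow mapping.
# safe_load is used because yaml.load needs an explicit Loader in current PyYAML.
df.message = df.message.replace([r'\[', r'\]'], ['{', '}'], regex=True).apply(yaml.safe_load)
df1 = pd.DataFrame(df.pop('message').values.tolist(), index=df.index)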
print (df1)
  group   job kids  money  wife
0   NaN  None  NaN   none  none
1  band   NaN  NaN  10000  True
2  jail  none  one   none  None
df = pd.concat([df, df1], axis=1)
print (df)
    name    status  number group   job kids  money  wife
0   matt    active   12345   NaN  None  NaN   none  none
1  james    active   23456  band   NaN  NaN  10000  True
2   adam  inactive   34567  jail  none  one   none  None
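Both results above leave NaN (or None/True with yaml) where the question's expected output shows the string none. A minimal sketch of one way to get all-string columns, assuming values never contain commas or colons so that plain string splitting is enough; parse_message is a hypothetical helper and the sample frame just mirrors the question's data:

```python
import pandas as pd

# Sample frame mirroring the data in the question.
df = pd.DataFrame({
    "name": ["matt", "james", "adam"],
    "status": ["active", "active", "inactive"],
    "number": [12345, 23456, 34567],
    "message": [
        "[job: , money: none, wife: none]",
        "[group: band, wife: yes, money: 10000]",
        "[job: none, money: none, wife: , kids: one, group: jail]",
    ],
})


def parse_message(text):
    """Hypothetical helper: turn '[k: v, k: v]' into a dict of strings."""
    result = {}
    for item in text.strip("[]").split(","):
        key, _, value = item.partition(":")
        # An empty value such as 'job: ' becomes the string 'none'.
        result[key.strip()] = value.strip() or "none"
    return result


# One dict per row -> one column per key; keys absent from a row become NaN.
parsed = pd.DataFrame(df["message"].map(parse_message).tolist(), index=df.index)

# Fill the gaps with 'none' and attach the new columns to the remaining ones.
out = pd.concat([df.drop(columns="message"), parsed.fillna("none")], axis=1)
print(out)
```

This avoids evaluating the text at all, at the cost of breaking as soon as a value itself contains a comma or a colon.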
Regarding python - Pandas list of dicts to separate columns, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/43032182/