pandas: expanding a list of dictionaries into separate columns
Problem description
My dataset looks like this:
name   status    number  message
matt   active    12345   [job: , money: none, wife: none]
james  active    23456   [group: band, wife: yes, money: 10000]
adam   inactive  34567   [job: none, money: none, wife: , kids: one, group: jail]
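To make the data above reproducible, here is a minimal sketch that builds it as a DataFrame. It assumes the message column holds plain strings in exactly the bracketed form shown; that is an assumption, since the question does not say how the data was loaded.

```python
import pandas as pd

# Sample data from the question. Each message is assumed to be a plain
# string, not an already-parsed list or dict.
df = pd.DataFrame({
    "name": ["matt", "james", "adam"],
    "status": ["active", "active", "inactive"],
    "number": [12345, 23456, 34567],
    "message": [
        "[job: , money: none, wife: none]",
        "[group: band, wife: yes, money: 10000]",
        "[job: none, money: none, wife: , kids: one, group: jail]",
    ],
})
print(df)
```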
How can I extract the key-value pairs and turn them into a dataframe expanded all the way out?
Expected output:
name   status    number  job   money  wife  group  kids
matt   active    12345   none  none   none  none   none
james  active    23456   none  10000  none  band   none
adam   inactive  34567   none  none   none  none   one
The message contains multiple different key types.
Any help would be greatly appreciated.
Recommended answer
This is not easy.
You need to convert each value into a dict, first rewriting the string with replace (\s+ matches one or more whitespace characters) and then parsing it with ast.literal_eval.
Then you can pass the resulting list of dicts to the DataFrame constructor and join it back with concat, using pop to remove the original column from df:
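import ast
import pandas as pd

# Rewrite each message string into a dict literal; empty values become "none".
# The first pattern also consumes the whitespace after the comma, so the
# following key does not pick up a leading space (which would otherwise
# produce duplicate-looking columns such as " money" and "money").
df['message'] = df['message'].replace(
    [r':\s*,\s*', r'\[', r'\]', r':\s+', r',\s+'],
    ['":"none","', '{"', '"}', '":"', '","'], regex=True)
# Parse each dict literal into a real dict.
df['message'] = df['message'].apply(ast.literal_eval)
# Expand the dicts into columns; pop removes 'message' from df.
df1 = pd.DataFrame(df.pop('message').values.tolist(), index=df.index)
print(df1)
    job  money  wife  group  kids
0  none   none  none    NaN   NaN
1   NaN  10000   yes   band   NaN
2  none   none  none   jail   one
df = pd.concat([df, df1], axis=1)
print(df)
    name    status  number   job  money  wife  group  kids
0   matt    active   12345  none   none  none    NaN   NaN
1  james    active   23456   NaN  10000   yes   band   NaN
2   adam  inactive   34567  none   none  none   jail   one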
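To see what the chain of replacements does, the following sketch applies the same kind of substitutions one at a time to a single message, using only the standard library. The variable names (raw, steps) are illustrative. The first pattern here also swallows the whitespace after the comma, so that the next key carries no leading space.

```python
import ast
import re

# One raw message string in the format shown in the question.
raw = "[job: none, money: none, wife: , kids: one, group: jail]"

# (pattern, replacement) pairs, applied in order.
steps = [
    (r":\s*,\s*", '":"none","'),  # empty value -> "none"
    (r"\[", '{"'),                # opening bracket -> start of dict literal
    (r"\]", '"}'),                # closing bracket -> end of dict literal
    (r":\s+", '":"'),             # key/value separator
    (r",\s+", '","'),             # separator between pairs
]

text = raw
for pattern, repl in steps:
    text = re.sub(pattern, repl, text)
    print(text)  # show the string after each substitution

# The final string is a valid Python dict literal.
parsed = ast.literal_eval(text)
print(parsed)
```

The order matters: the empty-value pattern must run before the generic separators, otherwise "wife: ," would be rewritten into a pair with an empty string as its value.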
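Another solution uses yaml. Note that YAML applies its own typing: in the output below, empty values become None, yes becomes True, and 10000 becomes an integer.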
import yaml
import pandas as pd

# Turn the brackets into a YAML flow mapping and let the YAML parser do the rest.
# safe_load is used because yaml.load requires an explicit Loader in current PyYAML.
df['message'] = df['message'].replace([r'\[', r'\]'], ['{', '}'], regex=True).apply(yaml.safe_load)
df1 = pd.DataFrame(df.pop('message').values.tolist(), index=df.index)
print(df1)
    job  money  wife  group  kids
0  None   none  none    NaN   NaN
1   NaN  10000  True   band   NaN
2  none   none  None   jail   one
df = pd.concat([df, df1], axis=1)
print(df)
    name    status  number   job  money  wife  group  kids
0   matt    active   12345  None   none  none    NaN   NaN
1  james    active   23456   NaN  10000  True   band   NaN
2   adam  inactive   34567  none   none  None   jail   one
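Both approaches leave NaN where a row lacks a key, whereas the expected output in the question shows none in those cells. As a different technique from the two above, the sketch below parses each message with plain string methods and then fills the gaps with fillna. The helper parse_message is hypothetical, and it assumes that values contain no commas and keys contain no colons.

```python
import pandas as pd


def parse_message(text):
    """Parse '[k1: v1, k2: , ...]' into a dict; empty values become 'none'.

    Hypothetical helper: assumes values contain no commas and keys
    contain no colons.
    """
    pairs = {}
    for item in text.strip().strip("[]").split(","):
        if not item.strip():
            continue  # skip empty fragments, e.g. from "[]"
        key, _, value = item.partition(":")
        pairs[key.strip()] = value.strip() or "none"
    return pairs


messages = [
    "[job: , money: none, wife: none]",
    "[group: band, wife: yes, money: 10000]",
    "[job: none, money: none, wife: , kids: one, group: jail]",
]

# Keys missing from a row come out as NaN; fillna replaces them with "none".
expanded = pd.DataFrame([parse_message(m) for m in messages]).fillna("none")
print(expanded)
```

The expanded frame can be joined back onto the remaining columns with pd.concat(..., axis=1), as in the answer. All values stay strings here, so 10000 is not converted to a number.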