本文介绍了在带有数组的嵌套Json上使用Pandas json_normalize的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!
问题描述
问题是使用json对象的嵌套数组对json进行规范化.我已经看过类似的问题,并试图使用他们的解决方案无济于事.
The problem is normalizing a json with nested array of json objects. I have looked at similar questions and tried to use their solution to no avail.
这就是我的json对象的样子.
This is what my json object looks like.
{
"results": [
{
"_id": "25",
"Product": {
"Description": "3 YEAR",
"TypeLevel1": "INTEREST",
"TypeLevel2": "LONG"
},
"Settlement": {},
"Xref": {
"SCSP": "96"
},
"ProductSMCP": [
{
"SMCP": "01"
}
]
},
{
"_id": "26",
"Product": {
"Description": "10 YEAR",
"TypeLevel1": "INTEREST",
"Currency": "USD",
"Operational": true,
"TypeLevel2": "LONG"
},
"Settlement": {},
"Xref": {
"BBT": "CITITYM9",
"TCK": "ZN"
},
"ProductSMCP": [
{
"SMCP": "01"
},
{
"SMCP2": "02"
}
]
}
]
}
这是我用于规范json对象的代码.
Here is my code for normalizing the json object.
data = json.load(j)
data = data['results']
print pd.io.json.json_normalize(data)
我想要的结果应该是这样
id Description TypeLevel1 TypeLevel2 Currency \
25 3 YEAR US INTEREST LONG NAN
26 10 YEAR US INTEREST NAN USD
BBT TCT SMCP SMCP2 SCSP
NAN NAN 521 NAN 01
M9 ZN 01 02 NAN
但是,我得到的结果是这样的:
However, the result I get is this:
Product.Currency Product.Description Product.Operational Product.TypeLevel1 \
0 NaN 3 YEAR NaN INTEREST
1 USD 10 YEAR True INTEREST
Product.TypeLevel2 ProductSMCP Xref.BBT Xref.SCSP \
0 LONG [{'SMCP': '01'}] NaN 96
1 LONG [{'SMCP': '01'}, {'SMCP2': '02'}] CITITYM9 NaN
Xref.TCK _id
0 NaN 25
1 ZN 26
如您所见,问题出在 ProductSCMP ,它没有完全展平阵列.
As you can see, the issue is at ProductSCMP, it is not completely flattening the array.
推荐答案
一旦我们超过了第一次规范化,我将应用lambda
来完成工作.
Once we get past first normalization, I'd apply a lambda
to finish the job.
from cytoolz.dicttoolz import merge
pd.io.json.json_normalize(data).pipe(
lambda x: x.drop('ProductSMCP', 1).join(
x.ProductSMCP.apply(lambda y: pd.Series(merge(y)))
)
)
Product.Currency Product.Description Product.Operational Product.TypeLevel1 Product.TypeLevel2 Xref.BBT Xref.SCSP Xref.TCK _id SMCP SMCP2
0 NaN 3 YEAR NaN INTEREST LONG NaN 96 NaN 25 01 NaN
1 USD 10 YEAR True INTEREST LONG CITITYM9 NaN ZN 26 01 02
修剪列名称
pd.io.json.json_normalize(data).pipe(
lambda x: x.drop('ProductSMCP', 1).join(
x.ProductSMCP.apply(lambda y: pd.Series(merge(y)))
)
).rename(columns=lambda x: re.sub('(Product|Xref)\.', '', x))
Currency Description Operational TypeLevel1 TypeLevel2 BBT SCSP TCK _id SMCP SMCP2
0 NaN 3 YEAR NaN INTEREST LONG NaN 96 NaN 25 01 NaN
1 USD 10 YEAR True INTEREST LONG CITITYM9 NaN ZN 26 01 02
这篇关于在带有数组的嵌套Json上使用Pandas json_normalize的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!