问题描述
我目前正在整理该词典文件,并且遇到了很多障碍.我正在尝试使用 json_normalize
来展平此数据.如果我使用单个实例进行测试,则可以正常工作,但是如果我要整理所有数据,它将返回一个错误,指出 key错误'0'
,我不确定如何解决此问题.
I am currently working on flattening this dictionary file and have reached a number of road blocks. I am trying to use json_normalize
to flatten this data. If I test with individual instances it works but if I want to flatten all the data it will return an error stating key error '0'
I'm not sure how to fix this.
数据示例-
data = {1:{
'Name': "Thrilling Tales of Dragon Slayers",
'IDs':{
"StoreID": ['123445452543'],
"BookID": ['543533254353'],
"SalesID": ['543267765345']},
2:{
'Name': "boring Tales of Dragon Slayers",
'IDs':{
"StoreID": ['111111', '1121111'],
"BookID": ['543533254353', '4324232342'],
"SalesID": ['543267765345', '4353543']}}
我的代码
d_flat = pd.io.json.json_normalize(data, meta=['Title', 'StoreID', 'BookID', 'SalesID'])
推荐答案
设置
您的数据结构不便.我想专注于:
Setup
Your data is structured inconveniently. I want to focus on:
- 将
"ID"
中的列表放入字典列表中,这会更加方便. - 摆脱父词典中无用的键.我们只关心价值.
- Getting the lists in
'IDs'
into a list of dictionaries, which would be far more convenient. - Getting rid of the useless keys in the parent dictionary. All we care about are the values.
您的数据
:
{1: {'Name': 'Thrilling Tales of Dragon Slayers',
'IDs': {'StoreID': ['123445452543'],
'BookID': ['543533254353'],
'SalesID': ['543267765345']}},
2: {'Name': 'boring Tales of Dragon Slayers',
'IDs': {'StoreID': ['111111', '1121111'],
'BookID': ['543533254353', '4324232342'],
'SalesID': ['543267765345', '4353543']}}}
我希望它看起来像什么
[{'Name': 'Thrilling Tales of Dragon Slayers',
'IDs': [{'StoreID': '123445452543',
'BookID': '543533254353',
'SalesID': '543267765345'}]},
{'Name': 'boring Tales of Dragon Slayers',
'IDs': [{'StoreID': '111111',
'BookID': '543533254353',
'SalesID': '543267765345'},
{'StoreID': '1121111',
'BookID': '4324232342',
'SalesID': '4353543'}]}]
重组数据
合理的方式
简单的循环,不要乱扔.这使我们得到了上面显示的内容
Restructure Data
Reasonable Way
Simple loop, don't mess around. This gets us what I showed above
new = []
for v in data.values():
temp = {**v} # This is intended to keep all the other data that might be there
ids = temp.pop('IDs') # I have to focus on this to create the records
temp['IDs'] = [dict(zip(ids, x)) for x in zip(*ids.values())]
new.append(temp)
可爱的单缸纸
new = [{**v, 'IDs': [dict(zip(v['IDs'], x)) for x in zip(*v['IDs'].values())]} for v in data.values()]
使用 pd.json_normalize
创建 DataFrame
在对 json_normalize
的调用中,我们需要指定记录的路径,即在'IDs'
键处找到的ID词典列表. json_normalize
将在数据框中为该列表中的每个项目创建一行.这将通过 record_path
参数完成,我们传递一个 tuple
来描述路径(如果是更深层的结构)或字符串(如果键位于顶层,对我们来说就是.)
Create DataFrame
with pd.json_normalize
In this call to json_normalize
we need to specify the path to the records, i.e. the list of id dictionaries found at the 'IDs'
key. json_normalize
will create one row in the dataframe for every item in that list. This will be done with the the record_path
parameter and we pass a tuple
that describes the path (if it were in a deeper structure) or a string (if the key is at the top layer, which for us, it is).
record_path = 'IDs'
然后我们要告诉 json_normalize
哪些键是记录的元数据.如果像我们一样有多个记录,那么将为每条记录重复元数据.
Then we want to tell json_normalize
what keys are metadata for the records. If there are more than one record, as we have, then the metadata will be repeated for each record.
meta = 'Name'
所以最终的解决方案是这样的:
So the final solution looks like this:
pd.json_normalize(new, record_path='IDs', meta='Name')
StoreID BookID SalesID Name
0 123445452543 543533254353 543267765345 Thrilling Tales of Dragon Slayers
1 111111 543533254353 543267765345 boring Tales of Dragon Slayers
2 1121111 4324232342 4353543 boring Tales of Dragon Slayers
但是
如果我们仍然要进行重组,那么最好还是将其传递给数据框构造函数.
However
If we are restructuring anyway, might as well make it so we can just pass it to the dataframe constructor.
pd.DataFrame([
{'Name': r['Name'], **dict(zip(r['IDs'], x))}
for r in data.values() for x in zip(*r['IDs'].values())
])
Name StoreID BookID SalesID
0 Thrilling Tales of Dragon Slayers 123445452543 543533254353 543267765345
1 boring Tales of Dragon Slayers 111111 543533254353 543267765345
2 boring Tales of Dragon Slayers 1121111 4324232342 4353543
奖励内容
虽然我们在做.关于每个id类型是否具有相同数量的id,数据是模棱两可的.假设他们没有.
Bonus Content
While we are at it. The data is ambiguous in regards to whether or not each id type has the same number of ids. Suppose they did not.
data = {1:{
'Name': "Thrilling Tales of Dragon Slayers",
'IDs':{
"StoreID": ['123445452543'],
"BookID": ['543533254353'],
"SalesID": ['543267765345']}},
2:{
'Name': "boring Tales of Dragon Slayers",
'IDs':{
"StoreID": ['111111', '1121111'],
"BookID": ['543533254353', '4324232342'],
"SalesID": ['543267765345', '4353543', 'extra id']}}}
然后我们可以使用 itertools
from itertools import zip_longest
pd.DataFrame([
{'Name': r['Name'], **dict(zip(r['IDs'], x))}
for r in data.values() for x in zip_longest(*r['IDs'].values())
])
Name StoreID BookID SalesID
0 Thrilling Tales of Dragon Slayers 123445452543 543533254353 543267765345
1 boring Tales of Dragon Slayers 111111 543533254353 543267765345
2 boring Tales of Dragon Slayers 1121111 4324232342 4353543
3 boring Tales of Dragon Slayers None None extra id
这篇关于使用pd.json_normalize展平字典的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!