pandas json_normalize所有列都有嵌套的字典展平

本文介绍了pandas json_normalize所有列都有嵌套的字典展平的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我有一个嵌套的字典(json)，该字典是从非官方的Google字典API返回的.

I have a nested dictionary (json) which was returned from the unofficial google dictionary API.

看起来像这样:

{'word': 'slack',
 'phonetic': '/slak/',
 'meaning': {'adjective': [{'definition': 'Not taut or held tightly in position; loose.',
    'example': 'a slack rope',
    'synonyms': ['loose',
     'limp',
     'not taut',
     'not tight',
     'hanging',
     'flapping']},
   {'definition': '(of business) characterized by a lack of work or activity; quiet.',
    'example': 'business was rather slack'},
   {'definition': 'Having or showing laziness or negligence.',
    'example': 'slack accounting procedures',
    'synonyms': ['lax',
     'negligent',
     'neglectful',
     'remiss',
     'careless',
     'slapdash',
     'slipshod',
     'lackadaisical',
     'lazy',
     'inefficient',
     'incompetent',
     'inattentive',
     'offhand',
     'casual',
     'disorderly',
     'disorganized']},
   {'definition': '(of a tide) neither ebbing nor flowing.',
    'example': 'soon the water will become slack, and the tide will turn'}],
  'noun': [{'definition': 'The part of a rope or line which is not held taut; the loose or unused part.',
    'example': 'I picked up the rod and wound in the slack',
    'synonyms': ['looseness', 'play', 'give']},
   {'definition': 'Casual trousers.'},
   {'definition': 'A spell of inactivity or laziness.',
    'example': 'he slept deeply, refreshed by a little slack in the daily routine',
    'synonyms': ['lull',
     'pause',
     'respite',
     'spell of inactivity',
     'interval',
     'break',
     'hiatus',
     'breathing space']}],
  'verb': [{'definition': 'Loosen (something, especially a rope).'},
   {'definition': 'Decrease or reduce in intensity, quantity, or speed.',
    'example': 'the flow of blood slacked off',
    'synonyms': ['reduce',
     'lessen',
     'slacken',
     'slow',
     'ease off',
     'ease up']},
   {'definition': 'Work slowly or lazily.',
    'example': 'she reprimanded her girls if they were slacking',
    'synonyms': ['idle',
     'shirk',
     'be inactive',
     'be lazy',
     'be indolent',
     'sit back and do nothing',
     'waste time',
     'lounge about']},
   {'definition': 'Slake (lime).'}],
  'adverb': [{'definition': 'Loosely.',
    'example': 'their heads were hanging slack in attitudes of despair'}]}}

这是松弛一词的含义.要获得此含义，我们可以使用google或简单地使用以下代码:

This is the meaning of the word slack.To get this meaning we can google the meaning or simply use the following code:

import numpy as np
import pandas as pd
import json
from pandas.io.json import json_normalize
from io import StringIO
import requests

word = 'slack'
url =  'https://googledictionaryapi.eu-gb.mybluemix.net/?define=' + word
response = requests.get(url)
content = response.content.decode('utf-8') # list of ugly strings
j = json.loads(content) # json list having nested dictionary
j = j[0]
j

现在，字典j具有三个键.

Now, the dictionary j has three keys.

j.keys() # dict_keys(['word', 'phonetic', 'meaning'])

我主要对以下含义感兴趣:

I am mainly interested in the meaning:

j['meaning'].keys() # dict_keys(['adjective', 'noun', 'verb', 'adverb'])

要获取熊猫数据框，我使用了以下代码:

To get the pandas dataframe I used following code:

json_normalize(data=j['meaning'])

这将提供一个只有4列的数据框.

This gives a dataframe with only 4 columns.

在这里，语音的每个部分(形容词，名词等)都必须具有定义"键，并且示例"和同义词"是可选的.

Here, each part of speech ( adjective, noun, etc) must have 'definition' key and 'example' and 'synonyms' are optional.

j['meaning']['adjective'][0].keys() # dict_keys(['definition', 'example', 'synonyms'])

如何获取4 * 3 = 12列的数据框，其列名称如adjective_definition，adjective_example，....，verb_synonyms?

How to get the dataframe with 4 * 3 = 12 columns, with column names like adjective_definition, adjective_example, ...., verb_synonyms?

我试图从以下链接中获得一些想法:

I tried to get some ideas from following links:

http://pandas.pydata.org/pandas-docs/version/0.17.0/generation/pandas.io.json.json_normalize.html
https://www.kaggle.com/jboysen/quick-tutorial-flatten-nested-json-in-pandas/notebook
pandas.io.json.json_normalize带有非常嵌套的json

http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.io.json.json_normalize.html
https://www.kaggle.com/jboysen/quick-tutorial-flatten-nested-json-in-pandas/notebook
pandas.io.json.json_normalize with very nested json

但是，无法解决问题.帮助将不胜感激.

But, could not solve the problem. Help will be appreciated.

推荐答案

我认为使用 json_normalize 的 record_path 参数将解决您的问题.由于record_path旨在成为json对象或记录列表的单个路径，因此我不得不多次调用json_normalize，然后将结果连接起来以获取具有所需数据的数据框.您还可以尝试使用 record_prefix 参数来设置列命名约定.希望这会有所帮助！

I think using json_normalize's record_path parameter will solve your problem. Since record_path is intended to be a single path to a list of json objects or records, I had to call json_normalize more than once and then concatenate the results to get a dataframe with the data you want. You can also experiment with the record_prefix parameter to set the column naming convention. Hope this helps!

from pandas.io.json import json_normalize
from io import StringIO
import requests

word = 'slack'
url =  'https://googledictionaryapi.eu-gb.mybluemix.net/?define=' + word
response = requests.get(url)
content = response.content.decode('utf-8') # list of ugly strings
j = json.loads(content) # json list having nested dictionary
j = j[0]

df_adj = json_normalize(data=j['meaning'], record_path=["adjective"], record_prefix="adjective.")
df_verb = json_normalize(data=j['meaning'], record_path=["verb"], record_prefix="verb.")
df_adv = json_normalize(data=j['meaning'], record_path=["adverb"], record_prefix="adverb.")
df_noun = json_normalize(data=j['meaning'], record_path=["noun"], record_prefix="noun.")

df = pd.concat([df_adj, df_verb, df_adv, df_noun], axis=1)
print(df.head(3))

这篇关于pandas json_normalize所有列都有嵌套的字典展平的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持！