How to json_normalize a column with NaNs
Question
- This question is specific to columns of data in a pandas.DataFrame.
- The approach depends on whether the values in the column are of str, dict, or list type.
- It addresses how to deal with the NaN values when df.dropna().reset_index(drop=True) isn't a valid option.
Case 1
- With a column of str type, the values in the column must be converted to dict type with ast.literal_eval before using .json_normalize.
import numpy as np
import pandas as pd
from ast import literal_eval
df = pd.DataFrame({'col_str': ['{"a": "46", "b": "3", "c": "12"}', '{"b": "2", "c": "7"}', '{"c": "11"}', np.nan]})
col_str
0 {"a": "46", "b": "3", "c": "12"}
1 {"b": "2", "c": "7"}
2 {"c": "11"}
3 NaN
type(df.iloc[0, 0])
[out]: str
df.col_str.apply(literal_eval)
Error:
df.col_str.apply(literal_eval) results in ValueError: malformed node or string: nan
Case 2
- With a column of dict type, use pandas.json_normalize to convert keys to column headers and values to rows.
df = pd.DataFrame({'col_dict': [{"a": "46", "b": "3", "c": "12"}, {"b": "2", "c": "7"}, {"c": "11"}, np.nan]})
col_dict
0 {'a': '46', 'b': '3', 'c': '12'}
1 {'b': '2', 'c': '7'}
2 {'c': '11'}
3 NaN
type(df.iloc[0, 0])
[out]: dict
pd.json_normalize(df.col_dict)
Error:
pd.json_normalize(df.col_dict) results in AttributeError: 'float' object has no attribute 'items'
Case 3
- In a column of str type, with the dicts inside a list.
- To normalize the column:
  - apply literal_eval, because explode doesn't work on str type
  - explode the column to separate the dicts into separate rows
  - normalize the column
df = pd.DataFrame({'col_str': ['[{"a": "46", "b": "3", "c": "12"}, {"b": "2", "c": "7"}]', '[{"b": "2", "c": "7"}, {"c": "11"}]', np.nan]})
col_str
0 [{"a": "46", "b": "3", "c": "12"}, {"b": "2", "c": "7"}]
1 [{"b": "2", "c": "7"}, {"c": "11"}]
2 NaN
type(df.iloc[0, 0])
[out]: str
df.col_str.apply(literal_eval)
Error:
df.col_str.apply(literal_eval) results in ValueError: malformed node or string: nan
Recommended answer
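- As noted in the comments, there is always the option of:
  - df = df.dropna().reset_index(drop=True)
  - That's fine for the dummy data here, or when working with a dataframe where the other columns don't matter.
  - It's not a good option for dataframes where the additional columns are needed.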
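To make the drawback of dropping rows concrete, here is a minimal sketch. The `id` column is a hypothetical extra column added purely for illustration; it is not part of the original data.

```python
import numpy as np
import pandas as pd

# 'id' is a hypothetical extra column standing in for data that must be kept
df = pd.DataFrame({
    'id': [10, 11, 12, 13],
    'col_str': ['{"a": "46", "b": "3", "c": "12"}', '{"b": "2", "c": "7"}', '{"c": "11"}', np.nan],
})

# Dropping rows where col_str is NaN also discards that row's 'id'
dropped = df.dropna(subset=['col_str']).reset_index(drop=True)

# Which ids were lost along with the NaN rows?
lost_ids = sorted(set(df['id'].tolist()) - set(dropped['id'].tolist()))
print(lost_ids)
```

The row whose `col_str` is missing disappears entirely, which is why filling the NaNs is preferable when the other columns matter.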
Case 1
- Since the column contains str types, fillna with '{}' (a str).
import numpy as np
import pandas as pd
from ast import literal_eval
df = pd.DataFrame({'col_str': ['{"a": "46", "b": "3", "c": "12"}', '{"b": "2", "c": "7"}', '{"c": "11"}', np.nan]})
col_str
0 {"a": "46", "b": "3", "c": "12"}
1 {"b": "2", "c": "7"}
2 {"c": "11"}
3 NaN
type(df.iloc[0, 0])
[out]: str
# fillna
df.col_str = df.col_str.fillna('{}')
# convert the column to dicts
df.col_str = df.col_str.apply(literal_eval)
# use json_normalize
df = df.join(pd.json_normalize(df.col_str)).drop(columns=['col_str'])
# display(df)
     a    b    c
0   46    3   12
1  NaN    2    7
2  NaN  NaN   11
3  NaN  NaN  NaN
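The Case 1 steps can also be wrapped in a small helper. This is only a sketch: the function name `normalize_str_dict_column`, the `id` column, and the string index labels are all hypothetical and added for illustration. It realigns the normalized frame to the dataframe's index before joining, because `join` matches on index and `json_normalize` applied to a plain list returns a fresh RangeIndex.

```python
import numpy as np
import pandas as pd
from ast import literal_eval


def normalize_str_dict_column(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Expand a column of dict-like strings into one column per key (hypothetical helper)."""
    # Replace missing values with the string '{}' so literal_eval can parse every row
    parsed = df[col].fillna('{}').apply(literal_eval)
    # Normalize a plain list of dicts, then realign the result to df's index
    expanded = pd.json_normalize(parsed.tolist())
    expanded.index = df.index
    # Keep every other column and attach the expanded keys
    return df.drop(columns=[col]).join(expanded)


# 'id' and the index labels are illustrative assumptions, not part of the original data
df = pd.DataFrame(
    {
        'id': [10, 11, 12, 13],
        'col_str': ['{"a": "46", "b": "3", "c": "12"}', '{"b": "2", "c": "7"}', '{"c": "11"}', np.nan],
    },
    index=['w', 'x', 'y', 'z'],
)
result = normalize_str_dict_column(df, 'col_str')
print(result)
```

The row that held NaN is kept, with NaN in each of the new key columns.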
Case 2
- Since the column contains dict types, fillna with {} (not a str).
- This needs to be filled using a dict comprehension that maps each index label to {}, since fillna({}) does not work.
df = pd.DataFrame({'col_dict': [{"a": "46", "b": "3", "c": "12"}, {"b": "2", "c": "7"}, {"c": "11"}, np.nan]})
col_dict
0 {'a': '46', 'b': '3', 'c': '12'}
1 {'b': '2', 'c': '7'}
2 {'c': '11'}
3 NaN
type(df.iloc[0, 0])
[out]: dict
# fillna
df.col_dict = df.col_dict.fillna({i: {} for i in df.index})
# use json_normalize
df = df.join(pd.json_normalize(df.col_dict)).drop(columns=['col_dict'])
# display(df)
     a    b    c
0   46    3   12
1  NaN    2    7
2  NaN  NaN   11
3  NaN  NaN  NaN
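As an alternative to the dict-comprehension fillna in Case 2, the non-dict values can be replaced with a plain list comprehension. This is a sketch of a swapped-in technique, not the answer's own method; the `id` column is a hypothetical addition for illustration.

```python
import numpy as np
import pandas as pd

# 'id' is a hypothetical extra column, added only to show that other data survives
df = pd.DataFrame({
    'id': [10, 11, 12, 13],
    'col_dict': [{"a": "46", "b": "3", "c": "12"}, {"b": "2", "c": "7"}, {"c": "11"}, np.nan],
})

# Replace anything that is not a dict (here: NaN) with an empty dict
records = [v if isinstance(v, dict) else {} for v in df['col_dict']]

# Normalize the cleaned records and realign them to df's index before joining
expanded = pd.json_normalize(records)
expanded.index = df.index
result = df.drop(columns=['col_dict']).join(expanded)
print(result)
```

The `isinstance` check sidesteps fillna altogether, so it does not depend on how fillna interprets a dict argument.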
Case 3
- Fill the NaNs with '[]' (a str).
- Now literal_eval will work.
- .explode can be used on the column to separate the dict values into rows.
- Now the NaNs need to be filled with {} (not a str).
- Then the column can be normalized.
- For the case when the column holds lists of dicts that aren't str type, skip to .explode.
df = pd.DataFrame({'col_str': ['[{"a": "46", "b": "3", "c": "12"}, {"b": "2", "c": "7"}]', '[{"b": "2", "c": "7"}, {"c": "11"}]', np.nan]})
col_str
0 [{"a": "46", "b": "3", "c": "12"}, {"b": "2", "c": "7"}]
1 [{"b": "2", "c": "7"}, {"c": "11"}]
2 NaN
type(df.iloc[0, 0])
[out]: str
# fillna
df.col_str = df.col_str.fillna('[]')
# literal_eval
df.col_str = df.col_str.apply(literal_eval)
# explode
df = df.explode('col_str').reset_index(drop=True)
# fillna again
df.col_str = df.col_str.fillna({i: {} for i in df.index})
# use json_normalize
df = df.join(pd.json_normalize(df.col_str)).drop(columns=['col_str'])
# display(df)
     a    b    c
0   46    3   12
1  NaN    2    7
2  NaN    2    7
3  NaN  NaN   11
4  NaN  NaN  NaN
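The same Case 3 pipeline can be run with an extra column to show that each exploded dict stays attached to the row it came from. A minimal sketch, assuming a hypothetical `id` column; it uses an `isinstance` check instead of the second fillna, which is a substitution rather than the answer's own step.

```python
import numpy as np
import pandas as pd
from ast import literal_eval

# 'id' is a hypothetical extra column used to track which source row each dict came from
df = pd.DataFrame({
    'id': [10, 11, 12],
    'col_str': [
        '[{"a": "46", "b": "3", "c": "12"}, {"b": "2", "c": "7"}]',
        '[{"b": "2", "c": "7"}, {"c": "11"}]',
        np.nan,
    ],
})

# Fill NaN with the string '[]' so literal_eval can parse every row into a list
parsed = df.assign(col_str=df['col_str'].fillna('[]').apply(literal_eval))

# One row per dict; an empty list becomes a single NaN row
exploded = parsed.explode('col_str').reset_index(drop=True)

# Swap the NaN left by the empty list for an empty dict, then normalize
records = [v if isinstance(v, dict) else {} for v in exploded['col_str']]
result = exploded.drop(columns=['col_str']).join(pd.json_normalize(records))
print(result)
```

Because the index is reset after `explode`, the normalized frame and the exploded frame share the same RangeIndex, so the join lines up row for row.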