Problem description
I converted a JSON document into a DataFrame and ended up with a column, 'Structure_value', whose values are lists of dictionaries:
Structure_value
[{'Room': [6], 'Length': 7}, {'Room': [6], 'Length': 7}]
[{'Room': [6], 'Length': 22}]
[{'Room': [6,6], 'Length': 8}]
I need to split it into the following four columns:
Structure_value_room_1, Structure_value_length_1, Structure_value_room_2, Structure_value_length_2
The expected output is:
   Structure_value_room_1  Structure_value_length_1  Structure_value_room_2  Structure_value_length_2
0                       6                         7                     6.0                       7.0
1                       6                        22                     NaN                       NaN
2                       6                         8                     6.0                       8.0
How do I handle cases where a single attribute holds multiple values in one list, so that those values need to be split into additional columns?
P.S.: I can handle data like [{'Room': [6], 'Length': 7}, {'Room': [6], 'Length': 7}], but I cannot handle a case like [{'Room': [6,6], 'Length': 8}].
Recommended answer
I could not process your Structure_value representation as a single JSON file, and I don't know whether the rows come from several separate files. I used [{'Room': [6], 'Length': 7}, {'Room': [6], 'Length': 7}] as file1, [{'Room': [6], 'Length': 22}] as file2, and [{'Room': [6,6], 'Length': 8}] as file3.
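For anyone reproducing this, here is a minimal sketch that writes the three samples to disk. The file names house1.json, house2.json, and house3.json match the calls shown in the output further down. The helper name write_samples and the choice of target directory are assumptions made for illustration only.

```python
import json
import tempfile
from pathlib import Path

# Sample structures from the question, one list per file.
SAMPLES = {
    "house1.json": [{"Room": [6], "Length": 7}, {"Room": [6], "Length": 7}],
    "house2.json": [{"Room": [6], "Length": 22}],
    "house3.json": [{"Room": [6, 6], "Length": 8}],
}


def write_samples(directory):
    """Write each sample into `directory` and return the created paths.

    Hypothetical helper, not part of the original answer.
    """
    paths = []
    for name, structures in SAMPLES.items():
        path = Path(directory) / name
        path.write_text(json.dumps(structures), encoding="utf-8")
        paths.append(path)
    return paths
```

Calling write_samples(tempfile.mkdtemp()) creates the files in a fresh temporary directory. Passing "." instead would put them next to the script.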
import json
import pandas as pd

COLUMNS = ['Structure_value_room_1', 'Structure_value_length_1',
           'Structure_value_room_2', 'Structure_value_length_2']

# treat the irregular structures: one (room, length) pair per 'Room' entry
def process_structure(s):
    rooms = s['Room']
    if not isinstance(rooms, list):
        rooms = [rooms]
    # repeat Length for every room in the list
    return [(room, s['Length']) for room in rooms]

# open and treat jsons
def treat_json(file):
    with open(file, 'r') as f:
        structures = json.load(f)
    load_df = []
    for struct in structures:
        for room, length in process_structure(struct):
            load_df.extend([room, length])
    while len(load_df) < len(COLUMNS):  # pad if the row is incomplete
        load_df.append(None)
    return pd.DataFrame([load_df], columns=COLUMNS)
This is the output:
treat_json('house3.json')
   Structure_value_room_1  ...  Structure_value_length_2
0                       6  ...                         8
[1 rows x 4 columns]
treat_json('house2.json')
   Structure_value_room_1  ...  Structure_value_length_2
0                       6  ...                      None
[1 rows x 4 columns]
treat_json('house1.json')
   Structure_value_room_1  ...  Structure_value_length_2
0                       6  ...                         7
[1 rows x 4 columns]
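If the values already sit in a DataFrame column, as in the question, the same idea can be applied row by row without going through files. The following is only a sketch and not part of the original answer: the function names flatten_row and expand_structure are hypothetical, and it assumes every dictionary has a 'Room' list and a scalar 'Length'.

```python
import pandas as pd


def flatten_row(structures):
    """Flatten one cell into [room, length, room, length, ...].

    The length is repeated for every entry in the 'Room' list.
    """
    flat = []
    for struct in structures:
        for room in struct["Room"]:
            flat.extend([room, struct["Length"]])
    return flat


def expand_structure(df, column="Structure_value"):
    """Return a new frame with one room/length column pair per flattened entry."""
    rows = [flatten_row(cell) for cell in df[column]]
    width = max(len(row) for row in rows)
    # Pad shorter rows with NaN so every row has the same number of values.
    padded = [row + [float("nan")] * (width - len(row)) for row in rows]
    wide = pd.DataFrame(padded, index=df.index)
    names = []
    for i in range(width):
        kind = "room" if i % 2 == 0 else "length"
        names.append(f"{column}_{kind}_{i // 2 + 1}")
    wide.columns = names
    return wide


df = pd.DataFrame({
    "Structure_value": [
        [{"Room": [6], "Length": 7}, {"Room": [6], "Length": 7}],
        [{"Room": [6], "Length": 22}],
        [{"Room": [6, 6], "Length": 8}],
    ]
})
result = expand_structure(df)
print(result)
```

For the three sample rows this yields the four columns named in the question, with NaN in the second room/length pair for the row that contains only one room. The number of column pairs follows the longest row, so data with more rooms would produce room_3, length_3, and so on.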