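I downloaded a dataset on income inequality from the OECD as a CSV file. I only want to keep three fields: LOCATION, TIME, and VALUE.
The beginning of the CSV looks like this: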
"LOCATION","INDICATOR","SUBJECT","MEASURE","FREQUENCY","TIME","Value","Flag Codes"
"AUS","INCOMEINEQ","GINI","INEQ","A","2014",0.337,
"AUS","INCOMEINEQ","GINI","INEQ","A","2016",0.33,
"AUT","INCOMEINEQ","GINI","INEQ","A","2014",0.274,
"AUT","INCOMEINEQ","GINI","INEQ","A","2015",0.276,
"AUT","INCOMEINEQ","GINI","INEQ","A","2016",0.284,
This is my converter code so far:
#!/usr/bin/env python
"""Universal CSV to JSON converter with scalability options"""
__author__ = "Tim Verlaan 11669128"

import csv
import json


def convert():
    """Convert CSV file to JSON file"""
    # Open the CSV
    with open('data.csv') as f:
        # Change each fieldname to the appropriate field name.
        reader = csv.DictReader(f, fieldnames=("LOCATION", "INDICATOR", "SUBJECT", "MEASURE", "FREQUENCY", "TIME", "Value", "Flag Codes"))
        # skip the header
        next(reader)
        # Parse the CSV into JSON
        out = json.dumps([row for row in reader])
    # Save the JSON
    with open('data_oecd.json', 'w') as f:
        f.write(out)


if __name__ == "__main__":
    # Separating the function, for scalability purposes
    convert()
Current result:
[{"LOCATION": "AUS", "INDICATOR": "INCOMEINEQ", "SUBJECT": "GINI", "MEASURE": "INEQ", "FREQUENCY": "A", "TIME": "2014", "Value": "0.337", "Flag Codes": ""}, {"LOCATION": "AUS", "INDICATOR": "INCOMEINEQ", "SUBJECT": "GINI", "MEASURE": "INEQ", "FREQUENCY": "A", "TIME": "2016", "Value": "0.33", "Flag Codes": ""}, {"LOCATION": "AUT", "INDICATOR": "INCOMEINEQ", "SUBJECT": "GINI", "MEASURE": "INEQ", "FREQUENCY": "A", "TIME": "2014", "Value": "0.274", "Flag Codes": ""}, {"LOCATION": "AUT", "INDICATOR": "INCOMEINEQ", "SUBJECT": "GINI", "MEASURE": "INEQ", "FREQUENCY": "A", "TIME": "2015", "Value": "0.276", "Flag Codes": ""}, {"LOCATION": "AUT", "INDICATOR": "INCOMEINEQ", "SUBJECT": "GINI", "MEASURE": "INEQ", "FREQUENCY": "A", "TIME": "2016", "Value": "0.284", "Flag Codes": ""}
Desired result:
[{"LOCATION": "AUS", "TIME": 2014, "VALUE": 0.337}, {"LOCATION": "AUS", "TIME": 2016, "VALUE": 0.33}
Best answer
This is easy to do with pandas:
import pandas as pd
df = pd.read_csv('data.csv')
df[['LOCATION', 'TIME', 'Value']].to_json(orient='records')
The orient='records' part is important; otherwise the output is grouped by column rather than by row.
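To illustrate the difference, here is a minimal sketch that contrasts orient='records' with the default DataFrame orientation. It rests on a few assumptions that are not in the original answer: the two sample rows are read from an in-memory string instead of data.csv, the Value column is renamed to VALUE to match the keys in the desired result, and the names CSV_TEXT, subset, records, and by_column are made up for this example.

```python
import io
import json

import pandas as pd

# Two sample rows copied from the question; io.StringIO stands in for 'data.csv'.
CSV_TEXT = '''"LOCATION","INDICATOR","SUBJECT","MEASURE","FREQUENCY","TIME","Value","Flag Codes"
"AUS","INCOMEINEQ","GINI","INEQ","A","2014",0.337,
"AUS","INCOMEINEQ","GINI","INEQ","A","2016",0.33,
'''

df = pd.read_csv(io.StringIO(CSV_TEXT))

# Keep only the wanted columns; the rename to VALUE is an assumption made
# here so the keys match the desired output in the question.
subset = df[["LOCATION", "TIME", "Value"]].rename(columns={"Value": "VALUE"})

# One JSON object per row.
records = json.loads(subset.to_json(orient="records"))

# Default orient for a DataFrame is "columns": one object per column,
# keyed by row index.
by_column = json.loads(subset.to_json())

print(records)
print(by_column)
```

To write the result to a file instead of getting a string back, a path can be passed as the first argument, for example subset.to_json('data_oecd.json', orient='records').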