Problem description
I'm working with CSV files. My goal is to write the information from a CSV file out as JSON. Specifically, I want a format similar to miserables.json.
Example:
{"source": "Napoleon", "target": "Myriel", "value": 1},
With the information I have, the output should look like this:
[
{
"source": "Germany",
"target": "Mexico",
"value": 1
},
{
"source": "Germany",
"target": "USA",
"value": 2
},
{
"source": "Brazil",
"target": "Argentina",
"value": 3
}
]
However, the code I used produces the following output:
[
{
"source": "Germany",
"target": "Mexico",
"value": 1
},
{
"source": null,
"target": "USA",
"value": 2
}
][
{
"source": "Brazil",
"target": "Argentina",
"value": 3
}
]
The null source should be Germany. This is one of the main problems, because more countries are affected by the same issue. Apart from that, the information is correct. I just want to merge the separate lists into a single list and replace each null with the correct country.
This is the code I used, with pandas and collections.
import json
from collections import Counter

import pandas
from pandas import DataFrame, Series

csvdata = pandas.read_csv('file.csv', low_memory=False, encoding='latin-1')
countries = csvdata['country'].tolist()
newcountries = list(set(countries))
for element in newcountries:
    bills = csvdata['target'][csvdata['country'] == element]
    frquency = Counter(bills)
    sourceTemp = []
    value = []
    country = element
    for k,v in frquency.items():
        sourceTemp.append(k)
        value.append(int(v))
    forceData = {'source': Series(country), 'target': Series(sourceTemp), 'value': Series(value)}
    dfForce = DataFrame(forceData)
    jsondata = dfForce.to_json(orient='records', force_ascii=False, default_handler=callable)
    parsed = json.loads(jsondata)
    newData = json.dumps(parsed, indent=4, ensure_ascii=False, sort_keys=True)
    # since to_json doesn't have an append mode, this is written to a text file
    savetxt = open('data.txt', 'a')
    savetxt.write(newData)
    savetxt.close()
Any suggestions for solving this problem are appreciated!
Thanks
Recommended answer
Consider removing the Series() around the scalar value, country. Wrapping the scalar in Series() creates a one-element series, so when the dictionary of series is combined into a dataframe, pandas pads that column with NaN (later converted to null in JSON) to match the lengths of the other series. You can see this by printing the dfForce dataframe:
from pandas import Series
from pandas import DataFrame
country = 'Germany'
sourceTemp = ['Mexico', 'USA', 'Argentina']
value = [1, 2, 3]
forceData = {'source': Series(country),
             'target': Series(sourceTemp),
             'value': Series(value)}
dfForce = DataFrame(forceData)
# source target value
# 0 Germany Mexico 1
# 1 NaN USA 2
# 2 NaN Argentina 3
To resolve this, simply keep country as a scalar in the dictionary of series:
forceData = {'source': country,
             'target': Series(sourceTemp),
             'value': Series(value)}
dfForce = DataFrame(forceData)
# source target value
# 0 Germany Mexico 1
# 1 Germany USA 2
# 2 Germany Argentina 3
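To check the difference end to end, here is a minimal sketch that builds both frames from the sample values above and converts each to JSON records. The names padded, fixed, targets and values are made up for this illustration and are not part of the original code.

```python
import json
from pandas import DataFrame, Series

country = 'Germany'
targets = ['Mexico', 'USA', 'Argentina']
values = [1, 2, 3]

# Wrapping the scalar in Series() gives a one-row series; pandas aligns it
# on the index and fills rows 1 and 2 with NaN.
padded = DataFrame({'source': Series(country),
                    'target': Series(targets),
                    'value': Series(values)})

# A plain scalar is broadcast to every row instead.
fixed = DataFrame({'source': country,
                   'target': Series(targets),
                   'value': Series(values)})

# to_json writes NaN as null, which json.loads reads back as None.
padded_records = json.loads(padded.to_json(orient='records'))
fixed_records = json.loads(fixed.to_json(orient='records'))

print([r['source'] for r in padded_records])  # ['Germany', None, None]
print([r['source'] for r in fixed_records])   # ['Germany', 'Germany', 'Germany']
```

Only the first record keeps its source in the padded version, which matches the null entries in the question's output.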
By the way, you do not need a dataframe object to output JSON. Simply use a list of dictionaries. Consider the following, which uses an OrderedDict from collections (to maintain the order of keys). This way the list grows inside the loop and is dumped to the file once, instead of being appended per country. Appending produces invalid JSON, because adjacent opposite-facing square brackets ...][... are not allowed.
from collections import OrderedDict
...
data = []
for element in newcountries:
    bills = csvdata['target'][csvdata['country'] == element]
    frquency = Counter(bills)
    for k,v in frquency.items():
        inner = OrderedDict()
        inner['source'] = element
        inner['target'] = k
        inner['value'] = int(v)
        data.append(inner)
newData = json.dumps(data, indent=4)
with open('data.json', 'w') as savetxt:
    savetxt.write(newData)
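The point about ...][... can be checked with the standard library alone. The sketch below is only an illustration: it uses a small made-up list of (country, target) rows in place of the CSV file, and the helper records_for is hypothetical. It compares one dump per country, concatenated as append mode would do, against a single dump of one growing list.

```python
import json
from collections import Counter, OrderedDict

# Hypothetical sample rows standing in for the CSV's country/target columns.
rows = [('Germany', 'Mexico'), ('Germany', 'USA'), ('Germany', 'USA'),
        ('Brazil', 'Argentina')]

def records_for(country):
    """Count the targets of one country and return them as link dictionaries."""
    counts = Counter(t for c, t in rows if c == country)
    return [OrderedDict([('source', country), ('target', t), ('value', n)])
            for t, n in counts.items()]

countries = sorted(set(c for c, _ in rows))

# One dump per country, concatenated: this is what writing in mode 'a' yields.
appended = ''.join(json.dumps(records_for(c)) for c in countries)
try:
    json.loads(appended)
    appended_is_valid = True
except json.JSONDecodeError:
    appended_is_valid = False

# One growing list, dumped once.
data = [rec for c in countries for rec in records_for(c)]
single = json.dumps(data, indent=4)
single_is_valid = json.loads(single) == data

print(appended_is_valid, single_is_valid)  # False True
```

The concatenated text fails to parse because a JSON document may contain only one top-level value, while the single dump round-trips cleanly.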