Writing JSON format using Pandas Series and DataFrame


Problem description

I'm working with CSV files. My goal is to write the CSV file's information out as JSON. Specifically, I want a format similar to miserables.json.

Example:

{"source": "Napoleon", "target": "Myriel", "value": 1},

Based on the information I have, the format would be:

[
{
    "source": "Germany",
    "target": "Mexico",
    "value": 1
},
{
    "source": "Germany",
    "target": "USA",
    "value": 2
},
{
    "source": "Brazil",
    "target": "Argentina",
    "value": 3
}
]

However, with the code I used, the output looks as follows:

[
{
    "source": "Germany",
    "target": "Mexico",
    "value": 1
},
{
    "source": null,
    "target": "USA",
    "value": 2
}
][
{
    "source": "Brazil",
    "target": "Argentina",
    "value": 3
}
]

The null source should be Germany. This is one of the main problems, because more countries have the same issue. Apart from that, the information is correct. I just want to get rid of the several separate lists in the output and replace null with the correct country.

This is the code I used, with pandas and collections.

import json
from collections import Counter

import pandas
from pandas import DataFrame, Series

csvdata = pandas.read_csv('file.csv', low_memory=False, encoding='latin-1')
countries = csvdata['country'].tolist()
newcountries = list(set(countries))
for element in newcountries:
    bills = csvdata['target'][csvdata['country'] == element]
    frquency = Counter(bills)
    sourceTemp = []
    value = []
    country = element
    for k,v in frquency.items():
        sourceTemp.append(k)
        value.append(int(v))
    forceData = {'source': Series(country), 'target': Series(sourceTemp), 'value': Series(value)}
    dfForce = DataFrame(forceData)
    jsondata = dfForce.to_json(orient='records', force_ascii=False, default_handler=callable)
    parsed = json.loads(jsondata)
    newData = json.dumps(parsed, indent=4, ensure_ascii=False, sort_keys=True)
    # since to_json doesn't have an append mode, the output is appended to a text file
    savetxt = open('data.txt', 'a')
    savetxt.write(newData)
    savetxt.close()

Any suggestions to solve this problem are appreciated!

Thanks

Recommended answer

Consider removing the Series() around the scalar value, country. Wrapping it creates a one-element series, so when the dictionary of series is combined into a DataFrame, pandas fills the remaining rows with NaN (later converted to null in JSON) to match the lengths of the other series. You can see this by printing out the dfForce DataFrame:

from pandas import Series
from pandas import DataFrame

country = 'Germany'
sourceTemp = ['Mexico', 'USA', 'Argentina']
value = [1, 2, 3]

forceData = {'source': Series(country),
             'target': Series(sourceTemp),
             'value': Series(value)}
dfForce = DataFrame(forceData)

#     source     target  value
# 0  Germany     Mexico      1
# 1      NaN        USA      2
# 2      NaN  Argentina      3

To resolve this, simply keep country as a scalar in the dictionary of series:

forceData = {'source': country,
             'target': Series(sourceTemp),
             'value': Series(value)}
dfForce = DataFrame(forceData)

#     source     target  value
# 0  Germany     Mexico      1
# 1  Germany        USA      2
# 2  Germany  Argentina      3
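
The NaN-to-null path can also be checked on the JSON side. The snippet below is a minimal sketch, assuming a reasonably recent pandas; the names wrapped, scalar, wrapped_records and scalar_records are illustrative and not part of the original code. It builds both variants of the DataFrame and parses the result of to_json back into Python objects, where a JSON null appears as None.

```python
import json

from pandas import DataFrame, Series

targets = ['Mexico', 'USA', 'Argentina']

# Variant from the question: the scalar is wrapped in a one-element Series,
# so rows 1 and 2 have no value for 'source' and are filled with NaN.
wrapped = DataFrame({'source': Series('Germany'), 'target': Series(targets)})

# Variant from the fix: the bare scalar is repeated for every row.
scalar = DataFrame({'source': 'Germany', 'target': Series(targets)})

# Round-trip through JSON; NaN is serialized as null and parsed back as None.
wrapped_records = json.loads(wrapped.to_json(orient='records'))
scalar_records = json.loads(scalar.to_json(orient='records'))

print([record['source'] for record in wrapped_records])
print([record['source'] for record in scalar_records])
```

Only the first record of the wrapped variant keeps its country, which matches the null entries seen in the question's output.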

By the way, you do not need a DataFrame to produce the JSON. Simply build a list of dictionaries. The following uses an OrderedDict (to maintain the order of keys). The list grows across all countries and is dumped to the file once, in write mode, instead of being appended per country. Appending produces invalid JSON, because adjacent opposite-facing square brackets ...][... are not allowed.

from collections import OrderedDict
...

data = []

for element in newcountries:
    bills = csvdata['target'][csvdata['country'] == element]
    frquency = Counter(bills)

    for k,v in frquency.items():
        inner = OrderedDict()
        inner['source']  = element
        inner['target'] = k
        inner['value'] = int(v)

        data.append(inner)

newData = json.dumps(data, indent=4)

with open('data.json', 'w') as savetxt:
    savetxt.write(newData)
