Splitting data (from a text/JSON file) by platform using Python

Question

This is sample data from a JSON file. Every line of the file has exactly this shape, since the "JSON" file is prepared for upload to BigQuery. I'm looking for a way to split it by platform.

{"origin": {"detailed": "instagram", "source": "instagram", "platform": "instagram"}.....}
{"origin": {"detailed": "website", "source": "website", "platform": "website"}.....}
{"origin": {"detailed": "forum", "source": "forum", "platform": "forum"}.....}
{"origin": {"detailed": "twitter", "source": "twitter", "platform": "twitter"}.....}
{"origin": {"detailed": "facebook", "source": "facebook", "platform": "facebook"}.....}

I want to split this data into separate text files based on platform.

if platform = instagram (in practice: if the line contains "platform": "instagram")
    write to post_instagram.json
if platform = facebook
    write to post_facebook.json
..............
    ...................

What is a clean way to do this in Python?

Example:

with open(FILE_NAME, "r") as infile:
    data = infile.read()
    # if statements?
    # while statement?
    # .....

with open(FILE_NAME, "w+") as outfile:
    outfile.write(data)

Reason: I want to split the data because I could not create a single schema that accepts all platforms. Different platforms have extra repeated columns that break consistency, even if I create a schema with every column for every platform. So the solution is to split the data by platform, giving each platform its own schema.

Recommended answer

Maybe something like this:

import json

# `data` is assumed to be a list of dicts already parsed from the input file
with open("post_instagram.json", "w") as f:
    json.dump([x for x in data if "instagram" in x["origin"]["platform"]], f)

with open("post_facebook.json", "w") as f:
    json.dump([x for x in data if "facebook" in x["origin"]["platform"]], f)

# other platforms ...
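The answer does not show where `data` comes from. A minimal sketch of one way to build it, assuming the input holds one JSON object per line as the question describes; the helper name `load_records` is hypothetical and not part of the original answer:

```python
import json


def load_records(path):
    """Read a newline-delimited JSON file into a list of dicts."""
    records = []
    with open(path, "r", encoding="utf-8") as infile:
        for line in infile:
            line = line.strip()
            if not line:  # skip blank lines
                continue
            records.append(json.loads(line))
    return records
```

With this helper, `data = load_records(FILE_NAME)` would give the list that the comprehensions expect. Note that it loads the whole file into memory.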

If the data is very large, make a single pass instead of iterating over all of it once per platform:

instagram = []
facebook = []

# Single pass: route each record to its platform's list
for d in data:
    if "instagram" in d["origin"]["platform"]:
        instagram.append(d)
    elif "facebook" in d["origin"]["platform"]:
        facebook.append(d)

with open("post_instagram.json", "w") as f:
    json.dump(instagram, f)
with open("post_facebook.json", "w") as f:
    json.dump(facebook, f)
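One caveat: `json.dump` on a list writes a single JSON array, while the question describes a file with one object per line, which is the newline-delimited form BigQuery loads. The following is a rough sketch of a streaming variant that keeps that form and handles any platform without hard-coding names. The function name `split_by_platform` and the `out_pattern` parameter are hypothetical, and the sketch assumes every line has an `origin.platform` value that is safe to use in a file name:

```python
import json
from contextlib import ExitStack


def split_by_platform(in_path, out_pattern="post_{platform}.json"):
    """Stream in_path line by line, writing each record to its platform's file.

    Returns a dict mapping platform name to the number of records written.
    """
    counts = {}
    outfiles = {}
    # ExitStack closes every output file opened along the way
    with ExitStack() as stack, open(in_path, "r", encoding="utf-8") as infile:
        for line in infile:
            line = line.strip()
            if not line:  # skip blank lines
                continue
            platform = json.loads(line)["origin"]["platform"]
            if platform not in outfiles:
                # First record for this platform: open its output file
                out_path = out_pattern.format(platform=platform)
                outfiles[platform] = stack.enter_context(
                    open(out_path, "w", encoding="utf-8")
                )
                counts[platform] = 0
            # Write the original line unchanged, one object per line
            outfiles[platform].write(line + "\n")
            counts[platform] += 1
    return counts
```

Calling `split_by_platform(FILE_NAME)` would produce `post_instagram.json`, `post_facebook.json`, and so on in the current directory. Only one line is held in memory at a time, at the cost of keeping one file handle open per platform.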
