Deserializing large files with Json.NET

Problem description

I am trying to process a very large amount of data (~1000 separate files, each ~30 MB) to use as input to the training phase of a machine learning algorithm. The raw data files are formatted as JSON, and I deserialize them using the JsonSerializer class of Json.NET. Towards the end of the program, Newtonsoft.Json.dll throws an 'OutOfMemoryException'. Is there a way to reduce the data held in memory, or do I have to change my whole approach (such as switching to a big data framework like Spark) to handle this problem?

public static List<T> DeserializeJsonFiles<T>(string path)
{
    if (string.IsNullOrWhiteSpace(path))
        return null;

    var jsonObjects = new List<T>();
    //var sw = new Stopwatch();
    try
    {
        //sw.Start();
        foreach (var filename in Directory.GetFiles(path))
        {
            using (var streamReader = new StreamReader(filename))
            using (var jsonReader = new JsonTextReader(streamReader))
            {
                jsonReader.SupportMultipleContent = true;
                var serializer = new JsonSerializer();

                while (jsonReader.Read())
                {
                    if (jsonReader.TokenType != JsonToken.StartObject)
                        continue;

                    var jsonObject = serializer.Deserialize<dynamic>(jsonReader);

                    var reducedObject = ApplyFiltering(jsonObject); // returns null if the filtering conditions are not met
                    if (reducedObject == null)
                        continue;

                    jsonObject = reducedObject;
                    jsonObjects.Add(jsonObject);
                }
            }
        }
        //sw.Stop();
        //Console.WriteLine($"Elapsed time: {sw.Elapsed}, Elapsed mili: {sw.ElapsedMilliseconds}");
    }
    catch (Exception ex)
    {
        Console.WriteLine($"Error: {ex}");
        return null;
    }

    return jsonObjects;
}

Thanks.

Recommended answer
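One direction that may help, offered as a sketch under assumptions rather than as the definitive fix: the method in the question collects every filtered object from all ~1000 files into a single List<T>, so memory grows until the run ends. If the consumer can handle items one at a time, an iterator with `yield return` avoids holding them all at once. The names `JsonStreaming` and `StreamJsonFiles` and the `filter` delegate are hypothetical; `filter` stands in for the question's `ApplyFiltering` and is assumed to return null for rejected objects and a small typed result otherwise. The original try/catch is omitted because C# does not allow `yield return` inside a try block that has a catch clause.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public static class JsonStreaming
{
    // Lazily yields one filtered item at a time instead of building one large list.
    // `filter` is a hypothetical stand-in for ApplyFiltering: it returns null
    // when the object does not meet the filtering conditions.
    public static IEnumerable<T> StreamJsonFiles<T>(string path, Func<JObject, T> filter)
        where T : class
    {
        if (string.IsNullOrWhiteSpace(path))
            yield break;

        // EnumerateFiles returns file names lazily, unlike GetFiles.
        foreach (var filename in Directory.EnumerateFiles(path))
        {
            using (var streamReader = new StreamReader(filename))
            using (var jsonReader = new JsonTextReader(streamReader))
            {
                // Allows several top-level JSON values in one file.
                jsonReader.SupportMultipleContent = true;

                while (jsonReader.Read())
                {
                    if (jsonReader.TokenType != JsonToken.StartObject)
                        continue;

                    // Loads exactly one object; the reader ends on its EndObject token.
                    var jsonObject = JObject.Load(jsonReader);

                    var reduced = filter(jsonObject);
                    if (reduced != null)
                        yield return reduced;
                }
            }
        }
    }
}
```

This only lowers peak memory if the caller iterates the result with `foreach` (for example, feeding each item to the training step or writing it to a compact intermediate file) and does not call `ToList()` on it. It also assumes the filter's result does not keep a reference to the full `JObject`, since that would keep the whole parsed object alive.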