Problem description
I am trying to parse a large JSON file (more than 600 MB) with Java.
My JSON file looks like this:
{
"0" : {"link_id": "2381317", "overview": "mjklmklmklmklmk", "founded": "2015", "followers": "42", "type": "Gamer", "website": "http://www.google.com", "name": "troll", "country": "United Kingdom", "sp": "Management Consulting" },
"1" : {"link_id": "2381316", "overview": "mjklmklmklmklmk", "founded": "2015", "followers": "41", "type": "Gamer", "website": "http://www.google2.com", "name": "troll2", "country": "United Kingdom", "sp": "Management Consulting" }
[....]
"345240" : {"link_id": "2381314", "overview": "mjklmklmklmklmk", "founded": "2015", "followers": "23", "type": "Gamer", "website": "http://www.google2.com", "name": "troll2", "country": "United Kingdom", "sp": "Management Consulting" }
}
My code looks like this:
import java.io.FileReader;
import java.io.IOException;
import java.util.Iterator;

import org.json.simple.JSONObject;
import org.json.simple.parser.JSONParser;
import org.json.simple.parser.ParseException;

public class dumpExtractor {
    private static final String filePath = "/home/troll/Documents/analyse/lol.json";

    public static void main(String[] args) {
        try {
            // read the json file
            FileReader reader = new FileReader(filePath);
            JSONParser jsonParser = new JSONParser();
            // parse() builds the whole document as one in-memory object tree
            JSONObject jsonObject = (JSONObject) jsonParser.parse(reader);
            Iterator<JSONObject> iterator = jsonObject.values().iterator();
            while (iterator.hasNext()) {
                JSONObject jsonChildObject = iterator.next();
                System.out.println("==========================");
                String name = (String) jsonChildObject.get("name");
                System.out.println("Industry name: " + name);
                String type = (String) jsonChildObject.get("type");
                if (type != null && !type.isEmpty()) {
                    System.out.println("type: " + type);
                }
                String sp = (String) jsonChildObject.get("sp");
                if (sp != null && !sp.isEmpty()) {
                    System.out.println("sp: " + sp);
                }
                System.out.println("==========================");
            }
            System.out.println("done ! ");
        } catch (IOException | ParseException ex) {
            // JSONParser.parse(Reader) declares both exceptions
            ex.printStackTrace();
        }
    }
}
I get this error:
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.HashMap.createEntry(HashMap.java:897)
at java.util.HashMap.addEntry(HashMap.java:884)
at java.util.HashMap.put(HashMap.java:505)
at org.json.simple.parser.JSONParser.parse(Unknown Source)
at org.json.simple.parser.JSONParser.parse(Unknown Source)
How can I fix this?
Thanks in advance.
Recommended answer
If you have to read huge JSON files, you can't keep all of the information in memory. Increasing the heap size can be a solution for a 1 GB file, but what if tomorrow's file is 2 GB?
The right approach to this problem is to parse the JSON element by element using a streaming parser. Instead of loading the whole JSON into memory and creating one big object that represents it, you read single elements of the JSON and convert them to objects step by step.
There is a nice article explaining how to do this with the Jackson library.
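As a rough illustration of the streaming approach, here is a minimal sketch using Jackson's streaming API. It assumes jackson-core 2.x is on the classpath and that the file has the shape shown in the question (one top-level object whose values are flat records). The class name StreamingDumpExtractor, the Company holder, and the forEachCompany helper are made up for this example and are not part of Jackson or of the original code.

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;

import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.function.Consumer;

public class StreamingDumpExtractor {

    /** Hypothetical holder for the three fields the original loop prints. */
    public static final class Company {
        public String name;
        public String type;
        public String sp;
    }

    /**
     * Streams a document of the form {"0": {...}, "1": {...}, ...} and passes
     * each record to the handler as soon as it has been read, so only one
     * record is held in memory at a time. Returns the number of records seen.
     */
    public static long forEachCompany(Reader in, Consumer<Company> handler) throws IOException {
        long count = 0;
        try (JsonParser parser = new JsonFactory().createParser(in)) {
            if (parser.nextToken() != JsonToken.START_OBJECT) {
                throw new IOException("Expected the document to start with '{'");
            }
            // Each iteration consumes one "<key>": {...} entry of the outer object.
            while (parser.nextToken() == JsonToken.FIELD_NAME) {
                if (parser.nextToken() != JsonToken.START_OBJECT) {
                    parser.skipChildren(); // not a record: skip arrays, ignore scalars
                    continue;
                }
                Company c = new Company();
                while (parser.nextToken() == JsonToken.FIELD_NAME) {
                    String field = parser.getCurrentName();
                    parser.nextToken();                       // move to the value
                    String value = parser.getValueAsString(); // null for nested structures
                    parser.skipChildren();                    // no-op for scalar values
                    switch (field) {
                        case "name": c.name = value; break;
                        case "type": c.type = value; break;
                        case "sp":   c.sp = value;   break;
                        default:     break;          // other fields are ignored
                    }
                }
                handler.accept(c);
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Default path is the one from the question; pass another as the first argument.
        String filePath = args.length > 0 ? args[0] : "/home/troll/Documents/analyse/lol.json";
        try (Reader reader = Files.newBufferedReader(Paths.get(filePath))) {
            long total = forEachCompany(reader, c -> {
                System.out.println("==========================");
                System.out.println("Industry name: " + c.name);
                if (c.type != null && !c.type.isEmpty()) {
                    System.out.println("type: " + c.type);
                }
                if (c.sp != null && !c.sp.isEmpty()) {
                    System.out.println("sp: " + c.sp);
                }
            });
            System.out.println("done ! " + total + " records");
        }
    }
}
```

Because each Company is handed to the callback and then dropped, memory use should stay roughly constant regardless of file size, as long as the handler itself does not accumulate the records.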