Problem description
I'm having trouble processing a huge JSON file in Ruby. What I'm looking for is a way to process it entry-by-entry without keeping too much data in memory.
I thought the yajl-ruby gem would do the job, but it consumes all my memory. I've also looked at the Yajl::FFI and JSON::Stream gems, but their documentation clearly states that, while the document can be streamed into the parser, the fully parsed object still needs to fit in memory.
Here's what I've done with Yajl:
require 'yajl'

file_stream = File.open(file, "r")
# Yajl::Parser.parse builds the whole document as a single Ruby object,
# so the entire file is in memory before the loop even starts.
json = Yajl::Parser.parse(file_stream)
json.each do |entry|
  entry.do_something # placeholder for the real per-entry work
end
file_stream.close
The memory usage keeps getting higher until the process is killed.
I don't see why Yajl keeps processed entries in memory. Can I somehow free them, or did I just misunderstand the capabilities of the Yajl parser?
If it cannot be done with Yajl, is there a way to do this in Ruby with any other library?
Accepted answer
Both @CodeGnome's and @A. Rager's answers helped me understand the solution.
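The approach in those answers is to drive a streaming parser through event callbacks instead of materializing the whole document. Here is a minimal sketch of that style, assuming JSON::Stream's documented callback API; the file name, chunk size, and handler bodies are illustrative, not code from those answers:

require 'json/stream'

# Event-driven parsing: the parser emits a callback for each token it
# scans, so only the current chunk is held in memory at any time.
parser = JSON::Stream::Parser.new do
  start_object { puts 'start object' }
  end_object   { puts 'end object' }
  key          { |k| puts "key: #{k}" }
  value        { |v| puts "value: #{v}" }
end

File.open('huge.json') do |io|
  # Feed fixed-size chunks; the file is never fully read into memory.
  parser << io.read(4096) until io.eof?
end

The downside is that turning these low-level events back into usable Ruby hashes requires hand-written bookkeeping for every nesting level, which is exactly the boilerplate the gem below removes.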
I ended up creating the json-streamer gem, which offers a generic approach and spares you from manually defining callbacks for every scenario.
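A minimal usage sketch, based on the API shown in json-streamer's README; the file name and chunk_size value are illustrative:

require 'json/streamer'

file_stream = File.open('huge.json', 'r')
# Read and parse the file in fixed-size chunks instead of all at once.
streamer = Json::Streamer.parser(file_io: file_stream, chunk_size: 500)

# Yield each object found at nesting level 1 (e.g. every element of a
# top-level array) one at a time, instead of keeping them all in memory.
streamer.get(nesting_level: 1) do |entry|
  entry.do_something # placeholder for per-entry work, as in the question
end

file_stream.close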