Problem description
I have a huge (~7 GB) JSON array of relatively small objects.
Is there a relatively simple way to filter these objects without loading the whole file into memory?
The --stream option looks suitable, but I can't figure out how to fold the stream of [path,value] pairs back into the original objects.
Recommended answer
jq 1.5 has a streaming parser. The jq FAQ gives an example of how to convert a top-level array of JSON objects into a stream of its elements:
$ jq -nc --stream 'fromstream(1|truncate_stream(inputs))'
[{"foo":"bar"},{"foo":"baz"}]
{"foo":"bar"}
{"foo":"baz"}
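Since the question is about filtering, here is a minimal sketch that adds a select() after the reconstruction. It assumes the array lives in a file called sample.json (a hypothetical name standing in for the 7 GB file) and uses .foo == "baz" purely as an example condition. It also prints the raw --stream events, which shows what fromstream is folding: [path, leaf] pairs, plus one-element [path] events that mark where a container closes.

```shell
# Hypothetical sample file standing in for the real (large) input.
printf '[{"foo":"bar"},{"foo":"baz"}]' > sample.json

# Raw streaming events. Prints:
#   [[0,"foo"],"bar"]
#   [[0,"foo"]]
#   [[1,"foo"],"baz"]
#   [[1,"foo"]]
#   [[1]]
jq -c --stream . sample.json

# 1|truncate_stream drops the leading array index from each path (and drops
# the final [[1]] event entirely), fromstream rebuilds one array element at a
# time, and select keeps only the elements matching the example condition.
# Prints: {"foo":"baz"}
jq -cn --stream 'fromstream(1|truncate_stream(inputs)) | select(.foo == "baz")' sample.json
```

The filtered result is one compact object per line rather than a JSON array. Because each element is emitted as soon as its closing event arrives, memory use should track the size of the largest element rather than the size of the whole file, though that is worth confirming on your own data.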
That may be enough for your purposes, but it is worth noting that setpath/2 can also help. Here's how to produce a stream of leaflets, i.e. one minimal JSON value per leaf, each holding just that leaf at its original path:
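jq -c --stream '. as $in | select(length == 2) | null | setpath($in[0]; $in[1])'
Starting from null rather than {} lets setpath create arrays as well as objects; starting from {}, the filter errors on any path that begins with an array index, which is every path in a top-level array like the one in the question.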
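To see what leaflets look like, the sketch below runs a null-seeded variant of the leaflet filter on the same two-element array used in the earlier example. Each output line holds exactly one leaf at its original path, with null padding in front of any array index above 0.

```shell
# Each [path, leaf] event (length 2) becomes a minimal value holding just that
# leaf; closing events (length 1) are skipped by select(length == 2).
# The seed is null rather than {}, so setpath can create the enclosing array.
# Prints:
#   [{"foo":"bar"}]
#   [null,{"foo":"baz"}]
printf '[{"foo":"bar"},{"foo":"baz"}]' \
  | jq -c --stream '. as $in | select(length == 2) | null | setpath($in[0]; $in[1])'
```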
Further information and documentation is available in the jq manual: https://stedolan.github.io/jq/manual/#Streaming