Split/slice large JSON without sorting, keep entries unique by several columns, and add extra elements using jq
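Problem description
Following "Split/Slice large JSON using jq", we were able to slice a huge input file into smaller chunks based on array size.
We now want to add a new JSON element to each array entry, a sequence number that increments along the original array, and also filter the entries so that they are unique by a few columns.
Input: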
{"recDt":"2021-01-05",
"country":"US",
"name":"ABC",
"number":"9828",
"add": [
{"evnCd":"O","rngNum":"1","state":"TX","city":"ANDERSON","postal":"77830"},
{"evnCd":"O","rngNum":"2","state":"TX","city":"ANDERSON","postal":"77830"},
{"evnCd":"O","rngNum":"3","state":"TX","city":"ANDERSON","postal":"77831"},
{"evnCd":"O","rngNum":"4","state":"TX","city":"ANDERSON","postal":"77832"}
]
}
Expected output after adding the additional key:
{"recDt":"2021-01-05",
"country":"US",
"name":"ABC",
"number":"9828",
"add": [
{"rownum":1,"evnCd":"O","rngNum":"1","state":"TX","city":"ANDERSON","postal":"77830"},
{"rownum":2,"evnCd":"O","rngNum":"2","state":"TX","city":"ANDERSON","postal":"77830"},
{"rownum":3,"evnCd":"O","rngNum":"3","state":"TX","city":"ANDERSON","postal":"77831"},
{"rownum":4,"evnCd":"O","rngNum":"4","state":"TX","city":"ANDERSON","postal":"77832"}
]
}
After filtering to unique entries (by state, city, postal) and slicing with an array size of 2:
{"recDt":"2021-01-05",
"country":"US",
"name":"ABC",
"number":"9828",
"add": [
{"rownum":1,"evnCd":"O","rngNum":"1","state":"TX","city":"ANDERSON","postal":"77830"},
{"rownum":3,"evnCd":"O","rngNum":"3","state":"TX","city":"ANDERSON","postal":"77831"}]}
{"recDt":"2021-01-05",
"country":"US",
"name":"ABC",
"number":"9828",
"add": [
{"rownum":4,"evnCd":"O","rngNum":"4","state":"TX","city":"ANDERSON","postal":"77832"}
]
}
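The slicing step in the expected output can be tried in isolation. This is a minimal sketch, assuming a jq version that ships the undocumented `_nwise` helper (it is an internal builtin, so its availability is an assumption about your jq install); the five-element array is a made-up sample.

```shell
# Split a made-up array into slices of at most 2 elements.
# _nwise is an undocumented jq builtin; treat its presence as an assumption.
slices=$(jq -nc '[[1,2,3,4,5] | _nwise(2)]')
echo "$slices"
```

The last slice is simply shorter when the array length is not a multiple of the slice size, which is why the second chunk above holds a single entry.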
The following sample filters to unique entries by a few columns, but it did not perform well:
< input.json jq -r --argjson size 2 '
  .add |= unique_by({city,state,postal})
  | del(.add) as $object
  | (.add | _nwise($size) | ("\t", $object + {add:.}))
' | awk '
  /^\t/ {fn++; next}
  { print >> ("part-" fn ".json") }'
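One plausible contributor to the cost (not measured here) is that `unique_by` is built on grouping, which sorts the array by the key. A side effect is that the result comes back in key order rather than input order. The sketch below uses three made-up entries to show the reordering.

```shell
# Input order of the made-up postal codes is 77832, 77830, 77832.
# unique_by returns one entry per key, ordered by key, not by first appearance.
postals=$(jq -nc '
  [{"postal":"77832"},{"postal":"77830"},{"postal":"77832"}]
  | unique_by(.postal) | map(.postal)')
echo "$postals"
```

A variant that remembers keys it has already seen can avoid the sort and keep the original order.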
Recommended answer
Here is a solution that uses two generic filters: one for enumerating, and the other a sort-free, stream-oriented variant of unique_by:
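# Add a counter under $key to each object in the stream s, counting from 1.
def enumerate(s; $key): foreach s as $x (0; .+1; {($key): .} + $x);

# For each distinct value of f, emit the first item in the stream that yields it.
# Seen keys are stored per type, so no sorting is needed and input order is kept.
def uniques_by(stream; f):
  reduce stream as $x ({};
    ($x|f) as $s
    | ($s|type) as $t
    | (if $t == "string" then $s else ($s|tojson) end) as $y
    # "// {}" guards the first lookup, when nothing of type $t has been stored yet
    | if (.[$t] // {}) | has($y) then . else .[$t][$y] = $x end )
  | .[][] ;

# rownum is assigned after de-duplication here, so the kept entries are numbered 1, 2, 3.
# $size must be supplied on the command line, e.g. --argjson size 2.
.add |= [enumerate(uniques_by(.[]; {city,state,postal}); "rownum")]
| del(.add) as $object
| (.add|_nwise($size) | ("\t", $object + {add:.} ))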
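To see the two filters working end to end, here is a sketch that runs them on the question's input. It makes a few choices of its own: it numbers the entries before de-duplicating, so that rownum reflects the original position (1, 3, 4 as in the expected output), whereas the program above composes the two filters the other way round; it adds a `// {}` guard before `has`, on the assumption that `has` rejects null input; and it emits compact JSON without the separator line, so no awk step is involved.

```shell
# The input from the question, held in a shell variable instead of a file.
input='{"recDt":"2021-01-05","country":"US","name":"ABC","number":"9828","add":[
{"evnCd":"O","rngNum":"1","state":"TX","city":"ANDERSON","postal":"77830"},
{"evnCd":"O","rngNum":"2","state":"TX","city":"ANDERSON","postal":"77830"},
{"evnCd":"O","rngNum":"3","state":"TX","city":"ANDERSON","postal":"77831"},
{"evnCd":"O","rngNum":"4","state":"TX","city":"ANDERSON","postal":"77832"}]}'

program='
def enumerate(s; $key): foreach s as $x (0; .+1; {($key): .} + $x);
def uniques_by(stream; f):
  reduce stream as $x ({};
    ($x|f) as $s
    | ($s|type) as $t
    | (if $t == "string" then $s else ($s|tojson) end) as $y
    # the guard is an addition in this sketch, not part of the answer as posted
    | if (.[$t] // {}) | has($y) then . else .[$t][$y] = $x end)
  | .[][];
# Number first, then de-duplicate, so rownum keeps the original position.
.add |= [uniques_by(enumerate(.[]; "rownum"); {city,state,postal})]
| del(.add) as $object
| .add | _nwise($size) | $object + {add: .}
'

# One compact JSON document per chunk of at most 2 entries.
chunks=$(printf '%s' "$input" | jq -c --argjson size 2 "$program")
echo "$chunks"

# Summarize which rownum values landed in which chunk.
rownums=$(printf '%s\n' "$chunks" | jq -sc 'map([.add[].rownum])')
echo "$rownums"
```

To write each chunk to its own file, the separator line and the awk command from the question can be added back to this pipeline.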