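This article covers how to split/slice a large JSON file with jq without sorting, keeping entries unique by a few columns and adding an extra element. It may be a useful reference for anyone facing the same problem.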

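Question

Using the approach from "Split/Slice large JSON using jq", we were able to slice a huge input file into smaller chunks based on array size.

We now want to add a new JSON element to each array entry, a sequence number that increments along the length of the original array, and also filter the entries so that they are unique by a few columns.

Input: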

{"recDt":"2021-01-05",
 "country":"US",
 "name":"ABC",
 "number":"9828",
 "add": [
     {"evnCd":"O","rngNum":"1","state":"TX","city":"ANDERSON","postal":"77830"},
     {"evnCd":"O","rngNum":"2","state":"TX","city":"ANDERSON","postal":"77830"},
     {"evnCd":"O","rngNum":"3","state":"TX","city":"ANDERSON","postal":"77831"},
     {"evnCd":"O","rngNum":"4","state":"TX","city":"ANDERSON","postal":"77832"}
 ]
}

Expected output after adding the extra key:

{"recDt":"2021-01-05",
 "country":"US",
 "name":"ABC",
 "number":"9828",
 "add": [
     {"rownum":1,"evnCd":"O","rngNum":"1","state":"TX","city":"ANDERSON","postal":"77830"},
     {"rownum":2,"evnCd":"O","rngNum":"2","state":"TX","city":"ANDERSON","postal":"77830"},
     {"rownum":3,"evnCd":"O","rngNum":"3","state":"TX","city":"ANDERSON","postal":"77831"},
     {"rownum":4,"evnCd":"O","rngNum":"4","state":"TX","city":"ANDERSON","postal":"77832"}
 ]
}
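
One simple way to produce the numbered array shown above is to index into the array with range. This is only a minimal sketch and not part of the original question; the trimmed sample record and the shell variable names (input, numbered) are assumptions made for illustration.

```shell
# Trimmed, hypothetical sample record (only two address rows).
input='{"name":"ABC","add":[{"rngNum":"1","postal":"77830"},{"rngNum":"2","postal":"77830"}]}'

# Prepend a 1-based "rownum" key to every element of .add.
numbered=$(printf '%s\n' "$input" | jq -c '
  .add |= [range(0; length) as $i | {rownum: ($i + 1)} + .[$i]]
')

printf '%s\n' "$numbered"
```

Because the numbering happens before any filtering, each row keeps the position it had in the original array.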

After filtering (unique by state, city, and postal) and slicing with an array size of 2:

{"recDt":"2021-01-05",
 "country":"US",
 "name":"ABC",
 "number":"9828",
 "add": [
     {"rownum":1,"evnCd":"O","rngNum":"1","state":"TX","city":"ANDERSON","postal":"77830"},
     {"rownum":3,"evnCd":"O","rngNum":"3","state":"TX","city":"ANDERSON","postal":"77831"}]}

{"recDt":"2021-01-05",
 "country":"US",
 "name":"ABC",
 "number":"9828",
 "add": [
     {"rownum":4,"evnCd":"O","rngNum":"4","state":"TX","city":"ANDERSON","postal":"77832"}
 ]
}

The following attempt filters for uniqueness by a few columns, but it does not perform well:

< input.json jq -r --argjson size 2 '
  .add |= unique_by({city,state,postal})
  | del(.add) as $object
  | (.add|_nwise($size) | ("\t", $object + {add:.} ))
' | awk '
  /^\t/ {fn++; next}
  { print >> "part-" fn ".json"}'

Recommended answer

Here is a solution that uses two generic filters: one for enumeration, and one that is a sort-free, stream-oriented variant of unique_by:

  # counting from 1
  def enumerate(s; $key): foreach s as $x (0; .+1; {($key): .} + $x);

  # for each distinct value of f, emits the first item $x in the stream for which ($x|f) has that value
  def uniques_by(stream; f): 
    reduce stream as $x ({};
      ($x|f) as $s
      | ($s|type) as $t
      | (if $t == "string" then $s else ($s|tojson) end) as $y
      | if (.[$t] // {}) | has($y) then . else .[$t][$y] = $x end )
    | .[][] ;

  .add |= [enumerate(uniques_by(.[]; {city,state,postal}); "rownum")]
  | del(.add) as $object
  | (.add|_nwise($size) | ("\t", $object + {add:.} ))
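
The program above numbers the rows after deduplication, so the surviving rows would get rownum 1, 2, 3. If the row numbers should instead reflect positions in the original array (1, 3, 4, as in the expected output of the question), one option is to swap the two filters so that enumeration runs first. The following is a hedged sketch under several assumptions: the sample record is trimmed, the shell variable names are hypothetical, .[$t] is guarded with // {} as a defensive measure, and the tab separator and awk step are left out so that the chunks are simply printed one per line.

```shell
# Trimmed, hypothetical sample record based on the input in the question.
input='{"name":"ABC","add":[
 {"rngNum":"1","state":"TX","city":"ANDERSON","postal":"77830"},
 {"rngNum":"2","state":"TX","city":"ANDERSON","postal":"77830"},
 {"rngNum":"3","state":"TX","city":"ANDERSON","postal":"77831"},
 {"rngNum":"4","state":"TX","city":"ANDERSON","postal":"77832"}]}'

chunks=$(printf '%s\n' "$input" | jq -c --argjson size 2 '
  # counting from 1
  def enumerate(s; $key): foreach s as $x (0; .+1; {($key): .} + $x);

  # sort-free uniqueness: keep the first item seen for each value of f
  def uniques_by(stream; f):
    reduce stream as $x ({};
      ($x|f) as $s
      | ($s|type) as $t
      | (if $t == "string" then $s else ($s|tojson) end) as $y
      | if (.[$t] // {}) | has($y) then . else .[$t][$y] = $x end)
    | .[][];

  # Enumerate first, then deduplicate, so rownum refers to the original position.
  .add |= [uniques_by(enumerate(.[]; "rownum"); {city,state,postal})]
  | del(.add) as $object
  | .add | _nwise($size) | $object + {add: .}
')

# One compact JSON object per chunk.
printf '%s\n' "$chunks"
```

Since {city,state,postal} picks out only those three keys, the added rownum does not affect which rows count as duplicates.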


10-21 15:00