这是我的嵌套JSON:

{
   "business_id":"pNQwnY_q4okdlnPiR-3RBA",
   "full_address":"6105 S Fort Apache Rd\nSpring Valley\nLas Vegas, NV 89148",
   "hours":{  },
   "open":true,
   "categories":[  ],
   "city":"Las Vegas",
   "review_count":68,
   "name":"Empire Bagels",
   "neighborhoods":[
      "Spring Valley"
   ],
   "longitude":-115.298175926911,
   "state":"NV",
   "stars":3.0,
   "latitude":36.07728616051,
   "attributes":{
      "Take-out":true,
      "Wi-Fi":"no",
      "Good For":{
         "dessert":false,
         "latenight":false,
         "lunch":false,
         "dinner":false,
         "breakfast":true,
         "brunch":false
      },
      "Caters":true,
      "Noise Level":"quiet",
      "Takes Reservations":false,
      "Delivery":false,
      "Ambience":{
         "romantic":false,
         "intimate":false,
         "classy":false,
         "hipster":false,
         "divey":false,
         "touristy":false,
         "trendy":false,
         "upscale":false,
         "casual":true
      },
      "Parking":{
         "garage":false,
         "street":false,
         "validated":false,
         "lot":true,
         "valet":false
      },
      "Has TV":true,
      "Outdoor Seating":true,
      "Attire":"casual",
      "Alcohol":"none",
      "Waiter Service":false,
      "Accepts Credit Cards":true,
      "Good for Kids":true,
      "Good For Groups":true,
      "Price Range":1
   },
   "type":"business"
}


我正在查询使用apache演练。我想找出城市中所有餐厅的十大最常见``真实''属性,例如:

Accepts Credit Cards : 200,
Alcohol: 300,
Good For Kids : 500


我的查询看起来如何?这是我所做的:

select attributes, count(*) attributes from `yelp_dataset` group by attributes;


我收到此错误:

Error: SYSTEM ERROR: UnsupportedOperationException: Map, Array, Union or repeated scalar type should not be used in group by, order by or in a comparison operator. Drill does not support compare between MAP:REQUIRED and MAP:REQUIRED.

Fragment 0:0

[Error Id: 8fe8a616-92c7-4da0-ab65-b5542d391f47 on 192.168.10.104:31010] (state=,code=0)


我的查询应该是什么?

最佳答案

由于数据类型混合,我无法使用KVGEN()自动展平属性,但是您可以尝试使用UNION ALL蛮力进行CTE:

WITH ReviewAttributes AS (
    SELECT
      reviews.name,
      'Accepts Credit Cards' as `AttributeName`,
      CASE WHEN reviews.attributes.`Accepts Credit Cards` = true THEN 1 ELSE 0 END as `AttributeValue`
    FROM
      `yelp_dataset` reviews
    UNION ALL
    SELECT
      reviews.name,
      'Alcohol' as `AttributeName`,
      CASE WHEN reviews.attributes.`Alcohol` <> 'none' THEN 1 ELSE 0 END as `AttributeValue`
    FROM
      `yelp_dataset` reviews
    UNION ALL
    SELECT
      reviews.name,
      'Good for Kids' as `AttributeName`,
      CASE WHEN reviews.attributes.`Good for Kids` = true THEN 1 ELSE 0 END as `AttributeValue`
    FROM
      `yelp_dataset` reviews
)
SELECT
  `AttributeName`,
  SUM(`AttributeValue`) as `AttributeCount`
FROM
  ReviewAttributes
GROUP BY
  `AttributeName`;


CASE语句还可以帮助您解决布尔字段和枚举字段之间的某些差异,例如从样本中计算Alcohol

关于mysql - 查找嵌套JSON中具有“true”值的所有记录,我们在Stack Overflow上找到一个类似的问题:https://stackoverflow.com/questions/40648431/

10-10 13:34