Question
I'd like to create a table from a nested JSON file in Athena. The solutions described here, which use tools like the Hive OpenX JsonSerDe, try to mirror the entire JSON structure in the SQL statement. I only want to pick a few fields from the JSON file and create a table from them, and I can't find any resources on how to do that.
E.g., JSON file: {"records": [{"a": "data1", "b": "data2", "c": "data3"}]}
The table I'd like to create has only columns a and b.
Recommended answer
I think what you are trying to achieve is unnesting the array to transform one array entry into one row.
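This is possible by declaring only the fields you need in the table definition and then unnesting the array in the query. Keys that are not declared (c in your example) are simply left out of the result.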
Table definition:
CREATE external TABLE complex (
records array<struct<a:string,b:string>>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://bucket/test1/';
Query:
select record.a,record.b from complex
cross join UNNEST(complex.records) as t1(record);
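If the flattened result should itself be queryable as a table with only columns a and b, the same query can be wrapped in a view or a CTAS statement. The following is a minimal sketch, not something taken from the original answer: the names flat_records_view and flat_records are hypothetical, and the CTAS variant assumes your workgroup has a query result location where Athena can write the new table's data.

```sql
-- Option 1: a view over the unnesting query (no data is copied).
-- "flat_records_view" is a hypothetical name chosen for illustration.
CREATE OR REPLACE VIEW flat_records_view AS
SELECT record.a AS a,
       record.b AS b
FROM complex
CROSS JOIN UNNEST(complex.records) AS t1(record);

-- Option 2: materialize the result as a new table with CTAS.
-- "flat_records" is likewise a hypothetical name.
CREATE TABLE flat_records AS
SELECT record.a AS a,
       record.b AS b
FROM complex
CROSS JOIN UNNEST(complex.records) AS t1(record);
```

The view always reflects the current contents of the S3 location, while the CTAS table is a snapshot taken when the statement runs.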