Problem description
I am trying different ways to query a record within an array of records and display the complete row as output.
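I don't know which nested object contains the string "pg", but I want to query on that field: does any object have "pg" or not? If "pg" exists, I want to display the complete row. How do I write a Spark SQL query on nested objects without specifying the object index? In other words, I don't want to use an index on children.name.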
My Avro record:
{
  "name": "Parent",
  "type": "record",
  "fields": [
    {"name": "firstname", "type": "string"},
    {
      "name": "children",
      "type": {
        "type": "array",
        "items": {
          "name": "child",
          "type": "record",
          "fields": [
            {"name": "name", "type": "string"}
          ]
        }
      }
    }
  ]
}
I am using a Spark SQL context to query the DataFrame that was read from this data. So if the input is
Row no  Firstname  Children.name
1       John       Max
                   Pg
2       Bru        huna
                   aman
The output should return row 1, since it is the row where one object of children.name is pg.
val results = sqlc.sql("SELECT firstname, children.name FROM nestedread where children.name = 'pg'")
results.foreach(x=> println(x(0), x(1).toString))
The above query doesn't work, but it does work when I query children[1].name.
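I would also like to know whether I can filter a set of records first and then explode, rather than exploding first, creating a large number of rows, and then filtering.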
Recommended answer
It seems you can use
org.apache.spark.sql.functions.explode(e: Column): Column
For example, in my project (in Java), I have nested JSON like this:
{
  "error": [],
  "trajet": [
    {
      "something": "value"
    }
  ],
  "infos": [
    {
      "something": "value"
    }
  ],
  "timeseries": [
    {
      "something_0": "value_0",
      "something_1": "value_1",
      ...
      "something_n": "value_n"
    }
  ]
}
and I wanted to analyse the data in "timeseries", so I did:
DataFrame ts = jsonDF.select(org.apache.spark.sql.functions.explode(jsonDF.col("timeseries")).as("t"))
.select("t.something_0",
"t.something_1",
...
"t.something_n");
I'm new to Spark too. Hope this gives you a hint.
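Applied to the schema in the question, the idea might look like the sketch below. It is not taken from the original thread and rests on several assumptions: Spark 2.x or later (SparkSession instead of the sqlc context used in the question), the hypothetical case classes Child and Parent standing in for the Avro records, and a child name stored as lower-case "pg" (array_contains compares values exactly, so "Pg" would not match 'pg'). It uses array_contains to keep whole rows without naming an array index, and filters before calling explode so that only matching rows are expanded.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array_contains, col, explode}

// Hypothetical case classes mirroring the Avro schema from the question.
case class Child(name: String)
case class Parent(firstname: String, children: Seq[Child])

object NestedArrayQuery {
  // Small in-memory stand-in for the Avro data (an assumption; "pg" is lower-cased here).
  def sample(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      Parent("John", Seq(Child("Max"), Child("pg"))),
      Parent("Bru", Seq(Child("huna"), Child("aman")))
    ).toDF()
  }

  // Keep whole rows where any child has the given name; no array index is needed.
  // Selecting children.name on an array of structs yields an array of strings.
  def parentsWithChild(df: DataFrame, name: String): DataFrame =
    df.filter(array_contains(col("children.name"), name))

  // Filter first, then explode only the surviving rows into one row per child.
  def explodeMatching(df: DataFrame, name: String): DataFrame =
    parentsWithChild(df, name)
      .select(col("firstname"), explode(col("children")).as("child"))
      .select(col("firstname"), col("child.name").as("child_name"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("nested-array-query")
      .master("local[*]")
      .getOrCreate()

    val df = sample(spark)
    df.createOrReplaceTempView("nestedread")

    // SQL form of the same filter, using the table name from the question.
    spark.sql(
      "SELECT firstname, children.name FROM nestedread " +
        "WHERE array_contains(children.name, 'pg')"
    ).show(false)

    // Filter-then-explode variant.
    explodeMatching(df, "pg").show(false)

    spark.stop()
  }
}
```

With the sample data above, the filter should keep only the John row, and the second query should then expand just that row's children. If the stored names vary in case, the values would need to be normalised before comparing.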