Problem description
This is part of the schema of my mongodb collection:
|-- variables: struct (nullable = true)
| |-- actives: struct (nullable = true)
| | |-- data: struct (nullable = true)
| | | |-- 0: struct (nullable = true)
| | | | |-- active: integer (nullable = true)
| | | | |-- inactive: integer (nullable = true)
I've fetched the collection and stored it in a Spark dataframe and am now trying to extract the innermost values in the variables column.
df_temp = df1.select(df1.variables.actives.data)
This works perfectly fine and I am able to get the inner structure of the data struct.
+----------------------+
|variables.actives.data|
+----------------------+
| [[1,32,0.516165...|
| [[1,30,1.173139...|
| [[4,18,0.160088...|
However, as soon as I try to go in further:
df_temp = df1.select(df1.variables.actives.data.0.active)
I get an invalid syntax error:
df_temp = df1.select(df1.variables.actives.data.0.active)
^
SyntaxError: invalid syntax
The problem is that the inner field's key is a number, and I couldn't find an example where a nested field's key is a number.
What is the best way to retrieve the innermost values (active and inactive) from the dataframe?
Recommended answer
You can try:
df_temp = df1.select(df1.variables.actives.data["0"].active)
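The bracket form helps because the failure happens in Python's parser, before Spark is involved: an attribute name after a dot must be a valid identifier, and 0 is not one. As a minimal sketch that needs no Spark installation, the standard-library ast module can check both spellings; the helper name parses is hypothetical and used only for illustration.

```python
import ast


def parses(expr: str) -> bool:
    """Return True if expr is a syntactically valid Python expression."""
    try:
        ast.parse(expr, mode="eval")
        return True
    except SyntaxError:
        return False


# The expression from the question: ".0" is tokenized as a number, not an attribute.
dotted = "df1.variables.actives.data.0.active"
# The expression from the answer: the numeric key is passed as a string subscript.
bracketed = 'df1.variables.actives.data["0"].active'

dotted_ok = parses(dotted)
bracketed_ok = parses(bracketed)
print(dotted_ok, bracketed_ok)  # prints: False True
```

The same subscript pattern should apply to the inactive field, and PySpark's Column.getField("0") is likely an equivalent spelling of the ["0"] lookup, though that is not confirmed against this particular schema.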