Problem description
I have a use case in which a table has a single column containing a sequence of SQL queries.
I want to run these SQL queries in a Spark program one after the other, not in parallel, because the SQL query in the Nth row depends on the query in the (N-1)th row.
Given this constraint, how can I execute the queries sequentially rather than in parallel?
Recommended answer
I think you could use something like this:
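import org.apache.spark.sql.functions.col

// Sort while query_index is still available, then keep only the query text
// and collect the rows to the driver.
val listOfQueryRows = spark.table("foo_db.table_of_queries")
  .orderBy(col("query_index"))
  .select(col("sql_query"))
  .collectAsList()
// Iterating the collected list on the driver runs one query at a time, in order.
listOfQueryRows.forEach(queryRow => spark.sql(queryRow.getString(0)))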
This selects all the queries in the sql_query column, orders them by the index given in the query_index column, and collects them into the list listOfQueryRows on the driver. The list is then iterated over sequentially, executing the query for each returned row.
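Because each query depends on the one before it, it may also be worth stopping at the first query that fails instead of carrying on. The following is only a minimal sketch of that idea: runInOrder is a hypothetical helper (not part of Spark), and the in-memory execute function stands in for spark.sql so that the example runs without a cluster.

```scala
import scala.util.{Failure, Try}

// Hypothetical helper, not part of Spark: runs the queries strictly in order
// and stops at the first one that fails.
// `execute` is whatever actually runs a query, e.g. q => spark.sql(q) in a Spark job.
// Returns Right(number of queries run) or Left((index of the failed query, error)).
def runInOrder(queries: Seq[String])(execute: String => Unit): Either[(Int, Throwable), Int] = {
  // The iterator is lazy, so nothing after the first failure is executed.
  val firstFailure = queries.zipWithIndex.iterator
    .map { case (query, index) => (index, Try(execute(query))) }
    .collectFirst { case (index, Failure(error)) => (index, error) }
  firstFailure.toLeft(queries.size)
}

// Stand-in for spark.sql: records each query it "runs" and fails on the second one.
val executed = scala.collection.mutable.ArrayBuffer.empty[String]
val result = runInOrder(Seq("q1", "q2", "q3")) { q =>
  if (q == "q2") throw new IllegalStateException("simulated failure")
  executed += q
}
```

In a Spark job the call would look something like runInOrder(queries)(q => spark.sql(q)), where queries holds the strings taken from listOfQueryRows. Note that spark.sql only builds a lazy DataFrame for a plain SELECT, whereas statements such as INSERT or CREATE TABLE are run as soon as spark.sql is called, so a SELECT would need an action on its result before it actually executes.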