Problem description
Is there any provision for doing "INSERT IF NOT EXISTS ELSE UPDATE" in Spark SQL?
I have a Spark SQL table "ABC" that has some records. I also have another batch of records that I want to insert into or update in this table, depending on whether they already exist in it.
Is there a SQL command that I can use in a SQL query to make this happen?
Recommended answer
In regular Spark this could be achieved with a join followed by a map, like this:
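import spark.implicits._

// Existing records (df1) and the incoming batch (df2).
val df1 = spark.sparkContext.parallelize(List(("id1", "original"), ("id2", "original"))).toDF("df1_id", "df1_status")
val df2 = spark.sparkContext.parallelize(List(("id1", "new"), ("id3", "new"))).toDF("df2_id", "df2_status")

// Full outer join, then keep the df2 row when one exists, otherwise the df1 row.
val df3 = df1
  .join(df2, 'df1_id === 'df2_id, "outer")
  .map(row => {
    if (row.isNullAt(2))  // no match in df2: keep the existing record
      (row.getString(0), row.getString(1))
    else                  // present in df2: insert or overwrite
      (row.getString(2), row.getString(3))
  })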
This produces:
scala> df3.show
+---+--------+
| _1|      _2|
+---+--------+
|id3|     new|
|id1|     new|
|id2|original|
+---+--------+
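Since the question asks for a SQL query, the same join-and-pick logic can also be sketched in Spark SQL. This is only an illustration under assumptions: the temporary view names abc and updates, the column names id and status, and the local SparkSession are all made up for the example and are not part of the original answer. Note that the query returns a new DataFrame; it does not modify the table in place, so the result would still have to be written back.

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration only; inside spark-shell, `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("upsert-sql-sketch").getOrCreate()
import spark.implicits._

// Hypothetical views: `abc` holds the existing records, `updates` the incoming batch.
Seq(("id1", "original"), ("id2", "original")).toDF("id", "status").createOrReplaceTempView("abc")
Seq(("id1", "new"), ("id3", "new")).toDF("id", "status").createOrReplaceTempView("updates")

// Full outer join: take the row from `updates` when one exists, else keep the row from `abc`.
val upserted = spark.sql("""
  SELECT COALESCE(u.id, a.id) AS id,
         CASE WHEN u.id IS NOT NULL THEN u.status ELSE a.status END AS status
  FROM abc a
  FULL OUTER JOIN updates u ON a.id = u.id
""")

upserted.show()
```

The CASE expression tests u.id rather than u.status so that an incoming row with a null status still overwrites the existing one, mirroring the isNullAt(2) check in the map version.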
You could also use select with udfs instead of map, but in this particular case with null values, I personally prefer the map variant.
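For comparison, here is a minimal sketch of a select-based variant. It is not the answer author's code: instead of udfs it uses the built-in column functions coalesce and when/otherwise from org.apache.spark.sql.functions, which handle nulls directly. The output column names id and status and the local SparkSession are assumptions made for the example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{coalesce, when}

// Local session for illustration only; inside spark-shell, `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("upsert-select-sketch").getOrCreate()
import spark.implicits._

val df1 = Seq(("id1", "original"), ("id2", "original")).toDF("df1_id", "df1_status")
val df2 = Seq(("id1", "new"), ("id3", "new")).toDF("df2_id", "df2_status")

val merged = df1
  .join(df2, $"df1_id" === $"df2_id", "outer")
  .select(
    // Prefer the id from df2; fall back to df1 when df2 has no matching row.
    coalesce($"df2_id", $"df1_id").as("id"),
    // Same rule as the map version: if df2 matched, its status wins.
    when($"df2_id".isNotNull, $"df2_status").otherwise($"df1_status").as("status")
  )

merged.show()
```

Compared with the map variant, this keeps named columns instead of _1 and _2 and stays within the untyped DataFrame API, so no row-by-row deserialization is needed.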