What are the various join types in Spark?
Question
I looked at the docs, and they say the following join types are supported:
Type of join to perform. Default inner. Must be one of: inner, cross, outer, full, full_outer, left, left_outer, right, right_outer, left_semi, left_anti.
I looked at the StackOverflow answer on SQL joins, and the top couple of answers do not mention some of the joins listed above, e.g. left_semi and left_anti. What do they mean in Spark?
Recommended answer
Here is a simple illustrative experiment:
import org.apache.spark.sql._

object SparkSandbox extends App {
  // Local session, used only for this experiment
  implicit val spark: SparkSession = SparkSession.builder().master("local[*]").getOrCreate()
  import spark.implicits._
  spark.sparkContext.setLogLevel("ERROR")

  // ids 3 and 4 exist on both sides; id 4 appears twice on the right
  val left = Seq((1, "A1"), (2, "A2"), (3, "A3"), (4, "A4")).toDF("id", "value")
  val right = Seq((3, "A3"), (4, "A4"), (4, "A4_1"), (5, "A5"), (6, "A6")).toDF("id", "value")

  println("LEFT")
  left.orderBy("id").show()
  println("RIGHT")
  right.orderBy("id").show()

  // Run the same join on "id" once per join type and print each result
  val joinTypes = Seq("inner", "outer", "full", "full_outer", "left", "left_outer", "right", "right_outer", "left_semi", "left_anti")
  joinTypes foreach { joinType =>
    println(s"${joinType.toUpperCase()} JOIN")
    left.join(right = right, usingColumns = Seq("id"), joinType = joinType).orderBy("id").show()
  }
}
Output
LEFT
+---+-----+
| id|value|
+---+-----+
|  1|   A1|
|  2|   A2|
|  3|   A3|
|  4|   A4|
+---+-----+
RIGHT
+---+-----+
| id|value|
+---+-----+
|  3|   A3|
|  4|   A4|
|  4| A4_1|
|  5|   A5|
|  6|   A6|
+---+-----+
INNER JOIN
+---+-----+-----+
| id|value|value|
+---+-----+-----+
|  3|   A3|   A3|
|  4|   A4| A4_1|
|  4|   A4|   A4|
+---+-----+-----+
OUTER JOIN
+---+-----+-----+
| id|value|value|
+---+-----+-----+
|  1|   A1| null|
|  2|   A2| null|
|  3|   A3|   A3|
|  4|   A4|   A4|
|  4|   A4| A4_1|
|  5| null|   A5|
|  6| null|   A6|
+---+-----+-----+
FULL JOIN
+---+-----+-----+
| id|value|value|
+---+-----+-----+
|  1|   A1| null|
|  2|   A2| null|
|  3|   A3|   A3|
|  4|   A4|   A4|
|  4|   A4| A4_1|
|  5| null|   A5|
|  6| null|   A6|
+---+-----+-----+
FULL_OUTER JOIN
+---+-----+-----+
| id|value|value|
+---+-----+-----+
|  1|   A1| null|
|  2|   A2| null|
|  3|   A3|   A3|
|  4|   A4|   A4|
|  4|   A4| A4_1|
|  5| null|   A5|
|  6| null|   A6|
+---+-----+-----+
LEFT JOIN
+---+-----+-----+
| id|value|value|
+---+-----+-----+
|  1|   A1| null|
|  2|   A2| null|
|  3|   A3|   A3|
|  4|   A4| A4_1|
|  4|   A4|   A4|
+---+-----+-----+
LEFT_OUTER JOIN
+---+-----+-----+
| id|value|value|
+---+-----+-----+
|  1|   A1| null|
|  2|   A2| null|
|  3|   A3|   A3|
|  4|   A4| A4_1|
|  4|   A4|   A4|
+---+-----+-----+
RIGHT JOIN
+---+-----+-----+
| id|value|value|
+---+-----+-----+
|  3|   A3|   A3|
|  4|   A4| A4_1|
|  4|   A4|   A4|
|  5| null|   A5|
|  6| null|   A6|
+---+-----+-----+
RIGHT_OUTER JOIN
+---+-----+-----+
| id|value|value|
+---+-----+-----+
|  3|   A3|   A3|
|  4|   A4|   A4|
|  4|   A4| A4_1|
|  5| null|   A5|
|  6| null|   A6|
+---+-----+-----+
LEFT_SEMI JOIN
+---+-----+
| id|value|
+---+-----+
|  3|   A3|
|  4|   A4|
+---+-----+
LEFT_ANTI JOIN
+---+-----+
| id|value|
+---+-----+
|  1|   A1|
|  2|   A2|
+---+-----+
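The LEFT_SEMI JOIN and LEFT_ANTI JOIN tables above suggest what these two types do. A left semi join keeps the left rows that have at least one match on the right, and a left anti join keeps the left rows that have none. Both return only the left side's columns, and a left row appears once even when it matches several right rows, as id 4 does here. The following is a minimal sketch of that behaviour on plain Scala collections. It is not Spark code: the names SemiAntiSketch, leftSemi and leftAnti are made up for illustration, and it only models matching on the id column.

```scala
// Plain-Scala model of left_semi and left_anti; hypothetical names, not Spark API.
object SemiAntiSketch {
  type Row = (Int, String) // (id, value), same shape as the experiment's data

  val left: Seq[Row]  = Seq((1, "A1"), (2, "A2"), (3, "A3"), (4, "A4"))
  val right: Seq[Row] = Seq((3, "A3"), (4, "A4"), (4, "A4_1"), (5, "A5"), (6, "A6"))

  // Distinct join keys on the right side; a Set, so repeated ids collapse.
  private def rightKeys(right: Seq[Row]): Set[Int] = right.map(_._1).toSet

  // left_semi: left rows whose id has at least one match on the right.
  def leftSemi(left: Seq[Row], right: Seq[Row]): Seq[Row] = {
    val keys = rightKeys(right)
    left.filter { case (id, _) => keys.contains(id) }
  }

  // left_anti: left rows whose id has no match on the right.
  def leftAnti(left: Seq[Row], right: Seq[Row]): Seq[Row] = {
    val keys = rightKeys(right)
    left.filterNot { case (id, _) => keys.contains(id) }
  }

  def main(args: Array[String]): Unit = {
    println(leftSemi(left, right)) // List((3,A3), (4,A4))
    println(leftAnti(left, right)) // List((1,A1), (2,A2))
  }
}
```

The two results partition the left table: every left row lands in exactly one of them, which matches the two tables printed by the experiment.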