本文介绍了Spark 中有哪些连接类型?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我查看了文档,它说支持以下连接类型:

I looked at the docs and it says the following join types are supported:

要执行的连接类型.默认内部.必须是以下之一:内部、交叉、外,全,full_outer,左,left_outer,右,right_outer,left_semi,left_anti.

我查看了关于 SQL 连接和顶部的 StackOverflow 答案几个答案没有提到上面的一些连接,例如left_semileft_anti.它们在 Spark 中是什么意思?

I looked at the StackOverflow answer on SQL joins and top couple of answers do not mention some of the joins from above e.g. left_semi and left_anti. What do they mean in Spark?

推荐答案

这是一个简单的说明性实验:

Here is a simple illustrative experiment:

import org.apache.spark.sql._

object SparkSandbox extends App {
  implicit val spark = SparkSession.builder().master("local[*]").getOrCreate()
  import spark.implicits._
  spark.sparkContext.setLogLevel("ERROR")

  val left = Seq((1, "A1"), (2, "A2"), (3, "A3"), (4, "A4")).toDF("id", "value")
  val right = Seq((3, "A3"), (4, "A4"), (4, "A4_1"), (5, "A5"), (6, "A6")).toDF("id", "value")

  println("LEFT")
  left.orderBy("id").show()

  println("RIGHT")
  right.orderBy("id").show()

  val joinTypes = Seq("inner", "outer", "full", "full_outer", "left", "left_outer", "right", "right_outer", "left_semi", "left_anti")

  joinTypes foreach { joinType =>
    println(s"${joinType.toUpperCase()} JOIN")
    left.join(right = right, usingColumns = Seq("id"), joinType = joinType).orderBy("id").show()
  }
}

输出

LEFT
+---+-----+
| id|value|
+---+-----+
|  1|   A1|
|  2|   A2|
|  3|   A3|
|  4|   A4|
+---+-----+

RIGHT
+---+-----+
| id|value|
+---+-----+
|  3|   A3|
|  4|   A4|
|  4| A4_1|
|  5|   A5|
|  6|   A6|
+---+-----+

INNER JOIN
+---+-----+-----+
| id|value|value|
+---+-----+-----+
|  3|   A3|   A3|
|  4|   A4| A4_1|
|  4|   A4|   A4|
+---+-----+-----+

OUTER JOIN
+---+-----+-----+
| id|value|value|
+---+-----+-----+
|  1|   A1| null|
|  2|   A2| null|
|  3|   A3|   A3|
|  4|   A4|   A4|
|  4|   A4| A4_1|
|  5| null|   A5|
|  6| null|   A6|
+---+-----+-----+

FULL JOIN
+---+-----+-----+
| id|value|value|
+---+-----+-----+
|  1|   A1| null|
|  2|   A2| null|
|  3|   A3|   A3|
|  4|   A4|   A4|
|  4|   A4| A4_1|
|  5| null|   A5|
|  6| null|   A6|
+---+-----+-----+

FULL_OUTER JOIN
+---+-----+-----+
| id|value|value|
+---+-----+-----+
|  1|   A1| null|
|  2|   A2| null|
|  3|   A3|   A3|
|  4|   A4|   A4|
|  4|   A4| A4_1|
|  5| null|   A5|
|  6| null|   A6|
+---+-----+-----+

LEFT JOIN
+---+-----+-----+
| id|value|value|
+---+-----+-----+
|  1|   A1| null|
|  2|   A2| null|
|  3|   A3|   A3|
|  4|   A4| A4_1|
|  4|   A4|   A4|
+---+-----+-----+

LEFT_OUTER JOIN
+---+-----+-----+
| id|value|value|
+---+-----+-----+
|  1|   A1| null|
|  2|   A2| null|
|  3|   A3|   A3|
|  4|   A4| A4_1|
|  4|   A4|   A4|
+---+-----+-----+

RIGHT JOIN
+---+-----+-----+
| id|value|value|
+---+-----+-----+
|  3|   A3|   A3|
|  4|   A4| A4_1|
|  4|   A4|   A4|
|  5| null|   A5|
|  6| null|   A6|
+---+-----+-----+

RIGHT_OUTER JOIN
+---+-----+-----+
| id|value|value|
+---+-----+-----+
|  3|   A3|   A3|
|  4|   A4|   A4|
|  4|   A4| A4_1|
|  5| null|   A5|
|  6| null|   A6|
+---+-----+-----+

LEFT_SEMI JOIN
+---+-----+
| id|value|
+---+-----+
|  3|   A3|
|  4|   A4|
+---+-----+

LEFT_ANTI JOIN
+---+-----+
| id|value|
+---+-----+
|  1|   A1|
|  2|   A2|
+---+-----+

这篇关于Spark 中有哪些连接类型?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

07-24 13:22