Problem description
I am trying to do a left outer join in Spark (1.6.2) and it doesn't work. My SQL query is like this:
sqlContext.sql("""select t.type, t.uuid, p.uuid
                  from symptom_type t LEFT JOIN plugin p
                  ON t.uuid = p.uuid
                  where t.created_year = 2016
                  and p.created_year = 2016""").show()
The result looks like this:
+--------------------+--------------------+--------------------+
| type| uuid| uuid|
+--------------------+--------------------+--------------------+
| tained|89759dcc-50c0-490...|89759dcc-50c0-490...|
| swapper|740cd0d4-53ee-438...|740cd0d4-53ee-438...|
I get the same result whether I use LEFT JOIN or LEFT OUTER JOIN (the second uuid is never null).
I would expect the second uuid column to contain nulls for unmatched rows. How do I do a left outer join correctly?
=== Additional information ===
If I use the DataFrame API to do the left outer join, I get the correct result.
s = sqlCtx.sql('select * from symptom_type where created_year = 2016')
p = sqlCtx.sql('select * from plugin where created_year = 2016')
s.join(p, s.uuid == p.uuid, 'left_outer') \
 .select(s.type, s.uuid.alias('s_uuid'),
         p.uuid.alias('p_uuid'), s.created_date, p.created_year, p.created_month).show()
I get a result like this:
+-------------------+--------------------+-----------------+--------------------+------------+-------------+
| type| s_uuid| p_uuid| created_date|created_year|created_month|
+-------------------+--------------------+-----------------+--------------------+------------+-------------+
| tained|6d688688-96a4-341...| null|2016-01-28 00:27:...| null| null|
| tained|6d688688-96a4-341...| null|2016-01-28 00:27:...| null| null|
| tained|6d688688-96a4-341...| null|2016-01-28 00:27:...| null| null|
Thanks,
Answer
I don't see any issues in your code. Both "left join" and "left outer join" work fine. Please check the data again; the rows you are showing are ones that matched.
You can also perform the Spark SQL join using the DataFrame API:
# explicit left outer join
df1.join(df2, df1["col1"] == df2["col1"], "left_outer")
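As a sanity check, the expected left-outer-join behavior (every left row kept, null on the right side when unmatched) can be reproduced outside Spark with plain SQL. Here is a minimal, self-contained sketch using Python's built-in sqlite3; the table and column names mirror the question but the data is made up for illustration:

```python
import sqlite3

# In-memory database with two tiny tables. Every row of the left table
# should survive a LEFT JOIN; the right-side uuid is NULL when unmatched.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE symptom_type (type TEXT, uuid TEXT);
    CREATE TABLE plugin (uuid TEXT);
    INSERT INTO symptom_type VALUES ('tained', 'a'), ('swapper', 'b');
    INSERT INTO plugin VALUES ('a');  -- 'b' has no match in plugin
""")

rows = conn.execute("""
    SELECT t.type, t.uuid, p.uuid
    FROM symptom_type t LEFT JOIN plugin p ON t.uuid = p.uuid
    ORDER BY t.uuid
""").fetchall()

# The unmatched left row keeps None (NULL) for p.uuid.
print(rows)  # [('tained', 'a', 'a'), ('swapper', 'b', None)]
```

If your Spark query never shows nulls in the second uuid column, it means every left row in the data you inspected actually had a match.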