如何在某些匹配条件下

如何在某些匹配条件下

本文介绍了如何在某些匹配条件下 LEFT ANTI 加入的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有两个表 - 一个是带有一对 ID(PC1 和 P2)和一些 blob 数据(P3)的核心数据.另一个是前表中PC1的黑名单数据.我将调用第一个表 in_df 和第二个 blacklist_df.

I have two tables - one is a core data with a pair of IDs (PC1 and P2) and some blob data (P3). The other is a blacklist data for PC1 in the former table. I will call the first table in_df and the second blacklist_df.

我想要做的是从 in_df 中删除行,只要 in_df.PC1 == blacklist_df.P1 和 in_df.P2 == black_list_df.B1.这是一个代码片段,用于更明确地展示我想要实现的目标.

What I want to do is to remove rows from in_df long as in_df.PC1 == blacklist_df.P1 and in_df.P2 == black_list_df.B1. Here is a code snippet to show what I want to achieve more explicitly.

in_df = sqlContext.createDataFrame([[1,2,'A'],[2,1,'B'],[3,1,'C'],
[4,11,'D'],[1,3,'D']],['PC1','P2','P3'])
in_df.show()

+---+---+---+
|PC1| P2| P3|
+---+---+---+
|  1|  2|  A|
|  2|  1|  B|
|  3|  1|  C|
|  4| 11|  D|
|  1|  3|  D|
+---+---+---+

blacklist_df = sqlContext.createDataFrame([[1,2],[2,1]],['P1','B1'])
blacklist_df.show()

+---+---+
| P1| B1|
+---+---+
|  1|  2|
|  2|  1|
+---+---+

最后我想得到的是以下内容:

In the end what I want to get is the followings:

+---+--+--+
|PC1|P2|P3|
+---+--+--+
|  1| 3| D|
|  3| 1| C|
|  4|11| D|
+---+--+--+

我尝试过 LEFT_ANTI 加入,但没有成功.

I tried LEFT_ANTI join but I haven't been successful.

推荐答案

将join条件作为列表传递给join函数,并指定how='left_anti'作为连接类型:

Pass the join conditions as a list to the join function, and specify how='left_anti' as the join type:

in_df.join(
    blacklist_df,
    [in_df.PC1 == blacklist_df.P1, in_df.P2 == blacklist_df.B1],
    how='left_anti'
).show()

+---+---+---+
|PC1| P2| P3|
+---+---+---+
|  1|  3|  D|
|  4| 11|  D|
|  3|  1|  C|
+---+---+---+

这篇关于如何在某些匹配条件下 LEFT ANTI 加入的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

09-02 02:47