I ran into a situation where a LEFT JOIN returns duplicate values. I think this may be the expected behavior, but it is not what I want.

I have three tables: person, department and contact.

person:

id bigint,
person_name character varying(255)

department:
person_id bigint,
department_name character varying(255)

contact:
person_id bigint,
phone_number character varying(255)

SQL query:
SELECT p.id, p.person_name, d.department_name, c.phone_number
FROM person p
  LEFT JOIN department d
    ON p.id = d.person_id
  LEFT JOIN contact c
    ON p.id = c.person_id;

Result:
id|person_name|department_name|phone_number
--+-----------+---------------+------------
1 |"John"     |"Finance"      |"023451"
1 |"John"     |"Finance"      |"99478"
1 |"John"     |"Finance"      |"67890"
1 |"John"     |"Marketing"    |"023451"
1 |"John"     |"Marketing"    |"99478"
1 |"John"     |"Marketing"    |"67890"
2 |"Barbara"  |"Finance"      |""
3 |"Michelle" |""             |"005634"
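
For anyone who wants to reproduce the result, sample data along these lines should do. This is a sketch reconstructed from the output above; it assumes the empty values shown as "" are NULLs produced by the LEFT JOIN (no matching row), not stored empty strings.

```sql
-- Table definitions as described in the question
CREATE TABLE person     (id bigint, person_name varchar(255));
CREATE TABLE department (person_id bigint, department_name varchar(255));
CREATE TABLE contact    (person_id bigint, phone_number varchar(255));

-- Rows inferred from the result set (assumption: "" in the output means no row)
INSERT INTO person VALUES
  (1, 'John'), (2, 'Barbara'), (3, 'Michelle');

INSERT INTO department VALUES
  (1, 'Finance'), (1, 'Marketing'), (2, 'Finance');

INSERT INTO contact VALUES
  (1, '023451'), (1, '99478'), (1, '67890'), (3, '005634');
```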

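I know this is what joins do: they keep multiplying the selected rows. But the result gives the impression that the phone numbers 023451, 99478 and 67890 belong to both departments, while they are only related to the person John. The unnecessary duplicate values also make the problem worse with bigger data sets.
So, here is what I want: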
id|person_name|department_name|phone_number
--+-----------+---------------+------------
1 |"John"     |"Finance"      |"023451"
1 |"John"     |"Marketing"    |"99478"
1 |"John"     |""             |"67890"
2 |"Barbara"  |"Finance"      |""
3 |"Michelle" |""             |"005634"

This is just a sample of my situation; I am working with a large set of tables and queries. So I need a generic solution.

Best answer

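I like to call this problem a "proxy cross join". Since there is no information (WHERE or JOIN condition) on how the tables department and contact are supposed to match up, they are cross-joined via the proxy table person, giving you the Cartesian product. Very similar to this one:

  • Two SQL LEFT JOINS produce incorrect result

More explanation there.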
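
To see the multiplication in numbers, a query like the following counts the matching rows per person in each table and the number of rows the double LEFT JOIN returns for that person. This is only an illustrative sketch; the aliases sub, departments, phone_numbers and joined_rows are made up for the example. GREATEST(..., 1) accounts for the fact that a LEFT JOIN still returns one row when there is no match.

```sql
SELECT id, departments, phone_numbers
       -- a side with no matching rows still contributes one (NULL-extended) row
     , GREATEST(departments, 1) * GREATEST(phone_numbers, 1) AS joined_rows
FROM  (
   SELECT p.id
        , (SELECT count(*) FROM department d WHERE d.person_id = p.id) AS departments
        , (SELECT count(*) FROM contact    c WHERE c.person_id = p.id) AS phone_numbers
   FROM   person p
   ) sub
ORDER  BY id;
```

For John, 2 departments and 3 phone numbers give the 6 rows seen in the question.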

Solution for your query:
    SELECT p.id, p.person_name, d.department_name, c.phone_number
    FROM   person p
    LEFT   JOIN (
       SELECT person_id, min(department_name) AS department_name
       FROM   department
       GROUP  BY person_id
       ) d ON d.person_id = p.id
    LEFT   JOIN (
       SELECT person_id, min(phone_number) AS phone_number
       FROM   contact
       GROUP  BY person_id
       ) c ON c.person_id = p.id;
    

You did not define which department or phone number to pick, so I arbitrarily chose the minimum of each. You can have it any other way ...
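
If the goal is the exact shape shown in the question (each department paired with one phone number per person, with the shorter list padded with NULL), one possible approach is to number the rows per person in each table with row_number() and combine the two numbered sets with a FULL JOIN. A minimal sketch, assuming a PostgreSQL version with window functions (8.4 or later); the aliases rn and x are illustrative, and the alphabetical pairing order is an arbitrary assumption, since the tables define no order of their own:

```sql
SELECT p.id, p.person_name, x.department_name, x.phone_number
FROM   person p
LEFT   JOIN (
   -- USING merges person_id and rn from both sides into single columns
   SELECT person_id, rn, d.department_name, c.phone_number
   FROM  (
      SELECT person_id, department_name
           , row_number() OVER (PARTITION BY person_id
                                ORDER BY department_name) AS rn
      FROM   department
      ) d
   FULL   JOIN (
      SELECT person_id, phone_number
           , row_number() OVER (PARTITION BY person_id
                                ORDER BY phone_number) AS rn
      FROM   contact
      ) c USING (person_id, rn)
   ) x ON x.person_id = p.id
ORDER  BY p.id, x.rn;
```

Each person then gets as many rows as the larger of the two lists rather than their product, and missing values come back as NULL, not as empty strings. Which department ends up next to which phone number depends only on the ORDER BY inside the window functions, so the pairing carries no meaning.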
