我所拥有的:
data_source_1表格
data_source_2表格
data_sources_view视图
关于表:
data_source_1
没有重复:

db=# select count(*) from (select distinct * from data_source_1);
count
--------
543243
(1 row)

db=# select count(*) from (select * from data_source_1);
count
--------
543243
(1 row)

data_source_2
没有重复:
db=# select count(*) from (select * from data_source_2);
count
-------
5304
(1 row)

db=# select count(*) from (select distinct * from data_source_2);
count
-------
5304
(1 row)

data_sources_view
有重复:
db=# select count(*) from (select distinct * from data_sources_vie);
count
--------
538714
(1 row)

db=# select count(*) from (select * from data_sources_view);
count
--------
548547
(1 row)

视图很简单:
CREATE VIEW data_sources_view
AS SELECT *
FROM (
      (
       SELECT a, b, 'data_source_1' as source
       FROM data_source_1
      )
      UNION ALL
      (
       SELECT a, b, 'data_source_2' as source
       FROM data_source_2
      )
);

我想知道的是:
在源表没有dup+'data_source_x' as source的视图中,如何可能有dup消除了重叠数据的可能性。
如何识别dup?
我试过的:
db# create table t1 as select * from data_sources_view;
SELECT
db=#
db=# create table t2 as select distinct * from data_sources_view;
SELECT
db=# create table t3 as select * from t1 minus select * from t2;
SELECT
db=# select 't1' as table_name, count(*) from t1 UNION ALL
db-# select 't2' as table_name, count(*) from t2 UNION ALL
db-# select 't3' as table_name, count(*) from t3;
table_name | count
------------+--------
t1 | 548547
t3 | 0
t2 | 538714
(3 rows)

数据库:
红移(PostgreSQL

最佳答案

原因是数据源有两列以上。如果你这么做了:

select count(*) from (select distinct a, b from data_source_1);


select count(*) from (select distinct a, b from data_source_2);

你应该发现它们不同于同一张桌子上的count(*)

关于sql - 需要帮助识别表中的公仔,我们在Stack Overflow上找到一个类似的问题:https://stackoverflow.com/questions/29708621/

10-10 00:41