Speeding up a PostgreSQL query (checking whether entries exist in another table)

Problem description

I need some help improving the speed of a query.

I have 3 tables:

1- pairTable2, a 4-column table:
- genomic_accession: grouping column (not relevant to this question)
- assembly: grouping column (not relevant to this question)
- product_accession: column used to search in the other tables
- tmpcol: column used to search in the other tables

2- SBPDB, a 1-column table:
- product_accession: column used in the search

3- cacheDB, a 1-column table:
- product_accession: column used in the search

The idea is to create a boolean column called SBP in table 1 which is TRUE if the value in column product_accession and/or tmpcol is found in the single column of SBPDB.

And to create a boolean column called CACHE in table 1 which is TRUE if the value in column product_accession and/or tmpcol is found in the single column of cacheDB.
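
To make the intended rule concrete, here is a minimal sketch on toy data. The table names toy_pairs, toy_sbpdb and toy_cachedb and the accession values are hypothetical stand-ins, not the real tables; the flag names follow the "CACHE" and "SBP" aliases used in the query in this question. It only illustrates which rows should be flagged and makes no claim about performance on the full data.

```sql
-- Hypothetical toy tables standing in for pairTable2, SBPDB and cacheDB.
CREATE TEMP TABLE toy_pairs   (product_accession text, tmpcol text);
CREATE TEMP TABLE toy_sbpdb   (product_accession text);
CREATE TEMP TABLE toy_cachedb (product_accession text);

INSERT INTO toy_pairs VALUES
    ('A.1', 'B.1'),   -- A.1 appears in toy_sbpdb
    ('C.1', 'D.1'),   -- D.1 appears in toy_cachedb
    ('E.1', 'F.1');   -- neither value appears in either lookup table
INSERT INTO toy_sbpdb   VALUES ('A.1');
INSERT INTO toy_cachedb VALUES ('D.1');

-- A flag is TRUE when product_accession or tmpcol is found in the lookup table.
SELECT p.product_accession,
       p.tmpcol,
       EXISTS (SELECT 1
               FROM   toy_cachedb c
               WHERE  c.product_accession IN (p.product_accession, p.tmpcol)) AS "CACHE",
       EXISTS (SELECT 1
               FROM   toy_sbpdb s
               WHERE  s.product_accession IN (p.product_accession, p.tmpcol)) AS "SBP"
FROM   toy_pairs p
ORDER  BY p.product_accession;
```

On this toy data the A.1 row should come back with only SBP set, the C.1 row with only CACHE set, and the E.1 row with both flags false.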

I'm using R with DBI and dplyr as the backend, so the query may look odd. The query I'm trying to run is:

SELECT "genomic_accession",
       "assembly",
       "product_accession",
       "tmpcol",
       "product_accession" IN (SELECT product_accession
                               FROM   "cacheDB")
        OR "tmpcol" IN (SELECT product_accession
                        FROM   "cacheDB") AS "CACHE",
       "product_accession" IN (SELECT product_accession
                               FROM   "SBPDB")
        OR "tmpcol" IN (SELECT product_accession
                        FROM   "SBPDB")   AS "SBP"
FROM   (SELECT *
        FROM   "pairTable2"
        LIMIT  500000) "dbplyr_031";

(EXPLAIN output)

                                       QUERY PLAN
----------------------------------------------------------------------------------------
 Subquery Scan on dbplyr_031  (cost=3242.27..3846856408.45 rows=500000 width=59)
   ->  Limit  (cost=0.00..10666.17 rows=500000 width=57)
         ->  Seq Scan on "pairTable2"  (cost=0.00..781515.16 rows=36635216 width=57)
   SubPlan 1
     ->  Seq Scan on "cacheDB"  (cost=0.00..1394.91 rows=90491 width=14)
   SubPlan 2
     ->  Seq Scan on "cacheDB" "cacheDB_1"  (cost=0.00..1394.91 rows=90491 width=14)
   SubPlan 3
     ->  Materialize  (cost=0.00..7001.57 rows=276838 width=14)
           ->  Seq Scan on "SBPDB"  (cost=0.00..4265.38 rows=276838 width=14)
   SubPlan 4
     ->  Materialize  (cost=0.00..7001.57 rows=276838 width=14)
           ->  Seq Scan on "SBPDB" "SBPDB_1"  (cost=0.00..4265.38 rows=276838 width=14)
(13 rows)

This is just a sample of 500k rows, and it is still running after 1 hour. The total number of rows is:

genomes=> select count(*) from "pairTable2";
  count
----------
 36633962
(1 row)

I need at least some suggestions for a better query that would speed this up.

Example of the tables:

(1)

 genomic_accession |    assembly     | product_accession |     tmpcol
-------------------+-----------------+-------------------+----------------
 NC_007777.1       | GCF_000013345.1 | WP_011437108.1    | WP_011437109.1
 NC_007777.1       | GCF_000013345.1 | WP_011437109.1    | WP_011437110.1
 NC_007777.1       | GCF_000013345.1 | WP_011437110.1    | WP_011437113.1
 NC_007777.1       | GCF_000013345.1 | WP_011437113.1    | WP_011437114.1
 NC_007777.1       | GCF_000013345.1 | WP_011437114.1    | WP_011437116.1
 NC_007777.1       | GCF_000013345.1 | WP_011437116.1    | WP_011437117.1
 NC_007777.1       | GCF_000013345.1 | WP_011437117.1    | WP_011437118.1
 NC_007777.1       | GCF_000013345.1 | WP_011437118.1    | WP_011437120.1
 NC_007777.1       | GCF_000013345.1 | WP_011437120.1    | WP_011437121.1
 NC_007777.1       | GCF_000013345.1 | WP_011437121.1    | WP_011437123.1
(10 rows)

(2)

 product_accession
-------------------
 WP_005887071.1
 WP_005913801.1
 WP_002804432.1
 WP_010366489.1
 WP_012444785.1
 NP_636898.1
 WP_046342269.1
 WP_074057745.1
 WP_039420813.1
 WP_005932253.1
(10 rows)

(3)

 product_accession
-------------------
 ABG85315.1
 ABG85570.1
 ABG86033.1
 ABG86301.1
 ABG87594.1
 ACX82524.1
 ACX83274.1
 ACX83416.1
 ADX79866.1
 ADX79880.1
(10 rows)

Example of the output:

 genomic_accession |    assembly     | product_accession |     tmpcol     | CACHE | SBP
-------------------+-----------------+-------------------+----------------+-------+-----
 NC_007899.1       | GCF_000009945.1 | WP_011457581.1    | WP_011457582.1 | f     | f
 NC_007899.1       | GCF_000009945.1 | WP_011457582.1    | WP_011457583.1 | f     | f
 NC_007899.1       | GCF_000009945.1 | WP_011457583.1    | WP_011457584.1 | f     | f
 NC_007899.1       | GCF_000009945.1 | WP_011457584.1    | WP_011457585.1 | f     | f
 NC_007899.1       | GCF_000009945.1 | WP_011457585.1    | WP_011457586.1 | f     | f
 NC_007899.1       | GCF_000009945.1 | WP_011457586.1    | WP_011457587.1 | f     | f
 NC_007899.1       | GCF_000009945.1 | WP_011457587.1    | WP_011457588.1 | f     | f
 NC_007899.1       | GCF_000009945.1 | WP_011457588.1    | WP_011457589.1 | f     | f
 NC_007899.1       | GCF_000009945.1 | WP_011457589.1    | WP_011457590.1 | f     | f
 NC_007899.1       | GCF_000009945.1 | WP_011457590.1    | WP_011457592.1 | f     | f
 NC_007899.1       | GCF_000009945.1 | WP_011457592.1    | WP_011457593.1 | f     | f
 NC_007899.1       | GCF_000009945.1 | WP_011457593.1    | WP_011457594.1 | f     | f
 NC_007899.1       | GCF_000009945.1 | WP_011457594.1    | WP_011457596.1 | f     | f
 NC_007899.1       | GCF_000009945.1 | WP_011457596.1    | WP_011457597.1 | f     | f
 NC_007899.1       | GCF_000009945.1 | WP_011457597.1    | WP_011457598.1 | f     | f
 NC_007899.1       | GCF_000009945.1 | WP_011457598.1    | WP_011457600.1 | f     | f
 NC_007899.1       | GCF_000009945.1 | WP_011457600.1    | WP_011457601.1 | f     | f
 NC_007899.1       | GCF_000009945.1 | WP_011457601.1    | WP_011457602.1 | f     | f
 NC_007899.1       | GCF_000009945.1 | WP_011457602.1    | WP_011457603.1 | f     | f
 NC_007899.1       | GCF_000009945.1 | WP_011457603.1    | WP_011457604.1 | f     | f

Thanks in advance.

Recommended answer