performance - 为什么搜索字词稍有变化会大大降低查询速度？

我在PostgreSQL（9.5.1）中有以下查询：

select e.id, (select count(id) from imgitem ii where ii.tabid = e.id and ii.tab = 'esp') as imgs,
 e.ano, e.mes, e.dia, cast(cast(e.ano as varchar(4))||'-'||right('0'||cast(e.mes as varchar(2)),2)||'-'|| right('0'||cast(e.dia as varchar(2)),2) as varchar(10)) as data,
 pl.pltag, e.inpa, e.det, d.ano anodet, coalesce(p.abrev,'')||' ('||coalesce(p.prenome,'')||')' determinador, d.tax, coalesce(v.val,v.valf)||' '||vu.unit as altura,
 coalesce(v1.val,v1.valf)||' '||vu1.unit as DAP, d.fam, tf.nome família, d.gen, tg.nome gênero, d.sp, ts.nome espécie, d.inf, e.loc, l.nome localidade, e.lat, e.lon
from esp e
left join det d on e.det = d.id
left join tax tf on d.fam = tf.oldfam
left join tax tg on d.gen = tg.oldgen
left join tax ts on d.sp = ts.oldsp
left join tax ti on d.inf = ti.oldinf
left join loc l on e.loc = l.id
left join pess p on p.id = d.detby
left join var v on v.esp = e.id and v.key = 265
left join varunit vu on vu.id = v.unit
left join var v1 on v1.esp = e.id and v1.key = 264
left join varunit vu1 on vu1.id = v1.unit
left join pl on pl.id = e.pl
WHERE unaccent(TEXT(coalesce(p.abrev,'')||' ('||coalesce(p.prenome,'')||')')) ilike unaccent('%vicen%')

从esp表中总共9250行检索1129行需要430ms。
如果我将搜索词从%vicen%更改为%vicent%（添加“t”），则检索相同的1129行需要431毫秒。
按搜索列（升序和降序）排序，我发现在这两种情况下，所有1129行的名称都完全相同。
现在奇怪的是：如果我将搜索词从%vicent%改为%vicenti%（添加一个“I”），那么现在检索相同的1129行需要难以置信的24.4秒！
搜索的词总是在第一个coalesce，即coalesce(p.abrev,'')。我希望查询运行得更慢或更快，这取决于搜索字符串的大小，但不会太多！！有人知道怎么回事吗？
EXPLAIN ANALYZE的结果（将超过此处的30k字符限制）：
对于%vicen%：http://explain.depesz.com/s/2XF
对于%vicenti%：http://explain.depesz.com/s/dEc6

最佳答案

为什么？
原因是：
快速查询：

->  Hash Left Join  (cost=1378.60..2467.48 rows=15 width=79) (actual time=41.759..85.037 rows=1129 loops=1)
      ...
      Filter: (unaccent(((((COALESCE(p.abrev, ''::character varying))::text || ' ('::text) || (COALESCE(p.prenome, ''::character varying))::text) || ')'::text)) ~~* (...)

Slow query:

->  Hash Left Join  (cost=1378.60..2467.48 rows=1 width=79) (actual time=35.084..80.209 rows=1129 loops=1)
      ...
      Filter: (unaccent(((((COALESCE(p.abrev, ''::character varying))::text || ' ('::text) || (COALESCE(p.prenome, ''::character varying))::text) || ')'::text)) ~~* unacc (...)

Extending the search pattern by another character causes Postgres to assume yet fewer hits. (Typically, this is a reasonable estimate.) Postgres obviously does not have precise enough statistics (none, actually, keep reading) to expect the same number of hits that you really get.

This causes a switch to a different query plan, which is even less optimal for the actual number of hits rows=1129.

Solution

Assuming current Postgres 9.5 since it has not been declared.

One way to improve the situation is to create an expression index on the expression in the predicate. This makes Postgres gather statistics for the actual expression, which can help the query even if the index itself is not used for the query. Without the index, there are no statistics for the expression at all. And if done right the index can be used for the query, that's even much better. But there are multiple problems with your current expression:

Consider this updated query, based on some assumptions about your undisclosed table definitions:

SELECT e.id
     , (SELECT count(*) FROM imgitem
        WHERE tabid = e.id AND tab = 'esp') AS imgs -- count(*) is faster
     , e.ano, e.mes, e.dia
     , e.ano::text || to_char(e.mes2, 'FM"-"00')
                   || to_char(e.dia,  'FM"-"00') AS data
     , pl.pltag, e.inpa, e.det, d.ano anodet
     , format('%s (%s)', p.abrev, p.prenome) AS determinador
     , d.tax
     , coalesce(v.val,v.valf)   || ' ' || vu.unit  AS altura
     , coalesce(v1.val,v1.valf) || ' ' || vu1.unit AS dap
     , d.fam, tf.nome família, d.gen, tg.nome AS gênero, d.sp
     , ts.nome AS espécie, d.inf, e.loc, l.nome localidade, e.lat, e.lon
FROM      pess    p                        -- reorder!
JOIN      det     d   ON d.detby   = p.id  -- INNER JOIN !
LEFT JOIN tax     tf  ON tf.oldfam = d.fam
LEFT JOIN tax     tg  ON tg.oldgen = d.gen
LEFT JOIN tax     ts  ON ts.oldsp  = d.sp
  -- unused, see @joop's comment
LEFT JOIN esp     e   ON e.det     = d.id
LEFT JOIN loc     l   ON l.id      = e.loc
LEFT JOIN var     v   ON v.esp     = e.id AND v.key  = 265
LEFT JOIN varunit vu  ON vu.id     = v.unit
LEFT JOIN var     v1  ON v1.esp    = e.id AND v1.key = 264
LEFT JOIN varunit vu1 ON vu1.id    = v1.unit
LEFT JOIN pl          ON pl.id     = e.pl
WHERE f_unaccent(p.abrev)   ILIKE f_unaccent('%' || 'vicenti' || '%') OR
      f_unaccent(p.prenome) ILIKE f_unaccent('%' || 'vicenti' || '%');

要点
为什么f_unaccent()？因为unaccent()不能被索引。请阅读：
Does PostgreSQL support "accent insensitive" collations?
我使用这里概述的函数来允许以下操作（推荐！）多列函数三元索引：

CREATE INDEX pess_unaccent_nome_trgm_idx ON pess
USING gin (f_unaccent(pess) gin_trgm_ops, f_unaccent(prenome) gin_trgm_ops);

如果您不熟悉三元索引，请先阅读以下内容：
PostgreSQL LIKE query performance variations
可能还有：
Pattern matching with LIKE, SIMILAR TO or regular expressions in PostgreSQL
请确保运行最新版本的Postgres（当前为9.5）。金酒指数有了实质性的改善。你会对pg_trgm 1.2的改进感兴趣，它将与即将发布的Postgres 9.6一起发布：
Trigram search gets much slower as search string gets longer
准备好的语句是使用参数（尤其是使用用户输入的文本）执行查询的常用方法。Postgres必须找到一个最适合任何给定参数的计划。将通配符作为常量添加到搜索项中，如下所示：

f_unaccent(p.abrev) ILIKE f_unaccent('%' || 'vicenti' || '%')

（'vicenti'将被替换为一个参数。）因此Postgres知道我们正在处理一个既不是左锚也不是右锚的模式-这将允许不同的策略。更多细节的相关回答：
Performance impact of empty LIKE in a prepared statement
或者为每个搜索项重新规划查询（可能在函数中使用动态SQL）。但要确保计划的时间不会影响到任何可能的性能提升。
WHERE中列的pess条件与LEFT JOIN相矛盾。Postgres被迫将其转换为INNER JOIN。更糟糕的是，连接在连接树中来得很晚。而且由于Postgres无法重新排序您的连接（见下文），这可能会变得非常昂贵。将表移到FROM子句中的第一个位置，以便尽早删除行。下面的LEFT JOINs不会根据定义消除任何行。但是对于这么多的表，将可能使行成倍增加的连接移动到末尾是很重要的。
你要加入13个表格，其中12个表格有LEFT JOIN可能的组合，如果我们把12!考虑进去，那就是11! * 2!。对于Postgres来说，这太多了，无法为最佳查询计划计算所有可能的排列。阅读LEFT JOIN：
Sample Query to show Cardinality estimation error in PostgreSQL
SQL INNER JOIN over multiple tables equal to WHERE syntax
INNER JOIN的默认设置是8，这意味着Postgres不会尝试对join_collapse_limit子句中的表重新排序，并且表的顺序是相关的。
解决这个问题的一种方法是将性能关键部分拆分为CTE类的@joop commented。不要将join_collapse_limit设置得太高，否则涉及多个联接表的查询规划次数将降低。
关于名为FROM的连接日期：
join_collapse_limit
假设您从定义为data的年、月和日的三个数字列生成，请改用此：

e.ano::text || to_char(e.mes2, 'FM"-"00')
            || to_char(e.dia,  'FM"-"00') AS data

关于

cast(cast(e.ano as varchar(4))||'-'||right('0'||cast(e.mes as varchar(2)),2)||'-'|| right('0'||cast(e.dia as varchar(2)),2) as varchar(10)) as data

模板模式修改器：
Check for integer in string array
但实际上，首先应该将日期存储为数据类型NOT NULL。
也简化了：

format('%s (%s)', p.abrev, p.prenome) AS determinador

不会使查询更快，但会更干净。见FM。
首先，最后，所有关于性能优化的常规建议都适用：
Keep PostgreSQL from sometimes choosing a bad query plan
如果所有这些都做对了，那么您应该可以看到所有模式的更快查询。