In PostgreSQL 9.5 I have a table with 67000 records:
# \d words_nouns
            Table "public.words_nouns"
 Column  |           Type           | Modifiers
---------+--------------------------+-----------
 word    | text                     | not null
 hashed  | text                     | not null
 added   | timestamp with time zone |
 removed | timestamp with time zone |
Indexes:
    "words_nouns_pkey" PRIMARY KEY, btree (word)
Check constraints:
    "words_nouns_word_check" CHECK (word ~ '^[A-Z]{2,}$'::text)
There is also a similar table, words_verbs, with 36000 records.
Would it be a good idea to define the following custom function:
CREATE OR REPLACE FUNCTION words_get_added(
        in_visited integer,
        OUT out_json jsonb
    ) RETURNS jsonb AS
$func$
DECLARE
    _added text[];
BEGIN
    -- create array with words added to dictionary since in_visited timestamp
    IF in_visited > 0 THEN
        _added := (
            SELECT ARRAY_AGG(hashed)
            FROM words_nouns
            WHERE EXTRACT(EPOCH FROM added) > in_visited
            UNION
            SELECT ARRAY_AGG(hashed)
            FROM words_verbs
            WHERE EXTRACT(EPOCH FROM added) > in_visited
        );
        IF CARDINALITY(_added) > 0 THEN
            out_json := jsonb_build_object('added', _added);
        END IF;
    END IF;
END
$func$ LANGUAGE plpgsql;
Or should I rather convert in_visited to a timestamp with time zone and compare against that:
CREATE OR REPLACE FUNCTION words_get_added(
        in_visited integer,
        OUT out_json jsonb
    ) RETURNS jsonb AS
$func$
DECLARE
    _added text[];
BEGIN
    -- create array with words added to dictionary since in_visited timestamp
    IF in_visited > 0 THEN
        _added := (
            SELECT ARRAY_AGG(hashed)
            FROM words_nouns
            WHERE added > TO_TIMESTAMP(in_visited)
            UNION
            SELECT ARRAY_AGG(hashed)
            FROM words_verbs
            WHERE added > TO_TIMESTAMP(in_visited)
        );
        IF CARDINALITY(_added) > 0 THEN
            out_json := jsonb_build_object('added', _added);
        END IF;
    END IF;
END
$func$ LANGUAGE plpgsql;
Here are the 2 EXPLAIN outputs, but I am not sure how to interpret them:
# EXPLAIN SELECT ARRAY_AGG(hashed)
FROM words_nouns
WHERE EXTRACT(EPOCH FROM added) > 0
UNION
SELECT ARRAY_AGG(hashed)
FROM words_verbs
WHERE EXTRACT(EPOCH FROM added) > 0;
                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 Unique  (cost=2707.03..2707.04 rows=2 width=32)
   ->  Sort  (cost=2707.03..2707.03 rows=2 width=32)
         Sort Key: (array_agg(words_nouns.hashed))
         ->  Append  (cost=1740.53..2707.02 rows=2 width=32)
               ->  Aggregate  (cost=1740.53..1740.54 rows=1 width=32)
                     ->  Seq Scan on words_nouns  (cost=0.00..1684.66 rows=22348 width=32)
                           Filter: (date_part('epoch'::text, added) > '0'::double precision)
               ->  Aggregate  (cost=966.45..966.46 rows=1 width=32)
                     ->  Seq Scan on words_verbs  (cost=0.00..936.05 rows=12157 width=32)
                           Filter: (date_part('epoch'::text, added) > '0'::double precision)
(10 rows)
# EXPLAIN SELECT ARRAY_AGG(hashed)
FROM words_nouns
WHERE added > to_timestamp(0)
UNION
SELECT ARRAY_AGG(hashed)
FROM words_verbs
WHERE added > to_timestamp(0);
                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Unique  (cost=2361.99..2362.00 rows=2 width=32)
   ->  Sort  (cost=2361.99..2361.99 rows=2 width=32)
         Sort Key: (array_agg(words_nouns.hashed))
         ->  Append  (cost=1517.06..2361.98 rows=2 width=32)
               ->  Aggregate  (cost=1517.06..1517.07 rows=1 width=32)
                     ->  Seq Scan on words_nouns  (cost=0.00..1517.05 rows=1 width=32)
                           Filter: (added > '1970-01-01 01:00:00+01'::timestamp with time zone)
               ->  Aggregate  (cost=844.88..844.89 rows=1 width=32)
                     ->  Seq Scan on words_verbs  (cost=0.00..844.88 rows=1 width=32)
                           Filter: (added > '1970-01-01 01:00:00+01'::timestamp with time zone)
(10 rows)
The question is: which of the two stored functions performs better, or is there no difference?
Best answer
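The crucial factor for performance is an index that matches the query. Typically, you would have an index on the plain column added and compare that column to an input parameter of the same type, so that the index is applicable.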
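One variation, sketched under assumptions: if the caller can pass a timestamptz instead of an integer epoch, the parameter already has the same type as the column added. The function name words_get_added_since is hypothetical, and the tables are the ones from the question. The sketch also aggregates once over a UNION ALL of both tables, because a UNION of two separate ARRAY_AGG results can return two rows, which a scalar assignment in PL/pgSQL would reject.

```sql
-- Sketch only: words_get_added_since is a hypothetical name.
-- The parameter is timestamptz, matching the type of the column "added".
CREATE OR REPLACE FUNCTION words_get_added_since(
        in_visited timestamptz,
        OUT out_json jsonb
    ) RETURNS jsonb AS
$func$
DECLARE
    _added text[];
BEGIN
    -- Collect matching rows from both tables, then aggregate once,
    -- so the subquery yields exactly one row for the assignment.
    _added := (
        SELECT ARRAY_AGG(hashed)
        FROM (
            SELECT hashed FROM words_nouns WHERE added > in_visited
            UNION ALL
            SELECT hashed FROM words_verbs WHERE added > in_visited
        ) AS w
    );
    -- CARDINALITY(NULL) is NULL, so out_json stays NULL when nothing matched
    IF CARDINALITY(_added) > 0 THEN
        out_json := jsonb_build_object('added', _added);
    END IF;
END
$func$ LANGUAGE plpgsql;
```

It could be called as SELECT words_get_added_since(to_timestamp(1500000000)); note that UNION ALL keeps duplicates if the same hashed value occurs in both tables, so use UNION in the inner query if that matters.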
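For the task at hand, combine a plain index on added with the second function (added > TO_TIMESTAMP(in_visited)) or a variation on the theme. The function conversion happens before the value is compared to the column, so the expression is "sargable", meaning the index on the column can be used for it.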
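A minimal sketch of that combination, assuming the table and column names from the question; the index names and the epoch value 1500000000 are made up for illustration. The first query leaves the column untouched, so a plain index on added can serve it. The second wraps the column in an expression and would need a matching expression index instead.

```sql
-- Plain btree indexes on the timestamp column (index names are hypothetical)
CREATE INDEX words_nouns_added_idx ON words_nouns (added);
CREATE INDEX words_verbs_added_idx ON words_verbs (added);

-- Sargable: the input value is converted, the column is compared as-is
EXPLAIN
SELECT hashed
FROM words_nouns
WHERE added > to_timestamp(1500000000);

-- Not sargable for the plain index: the column is wrapped in EXTRACT()
EXPLAIN
SELECT hashed
FROM words_nouns
WHERE EXTRACT(EPOCH FROM added) > 1500000000;
```

Whether the planner actually chooses the index depends on table statistics and on how selective the predicate is, so comparing the two EXPLAIN outputs on real data is the reliable check.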
For top read performance, you might have a multicolumn index on (added, hashed) and keep the table vacuumed to allow index-only scans ...
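As a rough illustration of the index-only-scan setup, assuming the index covers both the filter column added and the selected column hashed; the index name and the epoch value are hypothetical.

```sql
-- Multicolumn index so the query can be answered from the index alone
-- (index name is hypothetical)
CREATE INDEX words_nouns_added_hashed_idx ON words_nouns (added, hashed);

-- Vacuuming maintains the visibility map that index-only scans rely on
VACUUM ANALYZE words_nouns;

-- Inspect the plan; an "Index Only Scan" node on words_nouns is what to look for
EXPLAIN
SELECT hashed
FROM words_nouns
WHERE added > to_timestamp(1500000000);
```

The same pair of statements would apply to words_verbs. The planner may still prefer a sequential scan when most rows match the predicate.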
Related:
Can PostgreSQL index array columns?
How to speed up the query in PostgreSQL
Is a composite index also good for queries on the first field?