问题描述
我正在尝试使用 ST_SnapToGrid ,然后使用GROUP BY
网格单元(x, y).这是我首先要做的:
I'm trying to use ST_SnapToGrid and then GROUP BY
the grid cells (x, y). Here is what I did first:
SELECT
COUNT(*) AS n,
ST_X(ST_SnapToGrid(geom, 50)) AS x,
ST_Y(ST_SnapToGrid(geom, 50)) AS y
FROM points
GROUP BY x, y
我不想为x
和y
都重新计算ST_SnapToGrid
.所以我将其更改为使用子查询:
I don't want to recompute ST_SnapToGrid
for both x
and y
. So I changed it to use a sub-query:
SELECT
COUNT(*) AS n,
ST_X(geom) AS x,
ST_Y(geom) AS y
FROM (
SELECT
ST_SnapToGrid(geom, 50) AS geom
FROM points
) AS tmp
GROUP BY x, y
但是当我运行EXPLAIN
时,这两个查询都具有完全相同的执行计划:
But when I run EXPLAIN
, both of these queries have the exact same execution plan:
GroupAggregate (...)
-> Sort (...)
Sort Key: (st_x(st_snaptogrid(points.geom, 0::double precision))), (st_y(st_snaptogrid(points.geom, 0::double precision)))
-> Seq Scan on points (...)
问题:PostgreSQL是否会重用ST_SnapToGrid()
的结果值?
如果没有,是否有办法做到这一点?
Question: Will PostgreSQL reuse the result value of ST_SnapToGrid()
?
If not, is there a way to make it do this?
推荐答案
测试时间
在EXPLAIN
输出中看不到每行各个功能的评估.
Test timing
You don't see the evaluation of individual functions per row in the EXPLAIN
output.
使用EXPLAIN ANALYZE
测试以获取实际查询时间以比较总体效果.运行几次以排除缓存工件.对于像这样的简单查询,您可以通过以下方式获得整个运行时的更可靠的数字:
Test with EXPLAIN ANALYZE
to get actual query times to compare overall effectiveness. Run a couple of times to rule out caching artifacts. For simple queries like this, you get more reliable numbers for the total runtime with:
EXPLAIN (ANALYZE, TIMING OFF) SELECT ...
需要Postgres 9.2 + . 每个文档:
Requires Postgres 9.2+. Per documentation:
在输出中包括实际的启动时间和在每个节点上花费的时间.重复读取系统时钟的开销可能会减慢 在某些系统上大大降低了查询量,因此对于 仅当实际行计数而不精确时,将此参数设置为FALSE
时间,是必要的.整个语句的运行时间始终是 即使使用此选项关闭了节点级时序,也可以进行测量. 仅当还启用了ANALYZE
时,才可以使用此参数.它 默认为TRUE
.
Include actual startup time and time spent in each node in the output. The overhead of repeatedly reading the system clock can slow down the query significantly on some systems, so it may be useful to set this parameter to FALSE
when only actual row counts, and not exact times, are needed. Run time of the entire statement is always measured, even when node-level timing is turned off with this option. This parameter may only be used when ANALYZE
is also enabled. It defaults to TRUE
.
防止重复评估
通常,子查询中的表达式一次评估一次.但是,如果Postgres认为这样做会更快,那么它可以使琐碎的子查询崩溃.
Prevent repeated evaluation
Normally, expressions in a subquery are evaluated once. But Postgres can collapse trivial subqueries if it thinks that will be faster.
要引入优化障碍,您可以使用 CTE 而不是子查询. Postgres仅计算一次ST_SnapToGrid(geom, 50)
的保证:
To introduce an optimization barrier, you could use a CTE instead of the subquery. This guarantees that Postgres computes ST_SnapToGrid(geom, 50)
once only:
WITH cte AS (
SELECT ST_SnapToGrid(geom, 50) AS geom1
FROM points
)
SELECT COUNT(*) AS n
, ST_X(geom1) AS x
, ST_Y(geom1) AS y
FROM cte
GROUP BY geom1; -- see below
但是,由于CTE的开销较大,因此它可能比子查询慢 .函数调用可能非常便宜.通常,Postgres会更好地了解如何优化查询计划.如果您更了解,请仅引入这样的优化障碍.
However, this it's probably slower than a subquery due to more overhead for a CTE. The function call is probably very cheap. Generally, Postgres knows better how to optimize a query plan. Only introduce such an optimization barrier if you know better.
我将子查询/CTE中计算点的名称更改为geom1
,以阐明其与原始geom
的不同.这有助于在此处阐明更重要的内容:
I changed the name of the computed point in the subquery / CTE to geom1
to clarify it's different from the original geom
. That helps to clarify the more important thing here:
GROUP BY geom1
代替:
这显然更便宜-可能会对是否重复调用函数产生影响.因此,这可能是最快的:
That's obviously cheaper - and may have an influence on whether the function call is repeated. So, this is probably fastest:
SELECT COUNT(*) AS n
, ST_X(ST_SnapToGrid(geom, 50)) AS x
, ST_y(ST_SnapToGrid(geom, 50)) AS y
FROM points
GROUP BY ST_SnapToGrid(geom, 50); -- same here!
或者也许是这样
SELECT COUNT(*) AS n
, ST_X(geom1) AS x
, ST_y(geom1) AS y
FROM (
SELECT ST_SnapToGrid(geom, 50) AS geom1
FROM points
) AS tmp
GROUP BY geom1;
使用EXPLAIN ANALYZE
或EXPLAIN (ANALYZE, TIMING OFF)
测试所有三个,然后亲自查看.测试>>猜测.
Test all three with EXPLAIN ANALYZE
or EXPLAIN (ANALYZE, TIMING OFF)
and see for yourself. Testing >> guessing.
这篇关于重用计算出的选择值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!