问题描述
我正在建模(在Postgres 9.6.1/postGIS 2.3.1中)供供应商提供的本地服务的预订系统:
I am modeling (in Postgres 9.6.1 / postGIS 2.3.1) a booking system for local services provided by suppliers:
create table supplier (
id serial primary key,
name text not null check (char_length(title) < 280),
type service_type,
duration interval,
...
geo_position geography(POINT,4326)
...
);
每个供应商都可以预订一个带有时间段的日历:
Each supplier keeps a calendar with time slots when he/she is available to be booked:
create table timeslot (
id serial primary key,
supplier_id integer not null references supplier(id),
slot tstzrange not null,
constraint supplier_overlapping_timeslot_not_allowed
exclude using gist (supplier_id with =, slot with &&)
);
当客户想知道某个时间可以预订哪些附近的供应商时,我创建一个视图和一个功能:
For when a client wants to know which nearby suppliers are available to book at a certain time, I create a view and a function:
create view supplier_slots as
select
supplier.name, supplier.type, supplier.geo_position, supplier.duration, ...
timeslot.slot
from
supplier, timeslot
where
supplier.id = timeslot.supplier_id;
create function find_suppliers(wantedType service_type, near_latitude text, near_longitude text, at_time timestamptz)
returns setof supplier_slots as $$
declare
nearpoint geography;
begin
nearpoint := ST_GeographyFromText('SRID=4326;POINT(' || near_latitude || ' ' || near_longitude || ')');
return query
select * from supplier_slots
where type = wantedType
and tstzrange(at_time, at_time + duration) <@ slot
order by ST_Distance( nearpoint, geo_position )
limit 100;
end;
$$ language plpgsql;
所有这些都很好.
现在,对于在请求的时间没有可预订时段的供应商,我想找到他们在请求的at_time
之前和之后的最近可用时段,并按以下顺序排序距离.
Now, for the suppliers that did NOT have a bookable time slot at the requested time, I would like to find their closest available timeslots, before and after the requested at_time
, also sorted by distance.
这让我有点犹豫,我找不到合适的运算符来给我最近的tsrange.
This has my mind spinning a little bit and I can't find any suitable operators to give me the nearest tsrange.
有什么想法最聪明的方法吗?
Any ideas on the smartest way to do this?
推荐答案
解决方案取决于您想要的确切定义.
The solution depends on the exact definition of what you want.
我建议使用这些经过稍微修改的表定义,以使任务更简单,强制执行完整性并提高性能:
I suggest these slightly adapted table definitions to make the task simpler, enforce integrity and improve performance:
CREATE TABLE supplier (
supplier_id serial PRIMARY KEY,
supplier text NOT NULL CHECK (length(title) < 280),
type service_type,
duration interval,
geo_position geography(POINT,4326)
);
CREATE TABLE timeslot (
timeslot_id serial PRIMARY KEY,
supplier_id integer NOT NULL -- references supplier(id),
slot_a timestamptz NOT NULL,
slot_z timestamptz NOT NULL,
CONSTRAINT timeslot_range_valid CHECK (slot_a < slot_z)
CONSTRAINT timeslot_no_overlapping
EXCLUDE USING gist (supplier_id WITH =, tstzrange(slot_a, slot_z) WITH &&)
);
CREATE INDEX timeslot_slot_z ON timeslot (supplier_id, slot_z);
CREATE INDEX supplier_geo_position_gist ON supplier USING gist (geo_position);
-
保存两个
timestamptz
列slot_a
和slot_z
而不是tstzrange
列slot
-并相应地调整约束.现在,这会将所有范围自动视为默认的 inclusive 下限和 exclusive 上限-这样可以避免出现极端情况错误/头痛.Save two
timestamptz
columnsslot_a
andslot_z
instead of thetstzrange
columnslot
- and adapt constraints accordingly. This treats all ranges as default inclusive lower and exclusive upper bounds automatically now - which avoids corner case errors / headache.附带好处:2个
timestamptz
仅16个字节,而不是tstzrange
25个字节(带填充的32个字节).Collateral benefit: only 16 bytes for 2
timestamptz
instead of 25 bytes (32 with padding) for thetstzrange
.您可能在
slot
上进行过的所有查询都将继续使用tstzrange(slot_a, slot_z)
作为替代品.All queries you might have had on
slot
keep working withtstzrange(slot_a, slot_z)
as drop-in replacement.在
(supplier_id, slot_z)
上为当前查询添加索引.
还有supplier.geo_position
上的空间索引(您可能已经有).Add an index on
(supplier_id, slot_z)
for the query at hand.
And a spatial index onsupplier.geo_position
(which you probably have already).根据
type
中的数据分布,针对查询中常见类型的几个部分索引可能会提高性能:Depending on data distribution in
type
, a couple of partial indexes for types common in queries might help performance:CREATE INDEX supplier_geo_type_foo_gist ON supplier USING gist (geo_position) WHERE supplier = 'foo'::service_type;
此查询查找提供正确
service_type
(示例中为100)的 X个最接近的供应商,每个供应商都具有一个最匹配的时隙(由到广告位开始的时间距离).我将其与实际匹配的插槽结合在一起,可能不需要,也可能不需要.This query finds the X closest suppliers who offer the correct
service_type
(100 in the example), each with the one closest matching time slot (defined by the time distance to the start of the slot). I combined this with actually matching slots, which may or may not be what you need.CREATE FUNCTION f_suppliers_nearby(_type service_type, _lat text, _lon text, at_time timestamptz) RETURNS TABLE (supplier_id int , name text , duration interval , geo_position geography(POINT,4326) , distance float , timeslot_id int , slot_a timestamptz , slot_z timestamptz , time_dist interval ) AS $func$ WITH sup_nearby AS ( -- find matching or later slot SELECT s.id, s.name, s.duration, s.geo_position , ST_Distance(ST_GeographyFromText('SRID=4326;POINT(' || _lat || ' ' || _lon || ')') , geo_position) AS distance , t.timeslot_id, t.slot_a, t.slot_z , CASE WHEN t.slot_a IS NOT NULL THEN GREATEST(t.slot_a - at_time, interval '0') END AS time_dist FROM supplier s LEFT JOIN LATERAL ( SELECT * FROM timeslot WHERE supplier_id = supplier_id AND slot_z > at_time + s.duration -- excl. upper bound ORDER BY slot_z LIMIT 1 ) t ON true WHERE s.type = _type ORDER BY s.distance LIMIT 100 ) SELECT * FROM ( SELECT DISTINCT ON (supplier_id) * -- 1 slot per supplier FROM ( TABLE sup_nearby -- matching or later slot UNION ALL -- earlier slot SELECT s.id, s.name, s.duration, s.geo_position , s.distance , t.timeslot_id, t.slot_a, t.slot_z , GREATEST(at_time - t.slot_a, interval '0') AS time_dist FROM sup_nearby s CROSS JOIN LATERAL ( -- this time CROSS JOIN! SELECT * FROM timeslot WHERE supplier_id = s.supplier_id AND slot_z <= at_time -- excl. upper bound ORDER BY slot_z DESC LIMIT 1 ) t WHERE s.time_dist IS DISTINCT FROM interval '0' -- exact matches are done ) sub ORDER BY supplier_id, time_dist -- pick temporally closest slot per supplier ) sub ORDER BY time_dist, distance; -- matches first, ordered by distance; then misses, ordered by time distance $func$ LANGUAGE sql;
我没有使用您的视图
supplier_slots
,而是针对性能进行了优化.该视图可能仍然很方便.您可能包括tstzrange(slot_a, slot_z) AS slot
以便向后兼容.I did not use your view
supplier_slots
and optimized for performance instead. The view may still be convenient. You might includetstzrange(slot_a, slot_z) AS slot
for backward compatibility.查找100个最接近的供应商的基本查询是教科书"K最近的邻居"问题. GiST索引对此非常有效.相关:
The basic query to find the 100 closest suppliers is a textbook "K Nearest Neighbour" problem. A GiST index works well for this. Related:
可以将附加任务(查找时间上最近的时隙)分为两个任务:查找下一个较高的行和下一个较低的行.该解决方案的核心功能是具有两个子查询,其中分别包含
ORDER BY slot_z LIMIT 1
和ORDER BY slot_z DESC LIMIT 1
,这将导致两个非常快速的索引扫描.The additional task (find the temporally nearest slot) can be split in two tasks: to find the next higher and the next lower row. The core feature of the solution is to have two subqueries with
ORDER BY slot_z LIMIT 1
andORDER BY slot_z DESC LIMIT 1
, which result in two very fast index scans.我将第一个与找到实际匹配项结合在一起,这是(我认为是聪明的)优化,但可能会偏离实际解决方案.
I combined the first one with finding actual matches, which is a (smart, I think) optimization, but may distract from the actual solution.
这篇关于Postgres:如何从范围之外的时间戳中找到最接近的tsrange?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!