如何从范围之外的时间戳中找到最接近的tsrange

如何从范围之外的时间戳中找到最接近的tsrange

本文介绍了Postgres:如何从范围之外的时间戳中找到最接近的tsrange?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在建模(在Postgres 9.6.1/postGIS 2.3.1中)供供应商提供的本地服务的预订系统:

I am modeling (in Postgres 9.6.1 / postGIS 2.3.1) a booking system for local services provided by suppliers:

create table supplier (
    id                serial primary key,
    name              text not null check (char_length(title) < 280),
    type              service_type,
    duration          interval,
    ...
    geo_position      geography(POINT,4326)
    ...
);

每个供应商都可以预订一个带有时间段的日历:

Each supplier keeps a calendar with time slots when he/she is available to be booked:

create table timeslot (
    id                 serial primary key,
    supplier_id        integer not null references supplier(id),
    slot               tstzrange not null,

    constraint supplier_overlapping_timeslot_not_allowed
    exclude using gist (supplier_id with =, slot with &&)
);

当客户想知道某个时间可以预订哪些附近的供应商时,我创建一个视图和一个功能:

For when a client wants to know which nearby suppliers are available to book at a certain time, I create a view and a function:

create view supplier_slots as
    select
        supplier.name, supplier.type, supplier.geo_position, supplier.duration, ...
        timeslot.slot
    from
        supplier, timeslot
    where
        supplier.id = timeslot.supplier_id;


create function find_suppliers(wantedType service_type, near_latitude text, near_longitude text, at_time timestamptz)
returns setof supplier_slots as $$
declare
    nearpoint geography;
begin
    nearpoint := ST_GeographyFromText('SRID=4326;POINT(' || near_latitude || ' ' || near_longitude || ')');
    return query
        select * from supplier_slots
        where type = wantedType
            and tstzrange(at_time, at_time + duration) <@ slot
        order by ST_Distance( nearpoint, geo_position )
        limit 100;
end;
$$ language plpgsql;

所有这些都很好.

现在,对于在请求的时间没有可预订时段的供应商,我想找到他们在请求的at_time之前和之后的最近可用时段,并按以下顺序排序距离.

Now, for the suppliers that did NOT have a bookable time slot at the requested time, I would like to find their closest available timeslots, before and after the requested at_time, also sorted by distance.

这让我有点犹豫,我找不到合适的运算符来给我最近的tsrange.

This has my mind spinning a little bit and I can't find any suitable operators to give me the nearest tsrange.

有什么想法最聪明的方法吗?

Any ideas on the smartest way to do this?

推荐答案

解决方案取决于您想要的确切定义.

The solution depends on the exact definition of what you want.

我建议使用这些经过稍微修改的表定义,以使任务更简单,强制执行完整性并提高性能:

I suggest these slightly adapted table definitions to make the task simpler, enforce integrity and improve performance:

CREATE TABLE supplier (
   supplier_id  serial PRIMARY KEY,
   supplier     text NOT NULL CHECK (length(title) < 280),
   type         service_type,
   duration     interval,
   geo_position geography(POINT,4326)
);

CREATE TABLE timeslot (
   timeslot_id  serial PRIMARY KEY,
   supplier_id  integer NOT NULL -- references supplier(id),
   slot_a       timestamptz NOT NULL,
   slot_z       timestamptz NOT NULL,
   CONSTRAINT   timeslot_range_valid CHECK (slot_a < slot_z)
   CONSTRAINT   timeslot_no_overlapping
     EXCLUDE USING gist (supplier_id WITH =, tstzrange(slot_a, slot_z) WITH &&)
);

CREATE INDEX timeslot_slot_z ON timeslot (supplier_id, slot_z);
CREATE INDEX supplier_geo_position_gist ON supplier USING gist (geo_position);

  • 保存两个timestamptzslot_aslot_z而不是tstzrangeslot-并相应地调整约束.现在,这会将所有范围自动视为默认的 inclusive 下限和 exclusive 上限-这样可以避免出现极端情况错误/头痛.

    • Save two timestamptz columns slot_a and slot_z instead of the tstzrange column slot - and adapt constraints accordingly. This treats all ranges as default inclusive lower and exclusive upper bounds automatically now - which avoids corner case errors / headache.

      附带好处:2个timestamptz仅16个字节,而不是tstzrange 25个字节(带填充的32个字节).

      Collateral benefit: only 16 bytes for 2 timestamptz instead of 25 bytes (32 with padding) for the tstzrange.

      您可能在slot上进行过的所有查询都将继续使用tstzrange(slot_a, slot_z)作为替代品.

      All queries you might have had on slot keep working with tstzrange(slot_a, slot_z) as drop-in replacement.

      (supplier_id, slot_z)上为当前查询添加索引.
      还有supplier.geo_position上的空间索引(您可能已经有).

      Add an index on (supplier_id, slot_z) for the query at hand.
      And a spatial index on supplier.geo_position (which you probably have already).

      根据type中的数据分布,针对查询中常见类型的几个部分索引可能会提高性能:

      Depending on data distribution in type, a couple of partial indexes for types common in queries might help performance:

      CREATE INDEX supplier_geo_type_foo_gist ON supplier USING gist (geo_position)
      WHERE supplier = 'foo'::service_type;
      

    • 此查询查找提供正确service_type (示例中为100)的 X个最接近的供应商,每个供应商都具有一个最匹配的时隙(由到广告位开始的时间距离).我将其与实际匹配的插槽结合在一起,可能不需要,也可能不需要.

      This query finds the X closest suppliers who offer the correct service_type (100 in the example), each with the one closest matching time slot (defined by the time distance to the start of the slot). I combined this with actually matching slots, which may or may not be what you need.

      CREATE FUNCTION f_suppliers_nearby(_type service_type, _lat text, _lon text, at_time timestamptz)
        RETURNS TABLE (supplier_id  int
                     , name         text
                     , duration     interval
                     , geo_position geography(POINT,4326)
                     , distance     float
                     , timeslot_id  int
                     , slot_a       timestamptz
                     , slot_z       timestamptz
                     , time_dist    interval
         ) AS
      $func$
         WITH sup_nearby AS (  -- find matching or later slot
            SELECT s.id, s.name, s.duration, s.geo_position
                 , ST_Distance(ST_GeographyFromText('SRID=4326;POINT(' || _lat || ' ' || _lon || ')')
                                , geo_position) AS distance
                 , t.timeslot_id, t.slot_a, t.slot_z
                 , CASE WHEN t.slot_a IS NOT NULL
                        THEN GREATEST(t.slot_a - at_time, interval '0') END AS time_dist
            FROM   supplier s
            LEFT   JOIN LATERAL (
               SELECT *
               FROM   timeslot
               WHERE  supplier_id = supplier_id
               AND    slot_z > at_time + s.duration  -- excl. upper bound
               ORDER  BY slot_z
               LIMIT  1
               ) t ON true
            WHERE  s.type = _type
            ORDER  BY s.distance
            LIMIT  100
            )
         SELECT *
         FROM  (
            SELECT DISTINCT ON (supplier_id) *  -- 1 slot per supplier
            FROM  (
               TABLE sup_nearby  -- matching or later slot
      
               UNION ALL         -- earlier slot
               SELECT s.id, s.name, s.duration, s.geo_position
                    , s.distance
                    , t.timeslot_id, t.slot_a, t.slot_z
                    , GREATEST(at_time - t.slot_a, interval '0') AS time_dist
               FROM   sup_nearby s
               CROSS  JOIN LATERAL (  -- this time CROSS JOIN!
                  SELECT *
                  FROM   timeslot
                  WHERE  supplier_id = s.supplier_id
                  AND    slot_z <= at_time  -- excl. upper bound
                  ORDER  BY slot_z DESC
                  LIMIT  1
                  ) t
               WHERE  s.time_dist IS DISTINCT FROM interval '0'  -- exact matches are done
               ) sub
            ORDER  BY supplier_id, time_dist  -- pick temporally closest slot per supplier
         ) sub
         ORDER  BY time_dist, distance;  -- matches first, ordered by distance; then misses, ordered by time distance
      
      $func$  LANGUAGE sql;
      

      我没有使用您的视图supplier_slots,而是针对性能进行了优化.该视图可能仍然很方便.您可能包括tstzrange(slot_a, slot_z) AS slot以便向后兼容.

      I did not use your view supplier_slots and optimized for performance instead. The view may still be convenient. You might include tstzrange(slot_a, slot_z) AS slot for backward compatibility.

      查找100个最接近的供应商的基本查询是教科书"K最近的邻居"问题. GiST索引对此非常有效.相关:

      The basic query to find the 100 closest suppliers is a textbook "K Nearest Neighbour" problem. A GiST index works well for this. Related:

      可以将附加任务(查找时间上最近的时隙)分为两个任务:查找下一个较高的行和下一个较低的行.该解决方案的核心功能是具有两个子查询,其中分别包含ORDER BY slot_z LIMIT 1ORDER BY slot_z DESC LIMIT 1,这将导致两个非常快速的索引扫描.

      The additional task (find the temporally nearest slot) can be split in two tasks: to find the next higher and the next lower row. The core feature of the solution is to have two subqueries with ORDER BY slot_z LIMIT 1 and ORDER BY slot_z DESC LIMIT 1, which result in two very fast index scans.

      我将第一个与找到实际匹配项结合在一起,这是(我认为是聪明的)优化,但可能会偏离实际解决方案.

      I combined the first one with finding actual matches, which is a (smart, I think) optimization, but may distract from the actual solution.

      这篇关于Postgres:如何从范围之外的时间戳中找到最接近的tsrange?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

07-24 13:40