两个表之间最近点的独特赋值

本文介绍了两个表之间最近点的独特赋值的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

在安装了 PostGis 2.2.0 的 Postgres 9.5 数据库中，我有两个具有几何数据（点）的表，我想从一个表中指定点来自其他表的点数，但是我不希望将 c>列 pv 分配 buildings.gid 时设置：

 更新pvanlagen 
 SET buildid = buildings.gid，dist = '50'
 FROM buildings 
 WHERE buildid IS NULL 
 AND buildings.pv为NULL 
 AND pvanlagen.gemname = buildings.gemname 
 AND ST_Distance（ST_Transform（pvanlagen.geom，31467）
，ST_Transform（buildings.centroid，31467））< 50; 
 
更新建筑
 SET pv = true 
 FROM pvanlagen 
 WHERE buildings.gid = pvanlagen.buildid;

我在建筑物中测试了50行申请所有这些都需要太长时间。我有 3.200.000个建筑和 260.000 PV 。

gid 应分配。如果有关系，分配哪个 gid 就不用了。如果我们需要制定一个规则，我们可以用较低的 gid 取代建筑。

50米作为限制。我使用 ST_Distance（），因为它返回的最小距离应该在50米以内。后来我提出了多次，直到每个PV Anlage被分配。

建筑物和PV分配给各自的区域（ gemname ）。这应该使转让更便宜，因为我知道最近的建筑物必须在同一区域内（ gemname ）。

我在下面的反馈中尝试了这个查询：

 更新pvanlagen p1 
 SET buildid = buildings.gid 
 ，dist = buildings.dist 
 FROM（
 SELECT DISTINCT ON（b.gid）
 p.id，b.gid，b.dist :: numeric 
 FROM（
 SELECT id，ST_Transform（geom，31467）
 FROM pvanlagen 
 WHERE buildid IS NULL  - 尚未分配
）p 
，LATERAL（
 SELECT b。 gid，ST_Distance（ST_Transform（p1.geom，31467），ST_Transform（b.centroid，31467））AS dist 
从建筑物b 
 LEFT JOIN pvanlagen p1 ON p1.buildid = b.gid 
 WHERE p1.buildid IS NULL 
 AND b.gemname = p1.gemname 
 ORDER BY ST_Transform（p1.geom，31467）<  - > ST_Transform（b.centroid，31467）
 LIMIT 1 
）b 
 ORDER BY b.gid，b.dist，p.id  -  tie breaker 
）x，buildings 
 WHERE p1.id = x.id;

但是，在234 ms执行时间内， 0行受影响返回。

我在哪里出错？

解决方案

表架构

要执行您的规则，只需声明 pvanlagen.buildid

简单解决方案

一个简单的解决方案是： p>

 更新pvanlagen p1 
 SET buildid = sub.b_gid 
，dist = sub.dist  - 实际距离
 FROM（
 SELECT DISTINCT ON（b_gid）
 pv_gid，b_gid，dist 
 FROM（
 SELECT gid AS pv_gid，ST_Transform（geom，31467）AS geom31467 
 FROM pvanlagen 
 WHERE buildid IS NULL  - 未分配
）p 
，LATERAL（
 SELECT b.gid AS b_gid 
，round（ST_Distance（p。 geom31467 
，ST_Transform（b.centroid，31467））:: numeric，2）AS dist  - 见下文
从建筑物b 
 LEFT JOIN pvanlagen p1 ON p1.buildid = b.gid  - 还没有分配... 
 WHERE p1.buildid IS NULL  -  ...但
  -  AND p.gemname = b。 gemname  - 不需要执行，见下面
 ORDER BY p.geom31467<  - > ST_Transform（b.centroid，31467）
 LIMIT 1 
）b 
 ORDER BY b_gid，dist，pv_gid  -  tie breaker 
）sub 
 WHERE p1.gid = sub.pv_gid;

我使用 DISTINCT ON（b_gid）减少到每个建筑物一个一个行，以最短的距离挑选PV。详细信息：

对于任何对于更多一个PV最接近的建筑，只分配最近的PV。 PK列 gid （别名 pv_gid ）如果两个同样接近，则作为tiebreaker。在这种情况下，某些光伏将从更新中删除，并保留未分配。重复查询，直到所有PV被分配。

这仍然是一个简单的算法，但是。看看我上面的图表，这将建筑物4分配给PV 4和建筑5到PV 5，而4-5和5-4可能是一个更好的解决方案...

Aside：type for dist column

目前您使用。您的原始查询分配了一个常量整数，在数字中无点制作。

在我的新查询中返回实际距离（以米为单位），如。如果我们简单地指定我们在数字数据类型中获得15个左右的小数位数，并且数字不是确切开始。我非常怀疑你想浪费存储空间。

我宁愿从计算中保存原始的双精度。或者，更好，，根据需要。如果仪表足够精确，只需转换并保存整数（自动舍入数字）。或者先乘以100来保存cm：

 （ST_Distance（...）* 100）:: int

In my Postgres 9.5 database with PostGis 2.2.0 installed, I have two tables with geometric data (points) and I want to assign points from one table to the points from the other table, but I don't want a buildings.gid to be assigned twice. As soon as one buildings.gid is assigned, it should not be assigned to another pvanlagen.buildid.

Table definitions

buildings:

CREATE TABLE public.buildings (
  gid numeric NOT NULL DEFAULT nextval('buildings_gid_seq'::regclass),
  osm_id character varying(11),
  name character varying(48),
  type character varying(16),
  geom geometry(MultiPolygon,4326),
  centroid geometry(Point,4326),
  gembez character varying(50),
  gemname character varying(50),
  krsbez character varying(50),
  krsname character varying(50),
  pv boolean,
  gr numeric,
  capac numeric,
  instdate date,
  pvid numeric,
  dist numeric,
  CONSTRAINT buildings_pkey PRIMARY KEY (gid)
);

CREATE INDEX build_centroid_gix
  ON public.buildings
  USING gist
  (st_transform(centroid, 31467));

CREATE INDEX buildings_geom_idx
  ON public.buildings
  USING gist
  (geom);

pvanlagen:

CREATE TABLE public.pvanlagen (
  gid integer NOT NULL DEFAULT nextval('pv_bis2010_bayern_wgs84_gid_seq'::regclass),
  tso character varying(254),
  tso_number numeric(10,0),
  system_ope character varying(254),
  system_key character varying(254),
  location character varying(254),
  postal_cod numeric(10,0),
  street character varying(254),
  capacity numeric,
  voltage_le character varying(254),
  energy_sou character varying(254),
  beginning_ date,
  end_operat character varying(254),
  id numeric(10,0),
  kkz numeric(10,0),
  geom geometry(Point,4326),
  gembez character varying(50),
  gemname character varying(50),
  krsbez character varying(50),
  krsname character varying(50),
  buildid numeric,
  dist numeric,
  trans boolean,
  CONSTRAINT pv_bis2010_bayern_wgs84_pkey PRIMARY KEY (gid),
  CONSTRAINT pvanlagen_buildid_fkey FOREIGN KEY (buildid)
      REFERENCES public.buildings (gid) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT pvanlagen_buildid_uni UNIQUE (buildid)
);

CREATE INDEX pv_bis2010_bayern_wgs84_geom_idx
  ON public.pvanlagen
  USING gist
  (geom);

Query

My idea was to add a boolean column pv in the buildings table, which is set when a buildings.gid was assigned:

UPDATE pvanlagen
SET buildid=buildings.gid, dist='50'
FROM buildings
WHERE buildid IS NULL
AND buildings.pv is NULL
AND pvanlagen.gemname=buildings.gemname
AND ST_Distance(ST_Transform(pvanlagen.geom,31467)
               ,ST_Transform(buildings.centroid,31467))<50;

UPDATE buildings
SET pv=true
FROM pvanlagen
WHERE buildings.gid=pvanlagen.buildid;

I tested for 50 rows in buildings but it takes too long to apply for all of them. I have 3.200.000 buildings and 260.000 PV.

The gid of the closest building shall be assigned. If In case of ties, it should not matter which gid is assigned. If we need to frame a rule, we can take the building with the lower gid.

50 meters was meant to work as a limit. I used ST_Distance() because it returns the minimum distance, which should be within 50 meters. Later I raised it multiple times, until every PV Anlage was assigned.

Buildings and PV are assigned to their respective regions (gemname). This should make the assignment cheaper, since I know the nearest building must be within the same region (gemname).

I tried this query after feedback below:

UPDATE pvanlagen p1
SET    buildid = buildings.gid
 , dist = buildings.dist
FROM (
   SELECT DISTINCT ON (b.gid)
          p.id, b.gid, b.dist::numeric
   FROM  (
      SELECT id, ST_Transform(geom, 31467)
      FROM   pvanlagen
      WHERE  buildid IS NULL  -- not assigned yet
      ) p
        , LATERAL (
      SELECT b.gid, ST_Distance(ST_Transform(p1.geom, 31467), ST_Transform(b.centroid, 31467)) AS dist
      FROM   buildings      b
      LEFT   JOIN pvanlagen p1 ON p1.buildid = b.gid
      WHERE  p1.buildid IS NULL
      AND    b.gemname = p1.gemname
      ORDER  BY ST_Transform(p1.geom, 31467) <-> ST_Transform(b.centroid, 31467)
      LIMIT  1
            ) b
       ORDER  BY b.gid, b.dist, p.id  -- tie breaker
       ) x, buildings
 WHERE   p1.id = x.id;

But it returns with 0 rows affected in 234 ms execution time.
Where am I going wrong?

解决方案

Table schema

To enforce your rule simply declare pvanlagen.buildid UNIQUE:

ALTER TABLE pvanlagen ADD CONSTRAINT pvanlagen_buildid_uni UNIQUE (buildid);

building.gid is the PK, as your update revealed. To also enforce referential integrity add a FOREIGN KEY constraint to buildings.gid.

You have implemented both by now. But it would be more efficient to run the big UPDATE below before you add these constraints.

There is a lot more that should be improved in your table definition. For one, buildings.gid as well as pvanlagen.buildid should be type integer (or possibly bigint if you burn a lot of PK values). numeric is expensive nonsense.

Let's focus on the core problem:

Basic Query to find closest building

The case is not as simple as it may seem. It's a "nearest neighbour" problem, with the additional complication of unique assignment.

This query finds the nearest one building for each PV (short for PV Anlage - row in pvanlagen), where neither is assigned, yet:

SELECT pv_gid, b_gid, dist
FROM  (
   SELECT gid AS pv_gid, ST_Transform(geom, 31467) AS geom31467
   FROM   pvanlagen
   WHERE  buildid IS NULL  -- not assigned yet
   ) p
     , LATERAL (
   SELECT b.gid AS b_gid
        , round(ST_Distance(p.geom31467
                      , ST_Transform(b.centroid, 31467))::numeric, 2) AS dist  -- see below
   FROM   buildings b
   LEFT   JOIN pvanlagen p1 ON p1.buildid = b.gid  -- also not assigned ...
   WHERE  p1.buildid IS NULL                       -- ... yet
   -- AND    p.gemname = b.gemname                 -- not needed for performance, see below
   ORDER  BY p.geom31467 <-> ST_Transform(b.centroid, 31467)
   LIMIT  1
   ) b;

To make this query fast, you need a spatial, functional GiST index on buildings to make it much faster:

CREATE INDEX build_centroid_gix ON buildings USING gist (ST_Transform(centroid, 31467));

Not sure why you don't

Remaining Problem

We can use the above query it in an UPDATE statement. Each PV is only used once, but more than one PV might still find the same building to be closest. You only allow one PV per building. So how would you resolve that?

In other words, how would you assign objects here?

Simple solution

One simple solution would be:

UPDATE pvanlagen p1
SET    buildid = sub.b_gid
     , dist    = sub.dist  -- actual distance
FROM  (
   SELECT DISTINCT ON (b_gid)
          pv_gid, b_gid, dist
   FROM  (
      SELECT gid AS pv_gid, ST_Transform(geom, 31467) AS geom31467
      FROM   pvanlagen
      WHERE  buildid IS NULL  -- not assigned yet
      ) p
        , LATERAL (
      SELECT b.gid AS b_gid
           , round(ST_Distance(p.geom31467
                         , ST_Transform(b.centroid, 31467))::numeric, 2) AS dist  -- see below
      FROM   buildings      b
      LEFT   JOIN pvanlagen p1 ON p1.buildid = b.gid  -- also not assigned ...
      WHERE  p1.buildid IS NULL                       -- ... yet
      -- AND    p.gemname = b.gemname                 -- not needed for performance, see below
      ORDER  BY p.geom31467 <-> ST_Transform(b.centroid, 31467)
      LIMIT  1
      ) b
   ORDER  BY b_gid, dist, pv_gid  -- tie breaker
   ) sub
WHERE   p1.gid = sub.pv_gid;

I use DISTINCT ON (b_gid) to reduce to exactly one row per building, picking the PV with the shortest distance. Details:

Select first row in each GROUP BY group?

For any building that is closest for more one PV, only the closest PV is assigned. The PK column gid (alias pv_gid) serves as tiebreaker if two are equally close. In such a case, some PV are dropped from the update and remain unassigned. Repeat the query until all PV are assigned.

This is still a simplistic algorithm, though. Looking at my diagram above, this assigns building 4 to PV 4 and building 5 to PV 5, while 4-5 and 5-4 would probably be a better solution overall ...

Aside: type for dist column

Currently you use numeric for it. your original query assigned a constant integer, no point making in numeric.

In my new query ST_Distance() returns the actual distance in meters as double precision. If we simply assign that we get 15 or so fractional digits in the numeric data type, and the number is not that exact to begin with. I seriously doubt you want to waste the storage.

I would rather save the original double precision from the calculation. or, better yet, round as needed. If meters are exact enough, just cast to and save an integer (rounding the number automatically). Or multiply with 100 first to save cm:

(ST_Distance(...) * 100)::int

这篇关于两个表之间最近点的独特赋值的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持！