问题描述
在安装了 PostGis 2.2.0 的 Postgres 9.5 数据库中,我有两个具有几何数据(点)的表,我想从一个表中指定点来自其他表的点数,但是我不希望将 c>列 pv 分配 buildings.gid 时设置:
更新pvanlagen
SET buildid = buildings.gid,dist = '50'
FROM buildings
WHERE buildid IS NULL
AND buildings.pv为NULL
AND pvanlagen.gemname = buildings.gemname
AND ST_Distance(ST_Transform(pvanlagen.geom,31467)
,ST_Transform(buildings.centroid,31467))< 50;
更新建筑
SET pv = true
FROM pvanlagen
WHERE buildings.gid = pvanlagen.buildid;
我在建筑物中测试了50行申请所有这些都需要太长时间。我有 3.200.000个建筑和 260.000 PV 。
gid 应分配。如果有关系,分配哪个 gid 就不用了。如果我们需要制定一个规则,我们可以用较低的 gid 取代建筑。
50米作为限制。我使用 ST_Distance(),因为它返回的最小距离应该在50米以内。后来我提出了多次,直到每个PV Anlage被分配。
建筑物和PV分配给各自的区域( gemname )。这应该使转让更便宜,因为我知道最近的建筑物必须在同一区域内( gemname )。
我在下面的反馈中尝试了这个查询:
更新pvanlagen p1
SET buildid = buildings.gid
,dist = buildings.dist
FROM(
SELECT DISTINCT ON(b.gid)
p.id,b.gid,b.dist :: numeric
FROM(
SELECT id,ST_Transform(geom,31467)
FROM pvanlagen
WHERE buildid IS NULL - 尚未分配
)p
,LATERAL(
SELECT b。 gid,ST_Distance(ST_Transform(p1.geom,31467),ST_Transform(b.centroid,31467))AS dist
从建筑物b
LEFT JOIN pvanlagen p1 ON p1.buildid = b.gid
WHERE p1.buildid IS NULL
AND b.gemname = p1.gemname
ORDER BY ST_Transform(p1.geom,31467)< - > ST_Transform(b.centroid,31467)
LIMIT 1
)b
ORDER BY b.gid,b.dist,p.id - tie breaker
)x,buildings
WHERE p1.id = x.id;
但是,在234 ms执行时间内, 0行受影响返回。
我在哪里出错?
表架构
要执行您的规则,只需声明 pvanlagen.buildid
简单解决方案
一个简单的解决方案是: p>
更新pvanlagen p1
SET buildid = sub.b_gid
,dist = sub.dist - 实际距离
FROM(
SELECT DISTINCT ON(b_gid)
pv_gid,b_gid,dist
FROM(
SELECT gid AS pv_gid,ST_Transform(geom,31467)AS geom31467
FROM pvanlagen
WHERE buildid IS NULL - 未分配
)p
,LATERAL(
SELECT b.gid AS b_gid
,round(ST_Distance(p。 geom31467
,ST_Transform(b.centroid,31467)):: numeric,2)AS dist - 见下文
从建筑物b
LEFT JOIN pvanlagen p1 ON p1.buildid = b.gid - 还没有分配...
WHERE p1.buildid IS NULL - ...但
- AND p.gemname = b。 gemname - 不需要执行,见下面
ORDER BY p.geom31467< - > ST_Transform(b.centroid,31467)
LIMIT 1
)b
ORDER BY b_gid,dist,pv_gid - tie breaker
)sub
WHERE p1.gid = sub.pv_gid;
我使用 DISTINCT ON(b_gid)减少到每个建筑物一个一个行,以最短的距离挑选PV。详细信息:
对于任何对于更多一个PV最接近的建筑,只分配最近的PV。 PK列 gid (别名 pv_gid )如果两个同样接近,则作为tiebreaker。在这种情况下,某些光伏将从更新中删除,并保留未分配。 重复 查询,直到所有PV被分配。
这仍然是一个简单的算法,但是。看看我上面的图表,这将建筑物4分配给PV 4和建筑5到PV 5,而4-5和5-4可能是一个更好的解决方案...
Aside:type for dist column
目前您使用 。您的原始查询分配了一个常量整数,在数字中无点制作。
在我的新查询中返回实际距离(以米为单位),如。如果我们简单地指定我们在数字数据类型中获得15个左右的小数位数,并且数字不是 确切开始。我非常怀疑你想浪费存储空间。
我宁愿从计算中保存原始的双精度。或者,更好,,根据需要。如果仪表足够精确,只需转换并保存整数(自动舍入数字)。或者先乘以100来保存cm:
(ST_Distance(...)* 100):: int
In my Postgres 9.5 database with PostGis 2.2.0 installed, I have two tables with geometric data (points) and I want to assign points from one table to the points from the other table, but I don't want a buildings.gid to be assigned twice. As soon as one buildings.gid is assigned, it should not be assigned to another pvanlagen.buildid.
Table definitions
buildings:
CREATE TABLE public.buildings ( gid numeric NOT NULL DEFAULT nextval('buildings_gid_seq'::regclass), osm_id character varying(11), name character varying(48), type character varying(16), geom geometry(MultiPolygon,4326), centroid geometry(Point,4326), gembez character varying(50), gemname character varying(50), krsbez character varying(50), krsname character varying(50), pv boolean, gr numeric, capac numeric, instdate date, pvid numeric, dist numeric, CONSTRAINT buildings_pkey PRIMARY KEY (gid) ); CREATE INDEX build_centroid_gix ON public.buildings USING gist (st_transform(centroid, 31467)); CREATE INDEX buildings_geom_idx ON public.buildings USING gist (geom);
pvanlagen:
CREATE TABLE public.pvanlagen ( gid integer NOT NULL DEFAULT nextval('pv_bis2010_bayern_wgs84_gid_seq'::regclass), tso character varying(254), tso_number numeric(10,0), system_ope character varying(254), system_key character varying(254), location character varying(254), postal_cod numeric(10,0), street character varying(254), capacity numeric, voltage_le character varying(254), energy_sou character varying(254), beginning_ date, end_operat character varying(254), id numeric(10,0), kkz numeric(10,0), geom geometry(Point,4326), gembez character varying(50), gemname character varying(50), krsbez character varying(50), krsname character varying(50), buildid numeric, dist numeric, trans boolean, CONSTRAINT pv_bis2010_bayern_wgs84_pkey PRIMARY KEY (gid), CONSTRAINT pvanlagen_buildid_fkey FOREIGN KEY (buildid) REFERENCES public.buildings (gid) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION, CONSTRAINT pvanlagen_buildid_uni UNIQUE (buildid) ); CREATE INDEX pv_bis2010_bayern_wgs84_geom_idx ON public.pvanlagen USING gist (geom);
Query
My idea was to add a boolean column pv in the buildings table, which is set when a buildings.gid was assigned:
UPDATE pvanlagen SET buildid=buildings.gid, dist='50' FROM buildings WHERE buildid IS NULL AND buildings.pv is NULL AND pvanlagen.gemname=buildings.gemname AND ST_Distance(ST_Transform(pvanlagen.geom,31467) ,ST_Transform(buildings.centroid,31467))<50; UPDATE buildings SET pv=true FROM pvanlagen WHERE buildings.gid=pvanlagen.buildid;
I tested for 50 rows in buildings but it takes too long to apply for all of them. I have 3.200.000 buildings and 260.000 PV.
The gid of the closest building shall be assigned. If In case of ties, it should not matter which gid is assigned. If we need to frame a rule, we can take the building with the lower gid.
50 meters was meant to work as a limit. I used ST_Distance() because it returns the minimum distance, which should be within 50 meters. Later I raised it multiple times, until every PV Anlage was assigned.
Buildings and PV are assigned to their respective regions (gemname). This should make the assignment cheaper, since I know the nearest building must be within the same region (gemname).
I tried this query after feedback below:
UPDATE pvanlagen p1 SET buildid = buildings.gid , dist = buildings.dist FROM ( SELECT DISTINCT ON (b.gid) p.id, b.gid, b.dist::numeric FROM ( SELECT id, ST_Transform(geom, 31467) FROM pvanlagen WHERE buildid IS NULL -- not assigned yet ) p , LATERAL ( SELECT b.gid, ST_Distance(ST_Transform(p1.geom, 31467), ST_Transform(b.centroid, 31467)) AS dist FROM buildings b LEFT JOIN pvanlagen p1 ON p1.buildid = b.gid WHERE p1.buildid IS NULL AND b.gemname = p1.gemname ORDER BY ST_Transform(p1.geom, 31467) <-> ST_Transform(b.centroid, 31467) LIMIT 1 ) b ORDER BY b.gid, b.dist, p.id -- tie breaker ) x, buildings WHERE p1.id = x.id;
But it returns with 0 rows affected in 234 ms execution time.
Where am I going wrong?
Table schema
To enforce your rule simply declare pvanlagen.buildid UNIQUE:
ALTER TABLE pvanlagen ADD CONSTRAINT pvanlagen_buildid_uni UNIQUE (buildid);
building.gid is the PK, as your update revealed. To also enforce referential integrity add a FOREIGN KEY constraint to buildings.gid.
You have implemented both by now. But it would be more efficient to run the big UPDATE below before you add these constraints.
There is a lot more that should be improved in your table definition. For one, buildings.gid as well as pvanlagen.buildid should be type integer (or possibly bigint if you burn a lot of PK values). numeric is expensive nonsense.
Let's focus on the core problem:
Basic Query to find closest building
The case is not as simple as it may seem. It's a "nearest neighbour" problem, with the additional complication of unique assignment.
This query finds the nearest one building for each PV (short for PV Anlage - row in pvanlagen), where neither is assigned, yet:
SELECT pv_gid, b_gid, dist FROM ( SELECT gid AS pv_gid, ST_Transform(geom, 31467) AS geom31467 FROM pvanlagen WHERE buildid IS NULL -- not assigned yet ) p , LATERAL ( SELECT b.gid AS b_gid , round(ST_Distance(p.geom31467 , ST_Transform(b.centroid, 31467))::numeric, 2) AS dist -- see below FROM buildings b LEFT JOIN pvanlagen p1 ON p1.buildid = b.gid -- also not assigned ... WHERE p1.buildid IS NULL -- ... yet -- AND p.gemname = b.gemname -- not needed for performance, see below ORDER BY p.geom31467 <-> ST_Transform(b.centroid, 31467) LIMIT 1 ) b;
To make this query fast, you need a spatial, functional GiST index on buildings to make it much faster:
CREATE INDEX build_centroid_gix ON buildings USING gist (ST_Transform(centroid, 31467));
Not sure why you don't
Related answers with more explanation:
- Spatial query on large table with multiple self joins performing slow
- How do I query all rows within a 5-mile radius of my coordinates?
Further reading:
- http://workshops.boundlessgeo.com/postgis-intro/knn.html
- http://www.postgresonline.com/journal/archives/306-KNN-GIST-with-a-Lateral-twist-Coming-soon-to-a-database-near-you.html
With the index in place, we don't need to restrict matches to the same gemname for performance. Only do this if it's an actual rule to enforced. If it has to be observed at all times, include the column in the FK constraint:
Remaining Problem
We can use the above query it in an UPDATE statement. Each PV is only used once, but more than one PV might still find the same building to be closest. You only allow one PV per building. So how would you resolve that?
In other words, how would you assign objects here?
Simple solution
One simple solution would be:
UPDATE pvanlagen p1 SET buildid = sub.b_gid , dist = sub.dist -- actual distance FROM ( SELECT DISTINCT ON (b_gid) pv_gid, b_gid, dist FROM ( SELECT gid AS pv_gid, ST_Transform(geom, 31467) AS geom31467 FROM pvanlagen WHERE buildid IS NULL -- not assigned yet ) p , LATERAL ( SELECT b.gid AS b_gid , round(ST_Distance(p.geom31467 , ST_Transform(b.centroid, 31467))::numeric, 2) AS dist -- see below FROM buildings b LEFT JOIN pvanlagen p1 ON p1.buildid = b.gid -- also not assigned ... WHERE p1.buildid IS NULL -- ... yet -- AND p.gemname = b.gemname -- not needed for performance, see below ORDER BY p.geom31467 <-> ST_Transform(b.centroid, 31467) LIMIT 1 ) b ORDER BY b_gid, dist, pv_gid -- tie breaker ) sub WHERE p1.gid = sub.pv_gid;
I use DISTINCT ON (b_gid) to reduce to exactly one row per building, picking the PV with the shortest distance. Details:
For any building that is closest for more one PV, only the closest PV is assigned. The PK column gid (alias pv_gid) serves as tiebreaker if two are equally close. In such a case, some PV are dropped from the update and remain unassigned. Repeat the query until all PV are assigned.
This is still a simplistic algorithm, though. Looking at my diagram above, this assigns building 4 to PV 4 and building 5 to PV 5, while 4-5 and 5-4 would probably be a better solution overall ...
Aside: type for dist column
Currently you use numeric for it. your original query assigned a constant integer, no point making in numeric.
In my new query ST_Distance() returns the actual distance in meters as double precision. If we simply assign that we get 15 or so fractional digits in the numeric data type, and the number is not that exact to begin with. I seriously doubt you want to waste the storage.
I would rather save the original double precision from the calculation. or, better yet, round as needed. If meters are exact enough, just cast to and save an integer (rounding the number automatically). Or multiply with 100 first to save cm:
(ST_Distance(...) * 100)::int
这篇关于两个表之间最近点的独特赋值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!