
Problem description

What is the difference between geodist(sfield,x,y) and dist(2,x,y,a,b) in Apache Solr for geospatial searches?

dist(2,x,y,0,0): calculates the Euclidean distance between (0,0) and (x,y) for each document. More generally, dist() returns the distance between two vectors (points) in an n-dimensional space.
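To make the semantics of dist() concrete, here is a minimal Python sketch of the calculation as described above: the first argument is the power, and the remaining values are split into two halves that form the two points. This is an illustration, not Solr's actual code; the function name and the error handling are assumptions, and only positive powers and the infinite norm are covered.

```python
import math


def dist(power, *values):
    """Distance between two n-dimensional points.

    The first half of `values` is point A, the second half is point B,
    mirroring the argument layout of dist(2, x, y, 0, 0).
    """
    if len(values) < 2 or len(values) % 2 != 0:
        raise ValueError("expected an even, non-zero number of coordinates")
    if power <= 0:
        raise ValueError("this sketch only handles positive powers")

    half = len(values) // 2
    point_a, point_b = values[:half], values[half:]
    diffs = [abs(a - b) for a, b in zip(point_a, point_b)]

    if math.isinf(power):
        # Infinite norm: the largest difference along any single axis.
        return float(max(diffs))
    # power=1 gives Manhattan distance, power=2 gives Euclidean distance.
    return sum(d ** power for d in diffs) ** (1.0 / power)


print(dist(2, 3, 4, 0, 0))  # Euclidean distance between (3, 4) and (0, 0)
print(dist(1, 3, 4, 0, 0))  # Manhattan distance between the same points
```

Note that nothing in this calculation knows about latitude, longitude, or the shape of the earth; the inputs are treated as plain numbers on flat axes.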

I was previously using the geodist() distance function for geospatial searches on my website, but its response time was high. I therefore ran a proof of concept (POC) with different distance functions and found that dist(2,x,y,0,0) takes roughly half the time. I would like to know the reason behind this, and which algorithms the two functions use to calculate the distance.

I have to put together a comparison matrix of the two so that I can pass the findings on.

Recommended answer

The main difference is that geodist() is intended to work with spatial field types.

Most spatial implementations are based on Lucene's Points API, which is a BKD index. This field type is strictly limited to coordinates in lat/lon decimal degrees. Behind the scenes, latitude and longitude are indexed as separate numbers. Four main field types are available for spatial search:

  • LatLonPointSpatialField
  • LatLonType (now deprecated) and its non-geodetic twin PointType
  • SpatialRecursivePrefixTreeFieldType (RPT for short), including RptWithGeometrySpatialField, a derivative
  • BBoxField (for areas; 4 instances of another field type referred to by numberType)

In geodist(sfield, x, y), sfield is a spatial field that holds a point as a (lat,lon) pair. The direct equivalent with dist() would therefore be dist(2, sfieldX, sfieldY, x, y), with sfieldX and sfieldY being the lat and lon coordinates of sfield respectively.

With dist(power, a, b, ...) you cannot query a spatial field type. To perform the same spatial search, you would have to specify each dimension of every point separately. That requires 2 indexed fields (or at least 2 values per field) for 2 dimensions, 3 for 3D, and so on. This makes a huge difference, because you would have to index every coordinate of each point separately.
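The contrast between the two approaches shows up in the request parameters. The sketch below builds both query strings in Python. The field names store, lat_d and lon_d and the sample coordinates are hypothetical: store stands for a single spatial field, while lat_d and lon_d stand for two separately indexed numeric fields. The sfield and pt parameters are the ones geodist() reads when it is called without arguments.

```python
from urllib.parse import urlencode

# Hypothetical query point (latitude, longitude).
lat, lon = 45.15, -93.85

# geodist(): one spatial field, with the point passed once through sfield/pt.
geodist_params = {
    "q": "*:*",
    "sfield": "store",            # hypothetical spatial field
    "pt": f"{lat},{lon}",
    "sort": "geodist() asc",
}

# dist(): each coordinate lives in its own numeric field and is named explicitly.
dist_params = {
    "q": "*:*",
    "sort": f"dist(2,lat_d,lon_d,{lat},{lon}) asc",  # hypothetical numeric fields
}

geodist_query = urlencode(geodist_params)
dist_query = urlencode(dist_params)

print(geodist_query)
print(dist_query)
```

The two sorts are not interchangeable: the second one orders documents by a flat distance measured in degrees, not by kilometres along the earth's surface.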

Besides, you can use geodist() as is with the BBoxField field type, which indexes a single rectangle per document field and supports searching via a bounding box. To do the same with dist(), you would have to compute the center point of the box and pass each of its coordinates as a function argument, which is too much hassle for the same result if you want to use an area as a parameter.

Lastly, LatLonPointSpatialField, for example, calculates distances with the Haversine formula (great circle); BBoxField is a little faster because a rectangular shape is cheaper to compute. It is true that dist() may be faster still, but remember that it requires more fields to be indexed and a lot of preprocessing at query time to yield a comparable distance, and, as mentioned by Mats, it does not take the earth's curvature into account.
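The cost and accuracy gap can be illustrated with a small Python sketch that puts a Haversine calculation next to a plain Euclidean one. This is not Solr's implementation; the function names are made up for the example and the Earth radius constant is an assumed mean value.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius, for illustration only


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    h = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def euclidean_degrees(lat1, lon1, lat2, lon2):
    """Flat distance on the raw degree values, as dist(2, ...) would compute it."""
    return math.hypot(lat1 - lat2, lon1 - lon2)


# One degree of longitude, measured at the equator and at 60 degrees north.
for lat in (0.0, 60.0):
    print(lat, haversine_km(lat, 0.0, lat, 1.0), euclidean_degrees(lat, 0.0, lat, 1.0))
```

The Haversine version needs several trigonometric calls per document, whereas the Euclidean one needs only a couple of multiplications and a square root, which is consistent with dist() being cheaper. The price is accuracy: one degree of longitude spans about 111 km at the equator but only about half that at 60 degrees north, while the Euclidean result is 1.0 in both cases.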

