Efficient geodesic nearest neighbors

Question

Starting with latitude/longitude data (in radians), I’m trying to efficiently find the nearest n neighbors, ideally with geodesic (WGS-84) distance.

Right now I’m using sklearn’s BallTree with haversine distance (KD-Trees only take Minkowski distances), which is nice and fast (3-4 seconds to find the nearest 5 neighbors for 1200 locations among 7500 possible matches), but not as accurate as I need. Code:

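from sklearn.neighbors import BallTree
# The haversine metric expects (latitude, longitude) columns in radians and returns distances in radians.
tree = BallTree(possible_matches[['x', 'y']], leaf_size=2, metric='haversine')
distances, indices = tree.query(locations[['x', 'y']], k=5)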
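
For reference, here is a minimal sketch of what the haversine metric computes, which may help when judging how far it is from a true WGS-84 geodesic. It treats the Earth as a sphere, so the result is an angle in radians that has to be scaled by a radius to get a length. The function names and the mean-radius constant are assumptions for illustration and do not come from the original post.

```python
import math

# Assumed mean Earth radius in miles; the spherical model is the source of the inaccuracy.
EARTH_RADIUS_MILES = 3958.8


def haversine_radians(lat1, lon1, lat2, lon2):
    """Great-circle angle in radians between two points given in radians."""
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return 2 * math.asin(min(1.0, math.sqrt(a)))


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles on a sphere of the assumed mean radius."""
    return EARTH_RADIUS_MILES * haversine_radians(lat1, lon1, lat2, lon2)
```

The distances returned by tree.query with metric='haversine' are in the same unit as haversine_radians, so multiplying them by a radius gives an approximate length.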

When I substitute a custom function for the metric (metric=lambda u, v: geopy.distance.geodesic(u, v).miles), it takes an "unreasonably" long time (4 minutes in the same case as above). It’s documented that custom functions can take a long time, but that doesn't help me solve my problem.

I looked at using a KD-Tree with ECEF coordinates and Euclidean distance, but I’m not sure if that’s actually any more accurate.
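
The ECEF idea mentioned above can be sketched as follows, assuming the standard WGS-84 ellipsoid constants and points on the ellipsoid surface unless a height is given. The function names are hypothetical. Note that the Euclidean distance between two ECEF points is the straight chord through the Earth, not the geodesic along the surface: on a sphere the chord grows monotonically with the arc length, so the neighbor ranking would be the same, but on the ellipsoid this holds only approximately.

```python
import math

# WGS-84 defining constants.
WGS84_A = 6378137.0                   # semi-major axis in metres
WGS84_F = 1 / 298.257223563           # flattening
WGS84_E2 = WGS84_F * (2 - WGS84_F)    # first eccentricity squared


def geodetic_to_ecef(lat, lon, height=0.0):
    """Convert geodetic latitude/longitude in radians to ECEF (x, y, z) in metres."""
    sin_lat = math.sin(lat)
    cos_lat = math.cos(lat)
    # Prime vertical radius of curvature at this latitude.
    n = WGS84_A / math.sqrt(1 - WGS84_E2 * sin_lat ** 2)
    x = (n + height) * cos_lat * math.cos(lon)
    y = (n + height) * cos_lat * math.sin(lon)
    z = (n * (1 - WGS84_E2) + height) * sin_lat
    return (x, y, z)


def chord_distance(p, q):
    """Straight-line (Euclidean) distance in metres between two ECEF points."""
    return math.dist(p, q)
```

The resulting (x, y, z) tuples could be handed to a KD-tree with the default Euclidean metric, for example sklearn's KDTree. Whether the chord ranking is accurate enough for a given data set is something to check against a geodesic library on a sample.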

How can I keep the speed of my current method, but improve my distance accuracy?

Recommended answer

The main reason your metric is slow is that it is written in Python, while the other metrics in sklearn are written in Cython/C++/C.

So, as discussed for instance here for Random Forests or here, you would have to implement your metric in Cython, fork your own version of BallTree, and include your custom metric there.

09-06 06:29