Problem description
I have two DataFrames, each containing Lat and Lon columns. For each (Lat, Lon) pair in the first DataFrame, I want to find the distance to ALL (Lat, Lon) pairs in the other DataFrame and get the minimum. I am using the geopy package. The code is as follows:
from geopy import distance
import numpy as np

# df1 and df2 are the two DataFrames, each with "LAT" and "LON" columns.
distanceMiles = []
count = 0
for id1, row1 in df1.iterrows():
    target = (row1["LAT"], row1["LON"])
    count = count + 1
    print(count)  # progress counter
    for id2, row2 in df2.iterrows():
        point = (row2["LAT"], row2["LON"])
        distanceMiles.append(distance.distance(target, point).miles)
    # position in df2 of the point closest to the current df1 row
    closestPoint = np.argmin(distanceMiles)
    distanceMiles = []
The problem is that df1 has 168K rows and df2 has 1200 rows. How do I make it faster?
Recommended answer
Leaving this here in case anyone needs it in the future:
If you need only the minimum distance, you don't have to brute-force all the pairs. With 168K rows against 1200 rows, brute force means roughly 200 million distance computations. Some data structures can solve this in O(n*log(n)) time, which is much faster than the brute-force method.
For example, you can use a generalized KNearestNeighbors (with k=1) algorithm to do exactly that, provided you account for your points being on a sphere, not a plane. See this SO answer for an example implementation using sklearn.
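A minimal sketch of the k=1 nearest-neighbor idea, assuming scikit-learn's BallTree with the haversine metric (not necessarily what the linked SO answer does). The function name nearest_points, the Earth radius constant, and the sample coordinates are illustrative assumptions. Note that haversine treats the Earth as a sphere, while geopy's distance.distance uses an ellipsoid, so the distances will differ slightly.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import BallTree

# Assumed mean Earth radius in miles (spherical approximation).
EARTH_RADIUS_MILES = 3958.8


def nearest_points(df1, df2, lat_col="LAT", lon_col="LON"):
    """For each row of df1, find the nearest row of df2.

    Returns two arrays: the positional index into df2 and the
    great-circle distance in miles.
    """
    # The haversine metric expects [lat, lon] pairs in radians.
    tree_points = np.radians(df2[[lat_col, lon_col]].to_numpy(dtype=float))
    query_points = np.radians(df1[[lat_col, lon_col]].to_numpy(dtype=float))

    # Build the tree on the smaller frame, then query every df1 row at once.
    tree = BallTree(tree_points, metric="haversine")
    dist_rad, idx = tree.query(query_points, k=1)

    # Distances come back in radians on the unit sphere; scale to miles.
    return idx[:, 0], dist_rad[:, 0] * EARTH_RADIUS_MILES


# Made-up sample data: New York and Los Angeles against
# Chicago, San Francisco and Boston.
df1 = pd.DataFrame({"LAT": [40.7128, 34.0522], "LON": [-74.0060, -118.2437]})
df2 = pd.DataFrame({"LAT": [41.8781, 37.7749, 42.3601],
                    "LON": [-87.6298, -122.4194, -71.0589]})

closest_idx, closest_miles = nearest_points(df1, df2)
df1["closest_idx"] = closest_idx      # position of the nearest row in df2
df1["closest_miles"] = closest_miles  # spherical distance to that row
```

If ellipsoidal accuracy matters, one option is to use the tree only to pick the nearest candidate and then recompute that single distance per row with geopy.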
There seem to be a few libraries that solve this too, such as sknni and GriSPy.
Here's also another question that talks a bit about the theory.