


I have a set of curves, each defined as a 2D array of shape (number of points, number of coordinates). I am calculating a distance matrix for them using the Hausdorff distance. My current code is below. Unfortunately, it is too slow for 500-600 curves with 50-100 3D points each. Is there a faster way to do this?

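import numpy as np
import scipy.spatial.distance

def distanceBetweenCurves(C1, C2):
    # pairwise Euclidean distances, shape (len(C1), len(C2))
    D = scipy.spatial.distance.cdist(C1, C2, 'euclidean')

    # non-symmetric (directed) Hausdorff distances
    H1 = np.max(np.min(D, axis=1))
    H2 = np.max(np.min(D, axis=0))

    # average of the two directed distances
    return (H1 + H2) / 2.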

def distanceMatrixOfCurves(Curves):
    numC = len(Curves)

    D = np.zeros((numC, numC))
    for i in range(0, numC-1):
        for j in range(i+1, numC):
            D[i, j] = D[j, i] = distanceBetweenCurves(Curves[i], Curves[j])

    return D




This is kind of a hard problem. One possibility would be to implement the Euclidean distance yourself, abandon scipy completely, and make use of PyPy's JIT compiler. But most likely this will not gain you much.
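For illustration, a scipy-free version of the question's measure might look like the sketch below. The function names are made up for this example, and it uses only the standard library so that it could run under PyPy. It performs exactly the same number of distance evaluations as the original, so any gain would come from the JIT alone.

```python
import math

def euclidean(p, q):
    # Euclidean distance between two points given as coordinate sequences
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def directed_hausdorff_pure(C1, C2):
    # for each point of C1 take the nearest point of C2, then the worst case
    return max(min(euclidean(p, q) for q in C2) for p in C1)

def distance_between_curves_pure(C1, C2):
    # average of the two directed distances, as in the question's code
    return (directed_hausdorff_pure(C1, C2) + directed_hausdorff_pure(C2, C1)) / 2.0

if __name__ == "__main__":
    C1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    C2 = [(0.0, 0.0, 0.0), (0.0, 3.0, 4.0)]
    print(distance_between_curves_pure(C1, C2))
```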



The problem is less the implementation than the way you approach the problem. You chose a brute-force approach, calculating the Euclidean distance for every pair of points in every possible pair of metric space subsets (curves). This is computationally demanding:

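  • Suppose you have 500 curves with 75 points each. With the brute-force approach you end up calculating the Euclidean distance 500 * 499 * 75 * 75 = 1 403 437 500 times (about half that if you exploit symmetry, as your code does). It is no surprise that this approach takes forever to run.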
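The count above can be reproduced for other problem sizes with a small helper. The function name and the `symmetric` flag are made up for this sketch; `symmetric=True` counts each unordered pair of curves once, which is what the nested loop in the question does.

```python
def count_point_distances(num_curves, points_per_curve, symmetric=False):
    # number of ordered curve pairs (i != j)
    pairs = num_curves * (num_curves - 1)
    if symmetric:
        # each unordered pair is evaluated only once
        pairs //= 2
    # every curve pair needs a full points x points distance table
    return pairs * points_per_curve ** 2

print(count_point_distances(500, 75))                  # 1403437500
print(count_point_distances(500, 75, symmetric=True))  # 701718750
```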


I'm not an expert on this, but I know that the Hausdorff distance is used extensively in image processing. I would suggest browsing the literature for speed-optimized algorithms. A starting point might be this, or this paper. The Voronoi diagram is also often mentioned in combination with the Hausdorff distance.
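As one example of such an algorithm, SciPy itself ships `scipy.spatial.distance.directed_hausdorff`, whose documentation cites an early-break method from the literature. This is not necessarily one of the papers linked above. The sketch below swaps it into the question's measure and checks it against the brute-force version. The function names are made up here, and whether it is actually faster for curves of only 50-100 points would need to be timed.

```python
import numpy as np
from scipy.spatial.distance import cdist, directed_hausdorff

def distance_between_curves_fast(C1, C2):
    # directed_hausdorff returns (distance, index_in_u, index_in_v)
    h1 = directed_hausdorff(C1, C2)[0]
    h2 = directed_hausdorff(C2, C1)[0]
    # average of the two directed distances, as in the question
    return (h1 + h2) / 2.0

def distance_between_curves_brute(C1, C2):
    # reference version: full pairwise distance table
    D = cdist(C1, C2, 'euclidean')
    return (D.min(axis=1).max() + D.min(axis=0).max()) / 2.0

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.random((60, 3))  # hypothetical test curves
    B = rng.random((80, 3))
    print(np.isclose(distance_between_curves_fast(A, B),
                     distance_between_curves_brute(A, B)))
```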


I hope these links might help you with this problem.

