本文介绍了我可以使用任意指标来搜索 KD 树吗?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我刚刚完成了用于执行快速最近邻搜索的 kd-tree.除了欧几里得距离之外,我对使用不同的距离度量很感兴趣.我对 kd-tree 的理解是,如果指标是非欧几里得的,则快速 kd-tree 搜索不能保证提供精确搜索,这意味着如果我想尝试,我可能需要实现新的数据结构和搜索算法为我的搜索找出新的指标.

I just finished implementing a kd-tree for doing fast nearest neighbor searches. I'm interested in playing around with different distance metrics other than the Euclidean distance. My understanding of the kd-tree is that the speedy kd-tree search is not guaranteed to give exact searches if the metric is non-Euclidean, which means that I might need to implement a new data structure and search algorithm if I want to try out new metrics for my search.

我有两个问题:

  1. 使用 kd-tree 是否会永久将我与 欧几里德距离?
  2. 如果是这样,我应该尝试哪些其他类型的算法来处理任意指标?我没有太多时间来实现许多不同的数据结构,但我正在考虑的其他结构包括 覆盖树vp-trees.
  1. Does using a kd-tree permanently tie me to the Euclidean distance?
  2. If so, what other sorts of algorithms should I try that work for arbitrary metrics? I don't have a ton of time to implement lots of different data structures, but other structures I'm thinking about include cover trees and vp-trees.

推荐答案

您链接到的维基百科页面上描述的最近邻搜索过程当然可以推广到其他距离度量,前提是您将超球面"替换为等效的几何给定度量的对象,并测试每个超平面与该对象的交叉.

The nearest-neighbour search procedure described on the Wikipedia page you linked to can certainly be generalised to other distance metrics, provided you replace "hypersphere" with the equivalent geometrical object for the given metric, and test each hyperplane for crossings with this object.

示例:如果您改用曼哈顿距离(即矢量分量中所有差异的绝对值之和),您的超球面将成为(多维)菱形.(这在 2D 中最容易可视化——如果您当前的最近邻距离查询点 x 距离 p,则不同超平面后面的任何更近邻必须与宽度和高度为 2x 且以 p 为中心的菱形形状).这可能会使超平面交叉测试更难编码或运行更慢,但一般原则仍然适用.

Example: if you are using the Manhattan distance instead (i.e. the sum of the absolute values of all differences in vector components), your hypersphere would become a (multidimensional) diamond. (This is easiest to visualise in 2D -- if your current nearest neighbour is at distance x from the query point p, then any closer neighbour behind a different hyperplane must intersect a diamond shape that has width and height 2x and is centred on p). This might make the hyperplane-crossing test more difficult to code or slower to run, however the general principle still applies.

这篇关于我可以使用任意指标来搜索 KD 树吗?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

08-20 03:25