我将使用 Scala 语法提出这个问题,即使这个问题确实与语言无关。

假设我有两个列表

val groundtruth:List[Range]
val testresult:List[Range]

我想找到与 testresult 中某个元素重叠的 groundtruth 的所有元素。

我可以这样做:
def overlaps(x:Range,y:Range) = (x contains y.start) || (y contains x.start)
val result = testresult.filter{ tr => groundtruth.exists{gt => overlaps(gt,tr)}}

但这需要 O(testresult.size * groundtruth.size) 时间来运行。

有没有更快的算法来计算这个结果,或者有什么数据结构可以让 exists 测试更高效?

附言该算法应适用于使用如下表达式生成的 groundtruthtestresult。换句话说,无法保证列表中范围之间的关系,Range 的平均大小为 100 或更大。
(1 to 1000).map{x =>
   val midPt = r.nextInt(100000);
   ((midPt - r.nextInt(100)) to (midPt + r.nextInt(100)));
}.toList

最佳答案

尝试 interval treeCormen, Leiserson, Rivest and Stein 在 (IIRC) 第 14 章中讨论这些。

或者,如果您的间隔列表都已排序并且列表中的间隔不重叠,那么以下算法可以在线性时间内解决您的问题,并在两个列表上进行一次传递:

(define interval cons)
(define lower car)
(define upper cdr)

(define (overlap a b)
  (cond ((or (null? a) (null? b)) '())
        ((< (upper a) (lower b))
         (overlap (cdr a) b))
        ((> (lower a) (upper b))
         (overlap a (cdr b)))
        (#t  ;; (car a) and (car b) overlap
             ;; EDIT: there's a bug in the following part.
             ;; The code shouldn't skip over both cars at once,
             ;; since they may also overlap with further intervals.
             ;; However, I'm too tired to fix this now.
         (cons (interval (max (lower a) (lower b))
                         (min (upper a) (upper b)))
               (overlap a b)))))

(我希望你能读懂 Scheme :)

10-08 13:25