我将使用 Scala 语法提出这个问题,即使这个问题确实与语言无关。
假设我有两个列表
val groundtruth:List[Range]
val testresult:List[Range]
我想找到与
testresult
中某个元素重叠的 groundtruth
的所有元素。我可以这样做:
def overlaps(x:Range,y:Range) = (x contains y.start) || (y contains x.start)
val result = testresult.filter{ tr => groundtruth.exists{gt => overlaps(gt,tr)}}
但这需要
O(testresult.size * groundtruth.size)
时间来运行。有没有更快的算法来计算这个结果,或者有什么数据结构可以让
exists
测试更高效?附言该算法应适用于使用如下表达式生成的
groundtruth
和 testresult
。换句话说,无法保证列表中范围之间的关系,Range
的平均大小为 100 或更大。(1 to 1000).map{x =>
val midPt = r.nextInt(100000);
((midPt - r.nextInt(100)) to (midPt + r.nextInt(100)));
}.toList
最佳答案
尝试 interval tree 。 Cormen, Leiserson, Rivest and Stein 在 (IIRC) 第 14 章中讨论这些。
或者,如果您的间隔列表都已排序并且列表中的间隔不重叠,那么以下算法可以在线性时间内解决您的问题,并在两个列表上进行一次传递:
(define interval cons)
(define lower car)
(define upper cdr)
(define (overlap a b)
(cond ((or (null? a) (null? b)) '())
((< (upper a) (lower b))
(overlap (cdr a) b))
((> (lower a) (upper b))
(overlap a (cdr b)))
(#t ;; (car a) and (car b) overlap
;; EDIT: there's a bug in the following part.
;; The code shouldn't skip over both cars at once,
;; since they may also overlap with further intervals.
;; However, I'm too tired to fix this now.
(cons (interval (max (lower a) (lower b))
(min (upper a) (upper b)))
(overlap a b)))))
(我希望你能读懂 Scheme :)