Problem description
In an algorithm I have to calculate the 75th percentile of a data set whenever I add a value. Right now I am doing this:
- Get value x
- Insert x at the back of an already sorted array
- Swap x down until the array is sorted
- Read the element at position array[array.size * 3/4]
Point 3 is O(n), and the rest is O(1), but this is still quite slow, especially if the array gets larger. Is there any way to optimize this?
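For reference, the baseline described above might look roughly like the following. This is only a sketch of the steps as stated in the question: the class name `SortedArrayPercentile` and the use of `double` values are assumptions made for illustration, not the asker's actual code.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical name; sketches the baseline from the question.
class SortedArrayPercentile {
public:
    // O(n) in the worst case: append x, then swap it down
    // until the array is sorted again.
    void add(double x) {
        data_.push_back(x);
        for (std::size_t i = data_.size() - 1;
             i > 0 && data_[i - 1] > data_[i]; --i) {
            std::swap(data_[i - 1], data_[i]);
        }
    }

    // O(1): the element at index size * 3 / 4 (integer division).
    // Precondition: at least one value has been added.
    double get() const {
        return data_[data_.size() * 3 / 4];
    }

private:
    std::vector<double> data_;  // kept sorted in ascending order
};
```

The swap loop is the O(n) step: a value smaller than everything seen so far has to travel across the whole array.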
UPDATE
Thanks Nikita! Since I am using C++ this is the solution easiest to implement. Here is the code:
    #include <algorithm>
    #include <functional>
    #include <vector>

    template<class T>
    class IterativePercentile {
    public:
        /// Percentile has to be in the range [0, 1)
        IterativePercentile(double percentile)
            : _percentile(percentile)
        { }

        // Adds a number in O(log(n))
        void add(const T& x) {
            if (_lower.empty() || x <= _lower.front()) {
                _lower.push_back(x);
                std::push_heap(_lower.begin(), _lower.end(), std::less<T>());
            } else {
                _upper.push_back(x);
                std::push_heap(_upper.begin(), _upper.end(), std::greater<T>());
            }

            // Number of elements the lower heap should hold after this insertion
            unsigned size_lower = (unsigned)((_lower.size() + _upper.size()) * _percentile) + 1;
            if (_lower.size() > size_lower) {
                // lower to upper
                std::pop_heap(_lower.begin(), _lower.end(), std::less<T>());
                _upper.push_back(_lower.back());
                std::push_heap(_upper.begin(), _upper.end(), std::greater<T>());
                _lower.pop_back();
            } else if (_lower.size() < size_lower) {
                // upper to lower
                std::pop_heap(_upper.begin(), _upper.end(), std::greater<T>());
                _lower.push_back(_upper.back());
                std::push_heap(_lower.begin(), _lower.end(), std::less<T>());
                _upper.pop_back();
            }
        }

        /// Access the percentile in O(1); at least one value must have been added
        const T& get() const {
            return _lower.front();
        }

        void clear() {
            _lower.clear();
            _upper.clear();
        }

    private:
        double _percentile;
        std::vector<T> _lower;  // max-heap holding the smaller values
        std::vector<T> _upper;  // min-heap holding the larger values
    };
Solution
You can do it with two heaps. Not sure if there's a less 'contrived' solution, but this one provides O(log n) time complexity, and heaps are also included in the standard libraries of most programming languages.
The first heap (heap A) contains the smallest 75% of the elements; the other heap (heap B) contains the rest (the biggest 25%). The first one has its biggest element on top, the second one its smallest.
- Adding an element.
See if the new element x is <= max(A). If it is, add it to heap A, otherwise to heap B.
Now, if we added x to heap A and it became too big (holds more than 75% of the elements), we need to remove the biggest element from A (O(log n)) and add it to heap B (also O(log n)).
Similarly if heap B became too big.
- Finding the "0.75 median".
Just take the largest element from A (or the smallest from B). Requires O(log n) or O(1) time, depending on the heap implementation.
edit
As Dolphin noted, we need to specify precisely how big each heap should be for every n (if we want a precise answer). For example, if size(A) = floor(n * 0.75) and size(B) is the rest, then, for every n > 0, array[array.size * 3/4] = min(B).
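The sizing rule from the edit can be sketched with `std::priority_queue`. This is a minimal illustration under stated assumptions, not the code from the question: the class name `TwoHeapPercentile75` and the `double` element type are hypothetical, and the 75% split is hard-coded as `n * 3 / 4` so that it matches the integer index `array.size * 3/4`.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical name; sketches the rule size(A) = floor(n * 3 / 4).
class TwoHeapPercentile75 {
public:
    // O(log n): insert x, then move at most one element between the heaps.
    void add(double x) {
        if (!a_.empty() && x <= a_.top()) {
            a_.push(x);
        } else {
            b_.push(x);
        }
        const std::size_t n = a_.size() + b_.size();
        const std::size_t target_a = n * 3 / 4;  // floor(0.75 * n)
        if (a_.size() > target_a) {
            // A is one element too big: move max(A) to B.
            b_.push(a_.top());
            a_.pop();
        } else if (a_.size() < target_a) {
            // A is one element too small: move min(B) to A.
            a_.push(b_.top());
            b_.pop();
        }
    }

    // O(1): min(B), which corresponds to array[array.size * 3 / 4].
    // Precondition: at least one value has been added.
    double get() const { return b_.top(); }

private:
    // Heap A: max-heap with the smallest floor(0.75 * n) elements.
    std::priority_queue<double> a_;
    // Heap B: min-heap with the remaining elements.
    std::priority_queue<double, std::vector<double>,
                        std::greater<double>> b_;
};
```

Because floor(n * 3 / 4) is always smaller than n, heap B is never empty once a value has been added, so `get()` can read `b_.top()` directly. Each insertion changes the target size of A by at most one, which is why a single move is enough to rebalance.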