c++ - Complexity of merging K sorted arrays/vectors

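While looking into the problem of merging k sorted contiguous arrays/vectors, and how its implementation differs from merging k sorted linked lists, I found two relatively simple naive solutions for merging k contiguous arrays, plus a nice optimized approach based on pairwise merging that mimics how mergeSort() works. The two naive solutions I implemented appear to have the same complexity, but in a large randomized test I ran, one seems to be much less efficient than the other.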

Naive merging

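My naive merging method works as follows. We create an output vector<int> and set it to the first of the k vectors we are given. We then merge in the second vector, then the third, and so on. Since a typical merge() method that takes two vectors and returns one is asymptotically linear, in both space and time, in the number of elements in the two vectors, the total complexity will be O(n + 2n + 3n + ... + kn), where n is the average number of elements in each list. Since we are adding 1n + 2n + 3n + ... + kn = n*k*(k+1)/2, I believe the total complexity is O(n*k^2). Consider the following code: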

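// Merge the vectors one at a time into a growing result.
// mergeLists() merges two sorted vectors into a new sorted vector.
vector<int> mergeInefficient(const vector<vector<int> >& multiList) {
  if (multiList.empty()) return vector<int>();  // nothing to merge

  vector<int> finalList = multiList[0];
  for (size_t j = 1; j < multiList.size(); ++j) {
    finalList = mergeLists(multiList[j], finalList);
  }

  return finalList;
}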
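
The question does not show mergeLists(). A minimal sketch of such a helper, assuming it simply wraps std::merge (the body below is an assumption, not the asker's code), might look like this:

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical stand-in for the mergeLists() helper that the question calls
// but does not show. It merges two sorted vectors into a new sorted vector
// in time and space linear in a.size() + b.size().
std::vector<int> mergeLists(const std::vector<int>& a, const std::vector<int>& b) {
  std::vector<int> out;
  out.reserve(a.size() + b.size());
  std::merge(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out));
  return out;
}
```

Because every call allocates and fills a fresh vector of the combined size, the j-th merge in the loop touches roughly j*n elements, which is where the n + 2n + ... + kn sum comes from.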

Naive selection

My second naive solution works as follows:

/**
 * The logic behind this algorithm is fairly simple and inefficient.
 * Basically we want to start with the first values of each of the k
 * vectors, pick the smallest value and push it to our finalList vector.
 * We then need to be looking at the next value of the vector we took the
 * value from so we don't keep taking the same value. A vector of vector
 * iterators is used to hold our position in each vector. While all iterators
 * are not at the .end() of their corresponding vector, we maintain a minValue
 * variable initialized to INT_MAX, and a minValueIndex variable and iterate over
 * each of the k vector iterators and if the current iterator is not an end position
 * we check to see if it is smaller than our minValue. If it is, we update our minValue
 * and set our minValue index (this is so we later know which iterator to increment after
 * we iterate through all of them). We do a check after our iteration to see if minValue
 * still equals INT_MAX. If it does, all iterators are at the .end() position, and we have
 * exhausted every vector and can stop iterating over all k of them. Regarding the complexity
 * of this method, we are iterating over `k` vectors so long as at least one value has not been
 * accounted for. Since there are `nk` values where `n` is the average number of elements in each
 * list, the time complexity = O(nk^2) like our other naive method.
 */
vector<int> mergeInefficientV2(const vector<vector<int> >& multiList) {
  vector<int> finalList;
  vector<vector<int>::const_iterator> iterators(multiList.size());

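  // Set all iterators to the beginning of their corresponding vectors in multiList
  for (size_t i = 0; i < multiList.size(); ++i) iterators[i] = multiList[i].begin();

  int minValue;
  size_t minValueIndex = 0;

  while (1) {
    minValue = INT_MAX;  // INT_MAX is declared in <climits>
    for (size_t i = 0; i < iterators.size(); ++i) {
      if (iterators[i] == multiList[i].end()) continue;

      if (*iterators[i] < minValue) {
        minValue = *iterators[i];
        minValueIndex = i;
      }
    }

    // Stop before touching any iterator: if nothing was selected, every
    // vector is exhausted and incrementing an end iterator would be undefined.
    // Note that INT_MAX doubles as the sentinel, so inputs containing INT_MAX
    // are not handled.
    if (minValue == INT_MAX) break;

    iterators[minValueIndex]++;
    finalList.push_back(minValue);
  }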

  return finalList;
}

Random simulation

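Long story short, I built a simple randomized simulation that constructs a multidimensional vector<vector<int>>. The multidimensional vector starts with 2 vectors of size 2 each and ends with 600 vectors of size 600 each. Every vector is sorted, and on each iteration the outer container and each sub-vector grow by two elements. I time how long each algorithm takes like this: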

clock_t clock_a_start = clock();
finalList = mergeInefficient(multiList);
clock_t clock_a_stop = clock();

clock_t clock_b_start = clock();
finalList = mergeInefficientV2(multiList);
clock_t clock_b_stop = clock();
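
The post does not show how the input is generated or how the clock() readings are converted into times. A rough sketch, assuming a hypothetical makeSortedInput() generator and an elapsedSeconds() helper (neither name, nor the value range 0 to 1000000, comes from the original), could be:

```cpp
#include <algorithm>
#include <cstddef>
#include <ctime>
#include <random>
#include <vector>

// Hypothetical helper: builds k sorted vectors of n pseudo-random ints each.
std::vector<std::vector<int>> makeSortedInput(std::size_t k, std::size_t n, unsigned seed) {
  std::mt19937 gen(seed);
  std::uniform_int_distribution<int> dist(0, 1000000);  // assumed value range
  std::vector<std::vector<int>> multiList(k, std::vector<int>(n));
  for (auto& v : multiList) {
    for (auto& x : v) x = dist(gen);
    std::sort(v.begin(), v.end());
  }
  return multiList;
}

// Converts a pair of clock() readings into seconds of processor time.
double elapsedSeconds(std::clock_t start, std::clock_t stop) {
  return static_cast<double>(stop - start) / CLOCKS_PER_SEC;
}
```

With these, elapsedSeconds(clock_a_start, clock_a_stop) would give the time for the first algorithm at one input size. clock() measures processor time rather than wall-clock time, which is usually what such a comparison wants.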

I then built the following plot:

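My calculations say that the two naive solutions (merging and selection) have the same time complexity, but the plot above shows them to be very different. At first I rationalized this by saying that one probably has more overhead than the other, but then I realized that overhead should be a constant factor and should not produce a plot like the one above. What is the explanation for this? I assume my complexity analysis is wrong?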

Best answer

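Even if two algorithms have the same complexity (O(nk^2) in your case), they may end up with very different running times depending on your input size and the "constant" factors involved.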

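For example, if one algorithm runs in n/1000 time and another runs in 1000n time, they both have the same asymptotic complexity, but for "reasonable" choices of n their running times will be very different.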

Moreover, caching, compiler optimizations and similar effects can change the running time significantly.

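In your case, although your calculation of the complexities seems correct, the actual running time in the first case should be (nk^2 + nk)/2, whereas in the second case it should be nk^2. Notice that the division by 2 may be significant, because as k increases the nk term becomes negligible.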
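
The two counts can be written out explicitly. This is only a sketch of the accounting, assuming every vector has exactly n elements, that copying or merging one element costs one unit, and that each selection step scans all k iterators:

```latex
\begin{align*}
T_{\text{merge}}  &= \sum_{j=1}^{k} j\,n = \frac{n\,k\,(k+1)}{2} = \frac{nk^2 + nk}{2},\\
T_{\text{select}} &= \underbrace{nk}_{\text{output elements}} \cdot \underbrace{k}_{\text{iterators scanned}} = nk^2,\\
\frac{T_{\text{merge}}}{T_{\text{select}}} &= \frac{k+1}{2k} \;\longrightarrow\; \frac{1}{2} \quad (k \to \infty).
\end{align*}
```

Under these assumptions the merging approach does about half as many unit steps for large k. The real cost of a unit step differs between the two methods, so this ratio describes operation counts only, not measured times.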

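As a third algorithm, you can modify the naive selection by maintaining a heap of k elements that contains the first elements of all k vectors. Your selection step then takes O(log k) time, and hence the complexity reduces to O(nk log k).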
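
One way to sketch the heap idea is with std::priority_queue used as a min-heap. The function name mergeWithHeap and the use of indices instead of iterators are choices made for this illustration, not part of the original answer:

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Sketch of a heap-based k-way merge. mergeWithHeap is a hypothetical name.
std::vector<int> mergeWithHeap(const std::vector<std::vector<int>>& multiList) {
  // Each heap entry is (value, index of the vector it came from).
  using Entry = std::pair<int, std::size_t>;
  // std::greater turns the default max-heap into a min-heap.
  std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;

  // positions[i] is the index of the next unread element of multiList[i].
  std::vector<std::size_t> positions(multiList.size(), 0);
  std::size_t total = 0;
  for (std::size_t i = 0; i < multiList.size(); ++i) {
    total += multiList[i].size();
    if (!multiList[i].empty()) {
      heap.emplace(multiList[i][0], i);
      positions[i] = 1;
    }
  }

  std::vector<int> finalList;
  finalList.reserve(total);
  while (!heap.empty()) {
    Entry smallest = heap.top();
    heap.pop();
    finalList.push_back(smallest.first);

    // Refill the heap from the vector that supplied the smallest value.
    std::size_t src = smallest.second;
    if (positions[src] < multiList[src].size()) {
      heap.emplace(multiList[src][positions[src]], src);
      ++positions[src];
    }
  }
  return finalList;
}
```

The heap never holds more than k entries, so each of the nk pops and pushes costs O(log k). Tracking which vectors are exhausted by index also avoids the INT_MAX sentinel used in the naive selection.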

Regarding c++ - Complexity of merging K sorted arrays/vectors, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/39197996/
