在串行情况下,排序需要O(n log n)。如果我们有O(n)个处理器,我们希望线性加速。存在O(log n)个并行算法,但是它们具有很高的常数。它们也不适用于没有O(n)处理器的商品硬件。对于p个处理器,合理的算法应花费O(n / p log n)时间。

在串行情况下,快速排序平均具有最佳的运行时复杂性。并行快速排序算法易于实现(请参阅herehere)。但是,由于最初的步骤是将整个集合划分在单个内核上,因此执行效果不佳。我已经找到了许多并行排序算法的信息,但到目前为止,我还没有发现任何指向明确赢家的信息。

我希望对运行在8到32个内核上的JVM语言中的100万到1亿个元素的列表进行排序。

最佳答案

以下文章(PDF下载)是对各种体系结构上的并行排序算法的比较研究:

Parallel sorting algorithms on various architectures

根据这篇文章,样本排序在许多并行体系结构类型上似乎是最好的。

更新以解决Mark对年龄的关注:

以下是一些较新的文章,介绍了一些更新颖的内容(从2007年开始,顺便说一下,仍然可以与样本排序进行比较):

Improvements on sample sort
AA-Sort

前沿(大约在2010年,有些才几个月):

Parallel sorting pattern
Many-core GPU based parallel sorting
Hybrid CPU/GPU parallel sort
Randomized Parallel Sorting Algorithm with an Experimental Study
Highly scalable parallel sorting
Sorting N-Elements Using Natural Order: A New Adaptive Sorting Approach

2013年更新:
这是大约在2013年1月的前沿。(注意:一些链接是指向Citeseer的论文的,需要免费注册):

大学讲座:
Parallel Partitioning for Selection and Sorting
Parallel Sorting Algorithms Lecture
Parallel Sorting Algorithms Lecture 2
Parallel Sorting Algorithms Lecture 3

其他来源和论文:

A novel sorting algorithm for many-core architectures based on adaptive bitonic sort
Highly Scalable Parallel Sorting 2
Parallel Merging
Parallel Merging 2
Parallel Self-Sorting System for Objects
Performance Comparison of Sequential Quick Sort and Parallel Quick Sort Algorithms
Shared Memory, Message Passing, and Hybrid Merge Sorts for Standalone and Clustered SMPs
Various parallel algorithms (sorting et al) including implementations

GPU和CPU / GPU混合来源和论文:

An OpenCL Method of Parallel Sorting Algorithms for GPU Architecture
Data Sorting Using Graphics Processing Units
Efficient Algorithms for Sorting on GPUs
Designing efficient sorting algorithms for manycore GPUs
Deterministic Sample Sort For GPUs
Fast in-place sorting with CUDA based on bitonic sort
Fast parallel GPU-sorting using a hybrid algorithm
Fast Parallel Sorting Algorithms on GPUs
Fast sort on CPUs and GPUs: a case for bandwidth oblivious SIMD sort
GPU sample sort
GPU-ABiSort: Optimal Parallel Sorting on Stream Architectures
GPUTeraSort: high performance graphics co-processor sorting for large database management
High performance comparison-based sorting algorithm on many-core GPUs
Parallel external sorting for CUDA-enabled GPUs with load balancing and low transfer overhead
Sorting on GPUs for large scale datasets: A thorough comparison

09-10 23:48