Problem description
I often need to sort large numpy arrays (a few billion elements), and this has become a bottleneck in my code. I am looking for a way to parallelize it.
Are there any parallel implementations of the ndarray.sort() function? The Numexpr module provides parallel implementations of most math operations on numpy arrays, but it lacks sorting capabilities.
Maybe it is possible to write a simple wrapper around a C++ implementation of parallel sorting and use it through Cython?
Recommended answer
I ended up wrapping the GCC parallel sort. Here is the code:
parallelSort.pyx
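# cython: wraparound = False
# cython: boundscheck = False

import numpy as np
cimport numpy as np
import cython
cimport cython

# Element types the wrapper accepts; Cython generates one specialization per type.
ctypedef fused real:
    cython.char
    cython.uchar
    cython.short
    cython.ushort
    cython.int
    cython.uint
    cython.long
    cython.ulong
    cython.longlong
    cython.ulonglong
    cython.float
    cython.double

# Parallel-mode sort from GCC's libstdc++ (OpenMP-based).
cdef extern from "<parallel/algorithm>" namespace "__gnu_parallel":
    cdef void sort[T](T first, T last) nogil

def numpyParallelSort(real[::1] a):
    "In-place parallel sort for numpy types"
    # real[::1] requires a C-contiguous 1-D array, so the raw pointers
    # below span exactly the array's elements (a strided view would not).
    # &a[a.shape[0]] is the one-past-the-end pointer; this relies on
    # boundscheck being disabled above.
    sort(&a[0], &a[a.shape[0]])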
Extra compiler args: -fopenmp (compile) and -lgomp (linking).
This makefile will do it:
all:
	cython --cplus parallelSort.pyx
	g++ -g -march=native -Ofast -fpic -c parallelSort.cpp -o parallelSort.o -fopenmp `python-config --includes`
	g++ -g -march=native -Ofast -shared -o parallelSort.so parallelSort.o `python-config --libs` -lgomp

clean:
	rm -f parallelSort.cpp *.o *.so
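If python-config is not on the PATH, the same three commands can be assembled from Python itself. The following is only a sketch under a few assumptions: the helper name build_commands is hypothetical, the include directory is taken from the standard-library sysconfig module instead of python-config, and the `python-config --libs` part of the link step is left out on the assumption that the extension does not need to link against libpython on your platform. It only builds and prints the commands; it does not run them.

```python
import shlex
import sysconfig


def build_commands(module="parallelSort"):
    """Return the cython, compile and link commands as argument lists.

    Hypothetical helper mirroring the makefile above; it does not run anything.
    """
    # Directory containing Python.h for the running interpreter.
    include_dir = sysconfig.get_paths()["include"]
    common = ["g++", "-g", "-march=native", "-Ofast"]
    return [
        # Step 1: translate the .pyx file to C++.
        ["cython", "--cplus", f"{module}.pyx"],
        # Step 2: compile with OpenMP enabled.
        common + ["-fpic", "-fopenmp", f"-I{include_dir}",
                  "-c", f"{module}.cpp", "-o", f"{module}.o"],
        # Step 3: link the shared object against the GNU OpenMP runtime.
        common + ["-shared", "-o", f"{module}.so", f"{module}.o", "-lgomp"],
    ]


if __name__ == "__main__":
    for cmd in build_commands():
        print(shlex.join(cmd))
```

The printed lines can be pasted into a shell or passed one by one to subprocess.run.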
And this shows that it works:
from parallelSort import numpyParallelSort
import numpy as np
a = np.random.random(100000000)
numpyParallelSort(a)
print(a[:10])
Edit: fixed the bug noticed in the comment below.
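Printing the first ten elements only shows that the start of the array looks sorted. A stricter check compares the whole result against a reference sort. The helper below is a minimal sketch; the name sorts_in_place is hypothetical, and it uses Python's built-in sorted() as the reference, so it is meant for moderately sized inputs and not for billions of elements.

```python
def sorts_in_place(sort_fn, data):
    """Return True if sort_fn(data) sorts data in place in ascending order.

    Hypothetical helper: data can be a list or a 1-D numpy array.
    """
    expected = sorted(data)  # reference result from the built-in sort
    result = sort_fn(data)   # an in-place sort is expected to return None
    return result is None and list(data) == expected


# With the compiled module from this answer (assumed to be importable):
#   import numpy as np
#   from parallelSort import numpyParallelSort
#   a = np.random.random(10**6)
#   assert sorts_in_place(numpyParallelSort, a)

if __name__ == "__main__":
    import random
    sample = [random.random() for _ in range(1000)]
    # list.sort stands in for the compiled function here.
    print(sorts_in_place(list.sort, sample))
```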