Problem description
I have to create and fill a huge array (e.g. 96 GB, 72000 rows * 72000 columns) in which every cell holds a float computed from mathematical formulas. The array will be used for further computation afterwards.
import itertools, operator, time, copy, os, sys
import numpy
from multiprocessing import Pool


def f2(x):  # more complex mathematical formulas that change according to values in *i* and *x*
    temp = []
    for i in combine:
        temp.append(0.2 * x[1] * i[1] / 64.23)
    return temp


def combinations_with_replacement_counts(n, r):  # provide all combinations of r balls in n boxes
    size = n + r - 1
    for indices in itertools.combinations(range(size), n - 1):
        starts = [0] + [index + 1 for index in indices]
        stops = indices + (size,)
        yield tuple(map(operator.sub, stops, starts))


global combine
combine = list(combinations_with_replacement_counts(3, 60))  # set to 60 here, but 350 is needed instead
print(len(combine))

if __name__ == '__main__':
    t1 = time.time()
    pool = Pool()  # start worker processes
    results = [pool.apply_async(f2, (x,)) for x in combine]
    roots = [r.get() for r in results]
    print(roots[0:3])
    pool.close()
    pool.join()
    print(time.time() - t1)
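- What is the fastest way to create and fill such a huge numpy array? Filling lists, then aggregating them, then converting to a numpy array?
- Can the computation be parallelized, given that the cells/columns/rows of the 2D array are independent, to speed up filling the array? Any hints or pointers for optimizing such a computation with multiprocessing?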
Recommended answer
I know that you can create shared numpy arrays that can be modified from different processes (assuming that the modified regions don't overlap). Here is a sketch of the code that you can use to do that (I saw the original idea somewhere on Stack Overflow; edit: here it is: https://stackoverflow.com/a/5550156/1269140 )
import ctypes
import multiprocessing as mp

import numpy as np


def shared_zeros(n1, n2):
    # create a 2D numpy array backed by shared memory, which can then be
    # modified from different worker processes
    shared_array_base = mp.Array(ctypes.c_double, n1 * n2)
    shared_array = np.ctypeslib.as_array(shared_array_base.get_obj())
    shared_array = shared_array.reshape(n1, n2)
    return shared_array


class singleton:
    arr = None


def dosomething(i):
    # do something with singleton.arr
    singleton.arr[i, :] = i
    return i


def main():
    # assign the array before creating the pool so that forked workers inherit it
    singleton.arr = shared_zeros(1000, 1000)
    pool = mp.Pool(16)
    pool.map(dosomething, range(1000))


if __name__ == '__main__':
    main()
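The sketch above relies on the workers inheriting `singleton.arr`, which holds when the worker processes are forked. On platforms where processes are spawned instead, one possible variant is to hand the shared buffer to each worker through the pool's `initializer`. The following is only a sketch under that assumption: the names `init_worker`, `fill_rows`, `fill_shared` and `_shared`, the block size, and the formula (borrowed from the placeholder in the question) are all hypothetical. It also fills whole blocks of rows with one vectorized numpy expression rather than looping cell by cell, which is usually much faster than building Python lists.

```python
import ctypes
import multiprocessing as mp

import numpy as np

# Per-process holder for the numpy view of the shared buffer (hypothetical name).
_shared = {}


def init_worker(raw_array, shape):
    # Runs once in every worker: wrap the shared buffer in a numpy view (no copy).
    _shared["arr"] = np.frombuffer(raw_array, dtype=np.float64).reshape(shape)


def fill_rows(bounds):
    # Fill rows [start, stop) in place. Blocks never overlap, so no lock is used.
    start, stop = bounds
    arr = _shared["arr"]
    rows = np.arange(start, stop, dtype=np.float64)
    cols = np.arange(arr.shape[1], dtype=np.float64)
    # Placeholder formula (an assumption), evaluated for a whole block at once.
    arr[start:stop, :] = 0.2 * rows[:, None] * cols[None, :] / 64.23
    return stop - start


def fill_shared(n_rows, n_cols, block=100, processes=4):
    # Unsynchronized shared memory: safe here only because row blocks are disjoint.
    raw = mp.RawArray(ctypes.c_double, n_rows * n_cols)
    blocks = [(s, min(s + block, n_rows)) for s in range(0, n_rows, block)]
    with mp.Pool(processes, initializer=init_worker,
                 initargs=(raw, (n_rows, n_cols))) as pool:
        pool.map(fill_rows, blocks)
    # View the filled buffer in the parent process, again without copying.
    return np.frombuffer(raw, dtype=np.float64).reshape(n_rows, n_cols)


if __name__ == "__main__":
    result = fill_shared(1000, 500)
    print(result.shape, result[2, 3])
```

Each task returns only a row count, so the large data never travels back through the pool's result queue; the workers write straight into the shared buffer. The block size trades task overhead against the size of the temporary array created for each block, and would need tuning for the real formulas.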