I also have large arrays to multiply over a large number of iterations.
I am training a model with arrays of length around 1500, and I will perform 3 multiplications about 1,000,000 times, which takes a long time (almost a week).
I found Dask and tried to compare it with the plain NumPy approach, but NumPy turned out to be faster:
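import time
import numpy as np
import dask.array as da

x = np.arange(2000)

# Dask: 20 chunks of 100 elements each
start = time.time()
y = da.from_array(x, chunks=(100,))
for i in range(100):
    # dask is lazy: .compute() is needed to actually evaluate the dot product
    p = y.dot(y).compute()
print(time.time() - start)

# Plain NumPy
start = time.time()
p = 0
for i in range(100):
    p = np.dot(x, x)
print(time.time() - start)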
Am I using Dask wrong, or is NumPy just that fast?
Performance for .dot strongly depends on the BLAS library to which your NumPy implementation is linked.
If you have a modern implementation like OpenBLAS or MKL then NumPy is already running at full speed using all of your cores. In this case dask.array will likely only get in the way, trying to add further parallelism when none is warranted, causing thread contention.
If you have installed NumPy through Anaconda then you likely already have OpenBLAS or MKL, so I would just be happy with the performance that you have and call it a day.
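Since the advice above hinges on which BLAS library NumPy is linked against, it can be worth checking that before reaching for dask. A minimal way to do so is NumPy's own np.show_config(), which prints the build configuration, including the BLAS/LAPACK libraries it found. The exact layout of that output varies between NumPy versions, so treat this as a quick inspection rather than something to parse.

```python
import numpy as np

# Print NumPy's build/runtime configuration. Look for the BLAS/LAPACK
# entries to see whether OpenBLAS, MKL, or another implementation is in use.
np.show_config()
```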
However, in your actual example you're using chunks that are far too small (chunks=(100,)): each chunk holds only 100 values, about 800 bytes with 64-bit integers. The dask task scheduler incurs about a millisecond of overhead per task, so you should choose a chunk size large enough that each task takes somewhere in the hundreds of milliseconds, which hides this overhead. A good rule of thumb is to aim for chunks that are larger than a megabyte. This is what is causing the large difference in performance that you're seeing.
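The chunk-size arithmetic can be made concrete with a small sketch. It assumes the figures quoted above (about 1 ms of scheduler overhead per task, chunks of roughly 1 MB) and 64-bit integer elements; the helper names chunk_report and chunk_len_for_target are hypothetical, introduced only for illustration. The overhead number is a rough floor, because it counts only one task per chunk while dask's dot also creates additional tasks.

```python
import numpy as np

# Assumed figures taken from the answer: ~1 ms per task, chunks of ~1 MB.
TASK_OVERHEAD_S = 1e-3
TARGET_CHUNK_BYTES = 1_000_000


def chunk_report(n_elements, chunk_len, dtype=np.int64):
    """Hypothetical helper: bytes per chunk, chunk count, rough overhead floor."""
    itemsize = np.dtype(dtype).itemsize
    n_chunks = -(-n_elements // chunk_len)  # ceiling division
    return {
        "chunk_bytes": chunk_len * itemsize,
        "n_chunks": n_chunks,
        # Lower bound only: assumes one task per chunk; dask's dot adds more.
        "min_overhead_s": n_chunks * TASK_OVERHEAD_S,
    }


def chunk_len_for_target(dtype=np.int64, target_bytes=TARGET_CHUNK_BYTES):
    """Hypothetical helper: elements per 1-D chunk for roughly target_bytes."""
    return max(1, target_bytes // np.dtype(dtype).itemsize)


print(chunk_report(2000, 100))  # the question's setup
print(chunk_len_for_target())   # elements per ~1 MB chunk of int64
```

For the question's setup this reports 800-byte chunks, 20 chunks, and at least about 20 ms of scheduling overhead per dot call, while the 1 MB guideline works out to 125,000 int64 elements per chunk. The whole 2000-element array is only 16 KB, so that guideline points to a single chunk, at which point dask.array has no parallelism left to offer for this size of problem.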