Problem Description
I'm trying to speed up my code using ThreadPool.
The input is a dictionary with about 80,000 elements, each of which is a list of 25 elements. I have to produce an output list for each element in the dictionary by processing and combining the elements of each list.
All the lists can be analyzed independently, so this setting should be easily parallelizable.
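To make the data shape concrete, here is a minimal stand-in version of the input (the body of do_stuff_function below is just a placeholder; the real function, which does the actual processing, is omitted):

# stand-in data: ~80000 keys, each mapping to a 25-element list
input_dict = {key: [float(key + i) for i in range(25)] for key in range(80000)}

def do_stuff_function(list_to_combine):
    # placeholder for the real processing/combining step
    return [x * 2 for x in list_to_combine]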
Here's the basic setting I used for pool.map:
from multiprocessing.dummy import Pool as ThreadPool
pool = ThreadPool(NUM_THREADS)
output = pool.map(thread_compute, iterable_list)
pool.close()
pool.join()
Approach 1 (wrong): define the dictionary as global and have each thread get a key of the dictionary as input.
# in the main
global input_dict
iterable_list = input_dict.keys()

# thread_compute function
def thread_compute(key_i):
    list_to_combine = input_dict[key_i]
    # call a function
    processed_list = do_stuff_function(list_to_combine)
    return processed_list
Approach 2 (not working): I created a list containing the same elements as input_dict, where each entry is a list of 25 items. This list is not a global variable, and each thread should be able to access an element (a 25-item list) without any overhead due to the GIL.
I soon realized approach 1 would not work, because the global variable input_dict, even though it is never accessed for write operations (and so should be thread-safe), is protected by the GIL (link 1, link 2), a globally enforced lock used when safely accessing Python objects from separate threads.

# in the main
items = list(input_dict.items())
iterable_list = [items[i][1] for i in range(len(items))]

# making sure the content is correct
assert(len(iterable_list) == len(input_dict))
for i in xrange(len(input_dict.keys())):
    assert(iterable_list[i] == input_dict[input_dict.keys()[i]])

# thread_compute function
def thread_compute(list_of_25_i):
    # call a function
    processed_list = do_stuff_function(list_of_25_i)
    return processed_list
Here are the execution times for 1, 2, 4 and 16 threads:

 1: Done, (t=36.6810s)
 2: Done, (t=46.5550s)
 4: Done, (t=48.2722s)
16: Done, (t=48.9660s)
Why does adding threads cause such an increase in time? I am sure this problem can benefit from multithreading, and I don't think the overhead of creating threads can, by itself, account for the increase.
Recommended Answer
If your do_stuff_function is CPU-bound, then running it in multiple threads will not help, because the GIL only allows one thread to execute at a time.
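A quick way to see this effect in isolation is a pure-Python busy loop under multiprocessing.dummy; the sketch below is purely illustrative and is not code from the question:

import time
from multiprocessing.dummy import Pool as ThreadPool

def busy(n):
    # pure-Python CPU-bound loop; the GIL is held while it computes
    total = 0
    for i in range(n):
        total += i * i
    return total

for num_threads in (1, 2, 4):
    pool = ThreadPool(num_threads)
    start = time.time()
    pool.map(busy, [2000000] * 8)
    pool.close()
    pool.join()
    print('%2d threads: %.3fs' % (num_threads, time.time() - start))

Adding threads here does not reduce the total time, because only one of them can run Python bytecode at any moment.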
The way around this in Python is to use multiple processes: just replace
from multiprocessing.dummy import Pool
with
from multiprocessing import Pool
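For completeness, here is a minimal runnable sketch of the process-based version. The dictionary contents, the body of do_stuff_function, NUM_PROCESSES, and the chunksize value are all made-up placeholders, since the original function isn't shown:

from multiprocessing import Pool

NUM_PROCESSES = 4

def do_stuff_function(list_to_combine):
    # placeholder for the real CPU-bound processing/combining step
    return sum(x * x for x in list_to_combine)

def thread_compute(list_of_25_i):
    # same worker as before; defined at module level so it can be pickled
    return do_stuff_function(list_of_25_i)

if __name__ == '__main__':
    # stand-in input: the real one has ~80000 keys
    input_dict = {i: list(range(25)) for i in range(1000)}
    iterable_list = list(input_dict.values())

    pool = Pool(NUM_PROCESSES)
    # chunksize batches many small lists per task to cut inter-process overhead
    output = pool.map(thread_compute, iterable_list, chunksize=100)
    pool.close()
    pool.join()
    print(len(output), 'results')

Note that arguments and results are pickled between processes, so with many tiny lists the chunksize argument matters: without batching, serialization overhead can eat much of the speedup.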