Question
I parse a big source code directory (100k files). I traverse every line in every file and do some simple regex matching. I tried threading this task across multiple threads but didn't get any speedup. Only multiprocessing managed to cut the time by 70%. I'm aware of the GIL's death grip, but aren't threads supposed to help with I/O-bound work?
If disk access is serial, how come several processes finish the job quicker?
Answer
Python "threads" permit independent threads of execution, but usually not parallel execution of Python code, because of the global interpreter lock (GIL): only one thread can be running Python bytecode at a time. Threads do help with genuinely I/O-bound work, since the GIL is released while a thread waits on I/O; here, however, the per-line regex matching is CPU-bound Python code, so the threads end up serialized on the GIL. Multiple processes each get their own interpreter and their own GIL, so they can run in parallel on separate cores. That likely also answers the disk question: with 100k small files, OS page caching and read-ahead mean the workload is dominated by CPU time in the regex matching rather than by serial disk access.
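A minimal sketch of the multiprocessing approach, assuming the goal is to count regex matches across a list of files (the `TODO` pattern and the file names are placeholders; substitute your own regex and directory walk). Each worker process has its own interpreter and GIL, so the CPU-bound matching runs in parallel:

```python
import multiprocessing
import re
import tempfile
from pathlib import Path

# Hypothetical pattern; replace with whatever you are actually matching.
PATTERN = re.compile(r"TODO")

def count_matches(path):
    """Scan one file line by line and count regex matches (CPU-bound work)."""
    n = 0
    with open(path, errors="replace") as f:
        for line in f:
            n += len(PATTERN.findall(line))
    return n

def scan(paths, workers=4):
    """Fan the file list out to worker processes, each with its own GIL."""
    with multiprocessing.Pool(workers) as pool:
        # chunksize batches files per task to amortize IPC overhead.
        return sum(pool.map(count_matches, paths, chunksize=64))

if __name__ == "__main__":
    # Demo with a few temporary files standing in for the real source tree.
    tmp = Path(tempfile.mkdtemp())
    for i in range(8):
        (tmp / f"f{i}.py").write_text("x = 1  # TODO fix\nprint(x)\n")
    print(scan(sorted(tmp.glob("*.py"))))  # one match per file -> prints 8
```

A reasonable chunksize matters here: handing files to workers one at a time makes the pickling and queue traffic per file comparable to the work itself, while batching keeps the workers busy with actual matching.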