Problem description
I'm having this problem in Python:
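- I have a queue of URLs that I need to check from time to time
- If the queue is filled up, I need to process each item in the queue
- Each item in the queue must be processed by a single process (multiprocessing)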
So far I have managed to do this "manually", like this:
while 1:
    self.updateQueue()
    while not self.mainUrlQueue.empty():
        domain = self.mainUrlQueue.get()
        # if we haven't launched any process yet, we need to do so
        if len(self.jobs) < maxprocess:
            self.startJob(domain)
            #time.sleep(1)
        else:
            # If we already have processes started, we need to clear the old process in our pool and start new ones
            jobdone = 0
            # We cycle through each of the processes until we find a free one; only then leave the loop
            while jobdone == 0:
                for p in self.jobs:
                    #print "entering loop"
                    # if the process finished
                    if not p.is_alive() and jobdone == 0:
                        #print str(p.pid) + " job dead, starting new one"
                        self.jobs.remove(p)
                        self.startJob(domain)
                        jobdone = 1
However, that leads to tons of problems and errors. I wondered whether I would be better off using a pool of processes. What would be the right way to do this?
However, my queue is often empty, and then it can be filled with 300 items in a second, so I'm not sure how to handle this.
Recommended answer
You could use the blocking capabilities of the queue: spawn multiple processes at startup (using multiprocessing.Pool) and let them sleep until some data is available on the queue to process. If you're not familiar with that, you could try to "play" with this simple program:
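import multiprocessing
import os
import time

def worker_main(queue):
    print(os.getpid(), "working")
    while True:
        item = queue.get(True)  # block until an item is available
        print(os.getpid(), "got", item)
        time.sleep(1)  # simulate a "long" operation

if __name__ == "__main__":
    the_queue = multiprocessing.Queue()
    # worker_main runs as each worker's initializer; don't forget the
    # trailing comma: initargs must be a tuple
    the_pool = multiprocessing.Pool(3, worker_main, (the_queue,))
    for i in range(5):
        the_queue.put("hello")
        the_queue.put("world")
    time.sleep(10)
    # the workers loop forever, so stop them explicitly
    the_pool.terminate()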
This will spawn 3 processes (in addition to the parent process). Each child executes the worker_main function, which is passed to the pool as its initializer. It is a simple loop that gets a new item from the queue on each iteration. Workers block if nothing is ready to process.
At startup, all 3 processes sleep until the queue is fed with some data. When an item is available, one of the waiting workers gets it and starts processing it. After that, it tries to get another item from the queue, waiting again if nothing is available.
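The workers in the program above loop forever, so there is no clean way to stop them once the queue has been drained. A common variation, sketched here under some assumptions, replaces the pool initializer with explicit multiprocessing.Process workers and a sentinel value: each worker blocks on get() exactly as described above, and exits when it receives the sentinel. The names process_url, worker, run and SENTINEL are hypothetical, process_url only stands in for the real per-URL work, and the sketch assumes None never appears as a real queue item.

```python
import collections
import multiprocessing
import os

SENTINEL = None  # assumption: None is never a real work item


def process_url(url):
    # hypothetical stand-in for the real per-URL work
    return url.upper()


def worker(task_queue, result_queue):
    while True:
        item = task_queue.get()  # blocks until an item is available
        if item is SENTINEL:
            break  # no more work: leave the loop so the process can exit
        result_queue.put((os.getpid(), item, process_url(item)))


def run(urls, num_workers=3):
    task_queue = multiprocessing.Queue()
    result_queue = multiprocessing.Queue()
    workers = [
        multiprocessing.Process(target=worker, args=(task_queue, result_queue))
        for _ in range(num_workers)
    ]
    for w in workers:
        w.start()
    for url in urls:
        task_queue.put(url)
    # one sentinel per worker, queued after all the real items
    for _ in workers:
        task_queue.put(SENTINEL)
    # read every result before join(): a process that still has queued
    # data waiting to be consumed may not exit
    results = [result_queue.get() for _ in urls]
    for w in workers:
        w.join()
    return results


if __name__ == "__main__":
    # simulate a burst of 300 URLs arriving at once
    urls = ["http://example.com/%d" % i for i in range(300)]
    results = run(urls)
    # how many items each worker process handled
    print(dict(collections.Counter(pid for pid, _, _ in results)))
```

Putting one sentinel per worker guarantees that every worker eventually receives one and exits, so join() returns. The results are read before join() because, per the multiprocessing programming guidelines, a process that has put items on a queue may not terminate until those items have been consumed.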