线程池,为什么要使用线程池:
1. 线程中可以获取某一个线程的状态或者某一个任务的状态,以及返回值
2. 当一个线程完成的时候我们主线程能立即知道
3. futures可以让多线程和多进程编码接口一致
获取状态或关闭任务
import time
def get_html(times):
time.sleep(times)
print("get page {} success".format(times))
return times
executor = ThreadPoolExecutor(max_workers=2)# 最大执行线程数
#通过submit函数提交执行的函数到线程池中, submit 是立即返回,不会造成主线程阻塞
task1 = executor.submit(get_html, (3))
task2 = executor.submit(get_html, (2))
print(task1.done()) # 判断任务是否完成
print(task2.cancel()) #取消任务,只能在任务没有开始的时候进行cancel
获取成功的task返回
方法一:
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
def get_html(times):
time.sleep(times)
print("get page {} success".format(times))
return times
executor = ThreadPoolExecutor(max_workers=2)
urls = [3, 2, 4]
all_task = [executor.submit(get_html, (url)) for url in urls]
for future in as_completed(all_task): # 返回已经成功的任务的返回值,不会按照线程执行的顺序返回
data = future.result()
print("get {} page".format(data))
方法二:
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
def get_html(times):
time.sleep(times)
print("get page {} success".format(times))
return times
executor = ThreadPoolExecutor(max_workers=2)
urls = [3, 2, 4]
#通过executor的map获取已经完成的task的值,这个会按照执行线程的顺序执行,会按照执行线程顺序返回
for data in executor.map(get_html, urls):
print("get {} page".format(data))
wait,wait第几个任务执行完了以后才执行主线程,不然的话主线程会一直阻塞
from concurrent.futures import ThreadPoolExecutor, wait
import time
def get_html(times):
time.sleep(times)
print("get page {} success".format(times))
return times
executor = ThreadPoolExecutor(max_workers=2)
urls = [3, 2, 4]
all_task = [executor.submit(get_html, (url)) for url in urls]
wait(all_task, return_when=FIRST_COMPLETED)
print("main")