Problem description
I have the following workflow in my application: there can be X requests from users (usually 5-10 simultaneously) who want to search for something in the system (each request is handled in a separate thread).
Each search can be handled in parallel (which I am currently implementing). Thread/CPU usage isn't really the problem here, as these tasks aren't CPU-intensive. The database is the bottleneck.
Currently I have a separate DB connection pool set up only for the search mechanism, with the max pool size set to 10. I know that's not much, but I can't set it any higher. Now I am trying to figure out how to size the thread pool for each search (per user).
Each request (thread) will spawn a separate thread pool, and in this pool each thread will handle a part of the given user's search. Would giving this thread pool a fixed size (let's say 4) be really problematic if, for instance, 10 users hit the "search" button at once? That would spawn 10 thread pools with 4 threads each = 40 threads, while there are only 10 DB connections in the pool. I guess some of the threads would simply sit idle and the rest would race to get a connection from the pool, but would that really be a huge problem?
If yes, what would be the best course of action:
- checking how many thread pools already exist while creating a new one and adjusting the new pool's max size accordingly (say there are already 2 pools with 4 threads each; then the new one would be created with max threads set to 2, and even newer pools with, let's say, only 1 max thread). This would mean that each subsequent user's search would be substantially slower.
- creating every thread pool with the same max size (i.e. 4) but implementing my own thread pool that dynamically checks how many threads there are in the application and resizes maxThreadPoolSize accordingly (in this case, for instance, all thread pools, the 2 old ones and the new one, would be downsized to, let's say, 3 threads). This would require each thread pool to have access to some shared object containing information about all thread pools in the application.
- something else?
Thanks for all the comments/answers. To clarify why I wanted a thread pool per request: it was so that one user would not use up the whole thread pool. The flow is exactly like this: when a user hits "search", a list of objects is generated (this list can range from 1 item up to thousands), then a DB lookup is performed for each item. Right now it is all performed sequentially. After my changes each task handles one lookup (because the search on the DB is pretty slow, this gives me a really huge boost; I know I could try some DB fine-tuning, but I'm not in charge of it).
The problem is that if User1 comes and performs a really generic search that generates X thousand items, it can take several minutes (or more). So I can have thousands of tasks in the executor from a single user. If I then have a shared thread pool with, let's say, max 10 threads (the same number as the connection pool), these tasks will be put in the thread pool's queue. Now if User2 comes and performs his search, he will have to wait for User1's search to finish, as his tasks will be put into that same queue. This is the situation I want to avoid with a thread pool per request.
I'm not really that afraid of context switches, as each computation can take up to several seconds, so they won't occur that often.
Currently I'm thinking about a shared thread pool and a manager to which each user thread would send its data; the manager would then pass it on to the thread pool whenever there is an idle thread. This way I could implement the manager so that it interleaves tasks from different users (i.e. no single user would dominate the thread pool).
The problem I see with such an approach is that I would need to somehow inform the "parent thread" (meaning the user request) that all of its tasks have been processed by the manager, and somehow send it the results.
Recommended answer
Modern processors can easily handle hundreds of threads, but as @PeterLawrey has suggested, there is something strange about your design. If, as you said, the operations are not computationally expensive, having a very high number of threads will result in a high number of expensive context switches, which degrades performance.
The additional complexity comes from the fact that you want to have a thread pool for each request, while the connection pool is per application:
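- If you have a thread pool for each request, you have to create it and destroy it every time a new request arrives.
- No matter whether you have zillions of threads and a supercomputer with a $100,000 budget, at most 10 threads can be doing useful work at any time, because there are only 10 connections.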
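The second point can be checked with a small experiment. The following is only a sketch under stated assumptions: a `Semaphore` stands in for the real connection pool, `Thread.sleep` stands in for a slow query, and the class and method names are hypothetical. It starts 40 worker threads against 10 "connections" and records the highest number of workers that held a connection at the same time.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ConnectionLimitDemo {
    // Runs `threads` workers against a simulated pool of `connections` and returns
    // the highest number of workers that held a connection at the same time.
    static int maxConcurrentLookups(int threads, int connections) throws InterruptedException {
        Semaphore pool = new Semaphore(connections); // assumption: stand-in for the DB connection pool
        AtomicInteger active = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService workers = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            workers.execute(() -> {
                try {
                    pool.acquire(); // blocks while every connection is in use
                    try {
                        int now = active.incrementAndGet();
                        peak.accumulateAndGet(now, Math::max);
                        Thread.sleep(50); // simulated slow query
                    } finally {
                        active.decrementAndGet();
                        pool.release();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        workers.shutdown();
        workers.awaitTermination(10, TimeUnit.SECONDS);
        return peak.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // 10 pools x 4 threads = 40 threads, but only 10 connections.
        System.out.println("peak concurrent lookups: " + maxConcurrentLookups(40, 10));
    }
}
```

Whatever the thread count, the reported peak cannot exceed the number of connections; the remaining threads just wait in `acquire()`.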
Your intuition should now tell you that the problem is wanting a thread pool for each request, while the ideal solution is a single thread pool shared among the requests, with the number of threads equal to your connection pool size. This maximizes thread reuse.
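A minimal sketch of such a shared pool, assuming a connection pool of 10 as described in the question. The `lookup` method is a hypothetical placeholder for the real per-item database query, and the class name is made up for illustration. `ExecutorService.invokeAll` blocks the calling request thread until all of its tasks have completed and returns the futures in submission order, which is one simple way for the "parent thread" to learn that its work is done and collect the results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SharedSearchPool {
    // Assumption: equal to the maximum size of the DB connection pool.
    static final int DB_CONNECTIONS = 10;

    // One pool for the whole application, created once and reused by every request.
    static final ExecutorService POOL = Executors.newFixedThreadPool(DB_CONNECTIONS);

    // Hypothetical stand-in for the real per-item database lookup.
    static String lookup(String item) {
        return "result-for-" + item;
    }

    // Called from the request thread: submits one task per item and
    // blocks until every lookup for this request has finished.
    static List<String> search(List<String> items) throws Exception {
        List<Callable<String>> tasks = new ArrayList<>();
        for (String item : items) {
            tasks.add(() -> lookup(item));
        }
        List<String> results = new ArrayList<>();
        for (Future<String> future : POOL.invokeAll(tasks)) {
            results.add(future.get()); // already complete; rethrows a failed lookup
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(search(List.of("a", "b", "c")));
        // prints [result-for-a, result-for-b, result-for-c]
        POOL.shutdown();
    }
}
```

Note that a plain fixed pool serves its queue in FIFO order, so on its own this sketch does not stop one large request from delaying the others.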
If you also want to prevent a single request from taking all of your computing power, you might want to add a layer that decides who has the right to schedule extra work. With the thread-pool-per-request solution you were thinking about, you were letting the thread scheduler make that decision for you, which is not a good idea because you do not control its algorithm.
Instead, you can implement your own "fair algorithm", for example through a PriorityBlockingQueue where items with a lower number of chunks go to the top, or with a ConcurrentHashMap in which you store, for each user, the list of jobs still to be scheduled and the ones that have already returned, and so on.
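As one possible illustration of such a fairness layer, here is a sketch that swaps in a simpler technique than the two named above: plain round-robin over per-user queues, kept in a `LinkedHashMap` guarded by `synchronized` methods. The class name `FairTaskQueue` and its methods are hypothetical. The workers of the shared pool would call `poll()` to fetch their next task, so a user with thousands of pending lookups cannot starve a user with only a few.

```java
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Queue;

class FairTaskQueue<T> {
    // Pending tasks per user; the map's insertion order is the rotation order.
    private final LinkedHashMap<String, Queue<T>> pending = new LinkedHashMap<>();

    // Appends a task to the given user's own queue.
    synchronized void add(String user, T task) {
        pending.computeIfAbsent(user, u -> new ArrayDeque<>()).add(task);
    }

    // Takes one task from the user at the front of the rotation, then moves
    // that user to the back. Returns null when nothing is pending.
    synchronized T poll() {
        Iterator<Map.Entry<String, Queue<T>>> it = pending.entrySet().iterator();
        if (!it.hasNext()) {
            return null;
        }
        Map.Entry<String, Queue<T>> entry = it.next();
        String user = entry.getKey();
        Queue<T> queue = entry.getValue();
        it.remove();
        T task = queue.remove();
        if (!queue.isEmpty()) {
            pending.put(user, queue); // re-inserted at the end of the rotation
        }
        return task;
    }

    public static void main(String[] args) {
        FairTaskQueue<String> q = new FairTaskQueue<>();
        q.add("user1", "A1");
        q.add("user1", "A2");
        q.add("user1", "A3");
        q.add("user2", "B1");
        for (String t = q.poll(); t != null; t = q.poll()) {
            System.out.print(t + " ");
        }
        // prints: A1 B1 A2 A3
    }
}
```

In the demo, user2's single task is served right after user1's first one instead of waiting behind all three. A real dispatcher would also need blocking behaviour when the queue is empty, which is omitted here.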