Problem description
I'm trying to figure out how I can get the maximum performance from a multithreaded app.
I have a thread pool which I created like this:
ExecutorService executor = Executors.newFixedThreadPool(8); // I have 8 CPU cores.
My question is, should I divide the work into only 8 runnables/callables, which is the same number as the threads in the thread pool, or should I divide it into say 1000000 runnables/callables?
    List<Future<Long>> list = new ArrayList<>();
    for (int i = 0; i < 1000000; i++) {
        Callable<Long> worker = new MyCallable(); // Each worker does little work.
        Future<Long> submit = executor.submit(worker);
        list.add(submit); // Keep the future so its result can be collected below.
    }
    long sum = 0;
    for (Future<Long> future : list)
        sum += future.get(); // Much more overhead from the for loops
OR
    List<Future<Long>> list = new ArrayList<>();
    for (int i = 0; i < 8; i++) {
        Callable<Long> worker = new MyCallable(); // Each worker does much more work.
        Future<Long> submit = executor.submit(worker);
        list.add(submit); // Keep the future so its result can be collected below.
    }
    long sum = 0;
    for (Future<Long> future : list)
        sum += future.get(); // Negligible overhead from the for loops
Dividing into 1000000 callables seems slower to me, since there is the overhead of instantiating all these callables and collecting their results in for loops. On the other hand, if I have 8 callables this overhead is negligible. And since I have only 8 threads, I can't run 1000000 callables at the same time, so there is no performance gain from there.
Am I right or wrong?
BTW I could test these cases but the operation is very trivial and I guess the compiler realizes that and makes some optimizations. So the result might be misleading. I want to know which approach is better for something like an image processing app.
There are two aspects to this question.
First you have the technical Java stuff. As you already have a few answers about this, I'll summarize the basics:
- if you have N cores, then N threads will give you the best results, as long as each task is purely CPU-bound (i.e. no I/O involved)
- each thread should do more work than the overhead required to create and manage its task, i.e. having N threads each counting to 10 would be much slower, because the overhead of creating and managing the extra threads is higher than the benefit of counting to 10 in parallel
- you need to make sure that any synchronization overhead is lower than the work being done, i.e. having N threads calling a synchronized increment method would be much slower
- threads do take up resources, most commonly memory. The more threads you have, the more difficult it becomes to estimate your memory usage, and they might affect GC timing (rare, but I've seen it happen)
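As a rough illustration of the points above, here is a minimal sketch that splits a CPU-bound sum into one chunk per core, with each worker accumulating into a local variable so that no synchronization is needed. The class name `ChunkedSum` and the methods `sumRange` and `parallelSum` are made up for this example and are not from the question; a real image processing task would replace the summing loop with its own per-chunk work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example class, not part of the original question.
class ChunkedSum {

    // Sums data[from, to) into a local variable, so workers share no mutable state.
    static long sumRange(long[] data, int from, int to) {
        long local = 0;
        for (int i = from; i < to; i++) {
            local += data[i];
        }
        return local;
    }

    // Splits the array into at most `chunks` contiguous ranges and sums them on a fixed pool.
    static long parallelSum(long[] data, int threads, int chunks) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            int chunkSize = (data.length + chunks - 1) / chunks; // ceiling division
            for (int start = 0; start < data.length; start += chunkSize) {
                final int from = start;
                final int to = Math.min(start + chunkSize, data.length);
                Callable<Long> worker = () -> sumRange(data, from, to);
                futures.add(executor.submit(worker));
            }
            long sum = 0;
            for (Future<Long> future : futures) {
                sum += future.get(); // only one get() per chunk
            }
            return sum;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        int cores = Runtime.getRuntime().availableProcessors();
        // One thread and one chunk per core, assuming the work is purely CPU-bound.
        System.out.println(parallelSum(data, cores, cores));
    }
}
```

Passing a larger `chunks` value than `threads` is a middle ground between the two extremes in the question: it can even out the load when chunks take unequal time, at the cost of a little more per-task overhead.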
Secondly, you have the scheduling theory. You need to consider what your program is doing:
- Typically, use threads for blocking I/O operations. You don't want your program to wait for the network or HDD if you could be using your CPU for other tasks
- There are a few good books on scheduling (I can't remember the names) that can help you design efficient programs. In the example you mention, there might be cases where extra threads make sense, e.g. if your task durations are non-deterministic and skewed, and your average response time is important. Assume you have 2 cores and 4 tasks. Tasks A & B will take 1 minute each, but C & D will take 10 minutes each. If you run these against 2 threads with C & D executing first, your total time will be 11 minutes but your average response time will be (10+10+11+11)/4=10.5 minutes. If you execute against 4 threads, your average response time will be ((1+a)+(1+a)+(10+a)+(10+a))/4=5.5+a, where a approximates the scheduling waiting time. This is very theoretical because there are many variables not accounted for, but it can help in designing threaded programs. (Also, in the example above, since you are waiting on the Futures, you most likely don't care about average response times.)
- Care must be taken when using multiple thread pools. Using multiple pools can cause deadlocks (if dependencies are introduced between the pools) and make it hard to optimize (contention can be created among the pools, and getting the sizes right might become impossible)
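The response-time arithmetic in the scheduling example can be checked with a small sketch. It only reproduces the simplified model used above, not real OS scheduling: `fifoAverage` assumes each thread runs its task to completion in submission order, and `concurrentAverage` takes the waiting time `a` as a plain parameter. The class and method names are invented for illustration.

```java
import java.util.PriorityQueue;

// Hypothetical example class modelling the simplified arithmetic above.
class ResponseTimes {

    // Average completion time when tasks are handed, in order, to whichever
    // of `workers` threads becomes free first (each task runs to completion).
    static double fifoAverage(double[] durations, int workers) {
        PriorityQueue<Double> freeAt = new PriorityQueue<>();
        for (int i = 0; i < workers; i++) {
            freeAt.add(0.0); // every worker is free at time 0
        }
        double total = 0;
        for (double d : durations) {
            double finish = freeAt.poll() + d; // earliest free worker takes the task
            total += finish;
            freeAt.add(finish);
        }
        return total / durations.length;
    }

    // Average completion time when every task starts at once and finishes
    // after its own duration plus an assumed scheduling wait `a`.
    static double concurrentAverage(double[] durations, double a) {
        double total = 0;
        for (double d : durations) {
            total += d + a;
        }
        return total / durations.length;
    }

    public static void main(String[] args) {
        double[] cdFirst = {10, 10, 1, 1}; // C, D, A, B in minutes
        System.out.println(fifoAverage(cdFirst, 2));         // 10.5
        System.out.println(concurrentAverage(cdFirst, 0.0)); // 5.5, i.e. 5.5 + a with a = 0
    }
}
```

In practice `a` is not zero when 4 threads share 2 cores, so the second figure is a lower bound on what you would measure.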
--EDIT--
Finally, if it helps, the way I think about performance is that I have 4 primary resources: CPU, RAM, disk & network. I try to find which one is my bottleneck and use non-saturated resources to optimize. For example, if I have lots of idle CPU and little memory, I might compress my in-memory data. If I have lots of disk I/O and plenty of memory, I cache more data. If network resources (not the actual network connection) are slow, I use many threads to parallelize. Once you saturate a resource type on your critical path and can't use other resources to speed it up, you've reached your maximum performance and you need to upgrade your hardware to get faster results.
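To make the "slow network resources, many threads" case concrete, here is a hedged sketch in which `Thread.sleep` stands in for a blocking network or disk call. The class `BlockingPoolDemo` and its method `runBlockingTasks` are hypothetical names, and the timings are only indicative. Because the tasks spend their time waiting rather than computing, a pool larger than the number of cores can shorten the elapsed time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical example class; Thread.sleep simulates a blocking I/O call.
class BlockingPoolDemo {

    // Runs `tasks` blocking tasks on a pool of `threads` threads and
    // returns the elapsed wall-clock time in milliseconds.
    static long runBlockingTasks(int tasks, int threads, long blockMillis) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Void>> work = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                work.add(() -> {
                    Thread.sleep(blockMillis); // stand-in for waiting on network or disk
                    return null;
                });
            }
            long start = System.nanoTime();
            executor.invokeAll(work); // blocks until every task has completed
            return (System.nanoTime() - start) / 1_000_000;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        // Same 16 simulated requests, first on 2 threads, then on 16.
        System.out.println("2 threads:  " + runBlockingTasks(16, 2, 100) + " ms");
        System.out.println("16 threads: " + runBlockingTasks(16, 16, 100) + " ms");
    }
}
```

The same experiment with CPU-bound tasks would not show this gain, which is why it helps to identify the saturated resource before choosing a pool size.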