utorService上的Fibonacci顺序运行比并行运行更

utorService上的Fibonacci顺序运行比并行运行更

本文介绍了Java ExecutorService上的Fibonacci顺序运行比并行运行更快的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试使用Java执行服务,并编写了以下代码来运行Fibonacci(是的,是大型递归版本,只是为了强调执行服务).

I am trying out the executor service in Java, and wrote the following code to run Fibonacci (yes, the massively recursive version, just to stress out the executor service).

令人惊讶的是,如果将nThreads设置为1,它将运行得更快.这可能与以下事实有关:提交给执行程序服务的每个任务"的大小确实很小.但是如果我将nThreads设置为1,它仍然必须是相同的数字.

Surprisingly, it will run faster if I set the nThreads to 1. It might be related to the fact that the size of each "task" submitted to the executor service is really small. But still it must be the same number also if I set nThreads to 1.

要查看对共享Atomic变量的访问是否可能导致此问题,我用注释"see text"注释了三行,并查看了系统监视器以查看执行花费了多长时间.但是结果是一样的.

To see if the access to the shared Atomic variables can cause this issue, I commented out the three lines with the comment "see text", and looked at the system monitor to see how long the execution takes. But the results are the same.

知道为什么会这样吗?

顺便说一句,我想将其与Fork/Join的类似实现进行比较.事实证明,它比F/J实施要慢得多.

BTW, I wanted to compare it with the similar implementation with Fork/Join. It turns out to be way slower than the F/J implementation.

public class MainSimpler {
    static int N=35;
    static AtomicInteger result = new AtomicInteger(0), pendingTasks = new AtomicInteger(1);
    static ExecutorService executor;

    public static void main(String[] args) {
        int nThreads=2;
        System.out.println("Number of threads = "+nThreads);
        executor = Executors.newFixedThreadPool(nThreads);
        Executable.inQueue = new AtomicInteger(nThreads);
        long before = System.currentTimeMillis();
        System.out.println("Fibonacci "+N+" is ... ");
        executor.submit(new FibSimpler(N));
        waitToFinish();
        System.out.println(result.get());
        long after = System.currentTimeMillis();
        System.out.println("Duration: " + (after - before) + " milliseconds\n");
    }

    private static void waitToFinish() {
        while (0 < pendingTasks.get()){
            try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
        executor.shutdown();
    }
}



class FibSimpler implements Runnable {
    int N;
    FibSimpler (int n) { N=n; }

    @Override
    public void run() {
        compute();
        MainSimpler.pendingTasks.decrementAndGet(); // see text
    }

    void compute() {
        int n = N;
        if (n <= 1) {
            MainSimpler.result.addAndGet(n); // see text
            return;
        }
        MainSimpler.executor.submit(new FibSimpler(n-1));
        MainSimpler.pendingTasks.incrementAndGet(); // see text
        N = n-2;
        compute();  // similar to the F/J counterpart
    }
}

运行时(大约):

  • 1个线程:11秒
  • 2个线程:19秒
  • 4个线程:19秒

更新:我注意到,即使我在执行程序服务中使用一个线程,整个程序也将使用计算机的所有四个内核(每个内核平均使用率约为80%).这可以解释为什么在executor服务中使用更多线程会减慢整个过程,但是现在,如果在executor服务中只有一个线程处于活动状态,为什么该程序使用4个内核?

Update:I notice that even if I use one thread inside the executor service, the whole program will use all four cores of my machine (each core around 80% usage on average). This could explain why using more threads inside the executor service slows down the whole process, but now, why does this program use 4 cores if only one thread is active inside the executor service??

推荐答案

当然是这种情况,因此,您主要是在测量上下文切换的开销.当n == 1时,没有上下文切换,因此性能更好.

This is certainly the case and as a result you are mainly measuring the overhead of context switching. When n == 1, there is no context switching and thus the performance is better.

我猜你的意思是高于1".

I'm guessing you meant 'to higher than 1' here.

您遇到了锁争用过多的问题.当您有多个线程时,result上的锁定始终处于争用状态.线程必须等待彼此才能更新result,这会减慢它们的速度.当只有一个线程时,JVM可能会检测到并执行锁定清除,这意味着它实际上根本不执行任何锁定.

You are running into the problem of heavy lock contention. When you have multiple threads, the lock on the result is contended all the time. Threads have to wait for each other before they can update the result and that slows them down. When there is only a single thread, the JVM probably detects that and performs lock elision, meaning it doesn't actually perform any locking at all.

如果不将问题划分为N任务,而是将其划分为N/nThreads任务,则可能会获得更好的性能,这些任务可以由线程同时处理(假设您选择nThreads在最多可用的物理核心/线程数).然后,每个线程都会执行自己的工作,计算自己的总数,并且仅在完成线程后将其添加到总计中.即使这样,对于fib(35),我仍希望线程管理的成本超过收益.也许尝试fib(1000).

You may get better performance if you don't divide the problem into N tasks, but rather divide it into N/nThreads tasks, which can be handled simultaneously by the threads (assuming you choose nThreads to be at most the number of physical cores/threads available). Each thread then does its own work, calculating its own total and only adding that to a grand total when the thread is done. Even then, for fib(35) I expect the costs of thread management to outweigh the benefits. Perhaps try fib(1000).

这篇关于Java ExecutorService上的Fibonacci顺序运行比并行运行更快的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

08-28 06:50