This article covers a prefix-search algorithm that uses multiple cores; it should serve as a useful reference for anyone solving a similar problem.

Problem Description

I got a task to filter a list (vector) of words by a prefix. The algorithm is supposed to use modern multi-core processors.

The solution I have is to use many threads to handle the list.

//      PrintWriter writer = new PrintWriter("C:\\DemoList.txt", "UTF-8");
//
//      for(char i = 'A'; i<= 'Z'; i++) {
//          for(char j = 'A'; j<= 'Z'; j++) {
//              for(char n = 'A'; n<= 'Z'; n++) {
//                  for(char m = 'A'; m<= 'Z'; m++) {
//                      writer.println("" + i + j + n + m );
//                  }
//
//              }
//          }
//      }
    List<String> allLines = Files.readAllLines(Paths.get("C:\\", "DemoList.txt"));
    Collections.shuffle(allLines);
    String pattern = "AA";

    List<String> result = new ArrayList<>();
    int cores = Runtime.getRuntime().availableProcessors();
    int threadsNum = allLines.size() / cores;

    long start_time = System.nanoTime();

    for (String word : allLines) {
        if (word.startsWith(pattern))
            result.add(word);

    }

    long end_time = System.nanoTime();
    double difference = (end_time - start_time) / 1e6;
    System.out.println("Time difference in Milliseconds with Brute-Force: " + difference);

// With parallelism:
    long new_start_time = System.nanoTime();

    List<String> filteredList = allLines.parallelStream().filter(s -> s.startsWith(pattern))
            .collect(Collectors.toList());

    long new_end_time = System.nanoTime();

    double new_difference = (new_end_time - new_start_time) / 1e6;
    System.out.println("Time difference in Milliseconds with Stream from Java 8: " + new_difference);

Result:
Time difference in Milliseconds with Brute-Force: 33.033602
Time difference in Milliseconds with Stream from Java 8: 65.017069

Each thread should filter a range of the list.

Do you have a better idea? Do you think I should sort the original list and then do a binary search on it? Should I also use multi-threading for the sort, or should I use Collections.sort? How would you implement that?

Recommended Answer

From your code sample, your method of time measurement borders on micro-benchmarking, for which simply measuring a single execution is misleading.

This is discussed at length in the StackOverflow post: How do I write a correct micro-benchmark in Java?
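To illustrate the point without pulling in a full harness, here is a minimal sketch of slightly-less-naive timing: warm up first so the JIT compiles the hot path, then repeat the measurement and take the best of several runs. This is still no substitute for a real harness such as JMH; the iteration counts here are arbitrary choices for illustration.

```java
import java.util.*;
import java.util.stream.*;

public class NaiveBench {
    public static void main(String[] args) {
        List<String> words = Stream.generate(() -> UUID.randomUUID().toString())
                .limit(100_000).collect(Collectors.toList());
        // Warm-up iterations let the JIT compile the hot path before we time it.
        for (int i = 0; i < 20; i++) run(words);
        // Measure repeatedly and keep the best time to reduce scheduling noise.
        long best = Long.MAX_VALUE;
        for (int i = 0; i < 20; i++) {
            long t0 = System.nanoTime();
            run(words);
            best = Math.min(best, System.nanoTime() - t0);
        }
        System.out.println("best run (ms): " + best / 1e6);
    }

    // The workload under test: count words with a given prefix.
    static int run(List<String> words) {
        int n = 0;
        for (String w : words) if (w.startsWith("aa")) n++;
        return n;
    }
}
```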

I've written a more accurate benchmark to obtain a more precise measurement of your sample code. The code ran on a quad-core i7 with multithreading (but it's a laptop; due to power and heat management, results may be slightly biased against multithreaded code, which produces more heat).

@Benchmark
public void testSequentialFor(Blackhole bh, Words words) {
    List<String> filtered = new ArrayList<>();
    for (String word : words.toSort) {
        if (word.startsWith(words.prefix)) {
            filtered.add(word);
        }
    }
    bh.consume(filtered);
}

@Benchmark
public void testParallelStream(Blackhole bh, Words words) {
    bh.consume(words.toSort.parallelStream()
            .filter(w -> w.startsWith(words.prefix))
            .collect(Collectors.toList())
    );
}

@Benchmark
public void testManualThreading(Blackhole bh, Words words) {
    // This is quick and dirty, but gives a decent baseline as to
    // what a manually threaded partitioning can achieve.
    List<Future<List<String>>> async = new ArrayList<>();
    // this has to be optimized to avoid creating "almost empty" work units
    int batchSize = words.size / ForkJoinPool.commonPool().getParallelism();
    int numberOfDispatchedWords = 0;
    while (numberOfDispatchedWords < words.toSort.size()) {
        int start = numberOfDispatchedWords;
        int end = numberOfDispatchedWords + batchSize;
        async.add(words.threadPool.submit(() -> {
            List<String> batch = words.toSort.subList(start, Math.min(end, words.toSort.size()));
            List<String> result = new ArrayList<>();
            for (String word : batch) {
                if (word.startsWith(words.prefix)) {
                    result.add(word);
                }
            }
            return result;
        }));
        numberOfDispatchedWords += batchSize;
    }
    List<String> result = new ArrayList<>();
    for (Future<List<String>> asyncResult : async) {
        try {
            result.addAll(asyncResult.get());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
    bh.consume(result);
}

@State(Scope.Benchmark)
public static class Words {

    ExecutorService threadPool = ForkJoinPool.commonPool();

    List<String> toSort;

    @Param({"100", "1000", "10000", "100000"})
    private int size;

    private String prefix = "AA";

    @Setup
    public void prepare() {
        // a 4 to 13 letters long, random word
        // for more precision, it should not be that random (use a fixed seed), but given the simple nature of the filtering, I guess it's ok this way
        Supplier<String> wordCreator = () -> RandomStringUtils.random(4 + ThreadLocalRandom.current().nextInt(10));
        toSort = Stream.generate(wordCreator).limit(size).collect(Collectors.toList());
    }
}

Here are the results:


Benchmark                     (size)   Mode  Cnt        Score       Error  Units
PerfTest.testManualThreading     100  thrpt    6    95971,811 ±  1100,589  ops/s
PerfTest.testManualThreading    1000  thrpt    6    76293,983 ±  1632,959  ops/s
PerfTest.testManualThreading   10000  thrpt    6    34630,814 ±  2660,058  ops/s
PerfTest.testManualThreading  100000  thrpt    6     5956,552 ±   529,368  ops/s
PerfTest.testParallelStream      100  thrpt    6    69965,462 ±   451,418  ops/s
PerfTest.testParallelStream     1000  thrpt    6    59968,271 ±   774,859  ops/s
PerfTest.testParallelStream    10000  thrpt    6    29079,957 ±   513,244  ops/s
PerfTest.testParallelStream   100000  thrpt    6     4217,146 ±   172,781  ops/s
PerfTest.testSequentialFor       100  thrpt    6  3553930,640 ± 21142,150  ops/s
PerfTest.testSequentialFor      1000  thrpt    6   356217,937 ±  7446,137  ops/s
PerfTest.testSequentialFor     10000  thrpt    6    28894,748 ±   674,929  ops/s
PerfTest.testSequentialFor    100000  thrpt    6     1725,735 ±    13,273  ops/s

So the sequential version looks much faster up to a few thousand elements; manual threading catches up a bit before 10k elements, parallel streams a bit after, and from there on the threaded code performs best.
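Given that crossover, one pragmatic option is to pick the strategy by input size. The sketch below does exactly that; the 10,000-element threshold is an assumption taken from this particular machine's results and should be tuned for your hardware.

```java
import java.util.*;
import java.util.stream.*;

public class AdaptiveFilter {
    // Assumed crossover point, taken from the benchmark above; machine-dependent.
    static final int PARALLEL_THRESHOLD = 10_000;

    static List<String> filterByPrefix(List<String> words, String prefix) {
        if (words.size() < PARALLEL_THRESHOLD) {
            // Small inputs: a plain loop avoids parallelization overhead.
            List<String> result = new ArrayList<>();
            for (String w : words) {
                if (w.startsWith(prefix)) result.add(w);
            }
            return result;
        }
        // Large inputs: let the common ForkJoinPool split the work.
        return words.parallelStream()
                .filter(w -> w.startsWith(prefix))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = List.of("AAB", "ABC", "AAC", "BAA");
        System.out.println(filterByPrefix(words, "AA")); // [AAB, AAC]
    }
}
```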

Considering the amount of code needed to write the "manual threading" variant, and the risk of introducing a bug or an inefficiency by badly calculating the batch size, I'd probably not choose that option even if it looks like it can be faster than streams for huge lists.

I would not try to sort first and then binary search, as filtering is an O(N) operation while sorting is O(N log N) (on top of which you have to add the binary search). So unless you have a very precise pattern in the data, I do not see it working to your advantage.

Be aware, though, not to draw conclusions this benchmark cannot support. For one thing, it is based on the assumption that the filtering is the only thing happening in the program and competing for CPU time. If you are in any kind of "multi-user" application (e.g. a web application), that is probably not true, and you may very well lose everything you thought you would gain by multithreading.

