Reading a file in Java with multiple threads

Question

I am creating threads to read a file in Java. When I create two threads, each thread reads the whole file, whereas I want them to read different parts of it. I tried adding sleep(), join(), and yield(), but that only slows the read down.

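// Imports used by readFile(), which is shown further down.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class MyClass implements Runnable {

    Thread thread;

    public MyClass(int numOfThreads) {
        // Every thread runs this same Runnable, so each one calls readFile() from the start.
        for (int i = 0; i < numOfThreads; i++) {
            thread = new Thread(this);
            thread.start();
        }
    }

    @Override
    public void run() {
        readFile();
    }
}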

In readFile, inside the while loop that reads line by line, I call sleep()/yield(). How can I make the threads read different parts of the file?

Updated with the method used to read the file:

public synchronized void readFile() {
    // try-with-resources closes the reader even if reading fails
    try (BufferedReader buf = new BufferedReader(new FileReader("read.txt"))) {
        String line;
        while ((line = buf.readLine()) != null) {
            String[] info = line.split(" ");
            String firstName = info[0];
            String secondName = info[1];
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                // restore the interrupt flag and stop reading
                Thread.currentThread().interrupt();
                return;
            }
        }
    } catch (IOException e) {
        System.out.println("Error : could not read file");
        e.printStackTrace();
    }
}


Recommended answer

I suppose you're thinking that reading a file with multiple threads like this will be faster than reading it with one. This is almost certainly false. Threads improve performance on CPU-bound tasks by using multiple cores or processors, but file reading is not a CPU-bound task.

The OS uses the disk controller to read bytes at the full bandwidth of the disk interface. For nearly any hardware combination, the speed is bounded by the disk (read and/or seek times), its controller, and its DMA interface or bus, not by the CPU. It's easy for a CPU to keep the disk controller 100% busy, even several controllers for different disks. If you need proof of this, start a big file copy and watch CPU utilization. It won't be very high.

Therefore, of your multiple threads, only one will make progress at a time, so the extra threads just add overhead to what is effectively a single-threaded computation.

What does slow file transfers down is buffering. To gain flexibility, I/O libraries can end up buffering each character two or even three times.

The Java NIO library is meant to do away with as much of this overhead as possible. See, for example, this article. There are many similar ones. My experience is that a carefully written NIO reader will use most of the available performance of the hardware.
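As a rough illustration of the kind of NIO reader meant here, the following is a minimal single-threaded sketch, not a tuned implementation. The class name NioReadSketch, the method countLines, and the 64 KiB direct buffer size are assumptions made for this example; only FileChannel and ByteBuffer come from the standard library.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical example class: reads a file sequentially through a FileChannel.
class NioReadSketch {

    // Returns the number of '\n' bytes in the file, reading it once on the calling thread.
    static long countLines(Path path) throws IOException {
        long lines = 0;
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            // Direct buffer; the 64 KiB size is an arbitrary assumption, not a measured optimum.
            ByteBuffer buffer = ByteBuffer.allocateDirect(1 << 16);
            while (channel.read(buffer) != -1) {
                buffer.flip();                 // switch from filling to draining
                while (buffer.hasRemaining()) {
                    if (buffer.get() == '\n') {
                        lines++;
                    }
                }
                buffer.clear();                // make the whole buffer writable again
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.out.println("usage: java NioReadSketch <file>");
            return;
        }
        System.out.println(countLines(Paths.get(args[0])));
    }
}
```

Counting newline bytes stands in for whatever per-line work the real program does. The point of the sketch is that the file is read in large blocks straight from the channel, without an extra layer of character buffering in between.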

There is one caveat: if you have a heavy-duty virus checker set to scan the kind of file you are reading, it might make reading CPU-bound. In this unusual case, you might get a boost from multi-threading, depending on the checker's architecture. You would then find the total file size S and let thread k = 0, 1, ..., n-1 read from offset kS/n to (k+1)S/n - 1, by seeking to the right offset and tracking the number of bytes read in each thread. However, I still strongly suspect that the additional head seek time and other effects of random access will cancel out any advantage of running the virus checker in multiple threads.
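The offset scheme described above could be sketched roughly as follows, assuming the file does not change while it is being read. The class PartitionedReadSketch and the methods sumRange and sumFile are hypothetical names for this example, and summing byte values is only a placeholder for real per-chunk work. Each task uses the positional FileChannel.read(ByteBuffer, long), which does not move the channel's shared position, so no seeking or locking is needed between threads.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example class: thread k reads bytes kS/n .. (k+1)S/n - 1.
class PartitionedReadSketch {

    // Sums the unsigned byte values in [start, end) using positional reads.
    static long sumRange(FileChannel channel, long start, long end) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(1 << 16); // size is an arbitrary assumption
        long sum = 0;
        long pos = start;
        while (pos < end) {
            buffer.clear();
            // Never read past this thread's own range.
            buffer.limit((int) Math.min(buffer.capacity(), end - pos));
            int read = channel.read(buffer, pos);
            if (read <= 0) {
                break; // end of file reached early, e.g. the file shrank
            }
            buffer.flip();
            while (buffer.hasRemaining()) {
                sum += buffer.get() & 0xFF;
            }
            pos += read;
        }
        return sum;
    }

    // Splits the file into n byte ranges and processes each range on its own thread.
    static long sumFile(Path path, int n)
            throws IOException, InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(n);
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            long size = channel.size();
            List<Future<Long>> parts = new ArrayList<>();
            for (int k = 0; k < n; k++) {
                long start = k * size / n;       // inclusive
                long end = (k + 1) * size / n;   // exclusive, i.e. (k+1)S/n - 1 inclusive
                parts.add(pool.submit(() -> sumRange(channel, start, end)));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get();             // wait for every range before closing
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("usage: java PartitionedReadSketch <file> <threads>");
            return;
        }
        System.out.println(sumFile(Paths.get(args[0]), Integer.parseInt(args[1])));
    }
}
```

Note that the ranges are cut at byte offsets, so a line of text can straddle two ranges. Line-oriented processing such as the split(" ") in the question would need each thread to adjust its start and end to the nearest line break, which this sketch does not attempt.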

