I just stumbled onto this SO question and was wondering if there would be any performance improvement if:
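- the files are compared in chunks no larger than the hard disk's sector size (1/2 KB, 2 KB, or 4 KB)
- and the comparison is done with multiple threads (or perhaps even run in parallel with .NET 4)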
I imagine there being 2 threads: one that reads from the beginning of the file and another that reads from the end until they meet in the middle.
I understand that in this situation disk I/O is going to be the slowest part, but if the reads never have to cross sector boundaries (which in my twisted imagination somehow eliminates any possible fragmentation overhead), then it may reduce head movement and so result in better performance (maybe?).
Of course, other factors could come into play as well, such as single vs. multiple processors/cores or SSD vs. non-SSD, but those aside: is disk I/O speed, plus potentially shared processor time, insurmountable? Or perhaps my concept of computer theory is completely off base...
If you're comparing two files that are on the same drive, the only benefit you could receive from multi-threading is to have one thread reading--populating the next buffers--while another thread is comparing the previously-read buffers.
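A minimal sketch of that reader/comparer split, written in Python for brevity (the same structure should carry over to .NET threads). The name `files_equal`, the 256K default, and the two-slot queue are assumptions made for illustration, not a tuned implementation:

```python
import queue
import threading

CHUNK_SIZE = 256 * 1024  # assumed buffer size; measure on your own machine


def files_equal(path_a, path_b, chunk_size=CHUNK_SIZE):
    """Compare two files while a background thread reads the next buffers."""
    chunks = queue.Queue(maxsize=2)   # small bound keeps memory use flat
    stop = threading.Event()          # lets the comparer cancel the reader

    def put(item):
        # Retry until there is room, giving up if the comparer has stopped.
        while not stop.is_set():
            try:
                chunks.put(item, timeout=0.1)
                return True
            except queue.Full:
                pass
        return False

    def reader():
        try:
            with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
                while True:
                    a, b = fa.read(chunk_size), fb.read(chunk_size)
                    if not put((a, b)) or (not a and not b):
                        return
        except OSError as exc:
            put(exc)  # hand I/O errors to the comparing thread

    worker = threading.Thread(target=reader, daemon=True)
    worker.start()
    try:
        while True:
            item = chunks.get()
            if isinstance(item, Exception):
                raise item
            a, b = item
            if a != b:
                return False      # first mismatch (or a length difference)
            if not a:
                return True       # both files ended together
    finally:
        stop.set()
        worker.join()
```

Both files are still read strictly front to back; the only overlap is between reading the next pair of buffers and comparing the previous pair.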
If the files you're comparing are on different physical drives, then you can have two asynchronous reads going concurrently--one on each drive.
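For the two-drive case, one way to keep a read outstanding on each drive is to issue both reads before waiting on either. This is a rough Python sketch; `files_equal_two_drives` and its default chunk size are hypothetical names and values, and whether the reads truly overlap depends on the hardware and OS:

```python
from concurrent.futures import ThreadPoolExecutor


def files_equal_two_drives(path_a, path_b, chunk_size=256 * 1024):
    """Compare two files, reading one chunk from each concurrently."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb, \
            ThreadPoolExecutor(max_workers=2) as pool:
        while True:
            # Submit both reads first so each drive can work at the same time.
            fut_a = pool.submit(fa.read, chunk_size)
            fut_b = pool.submit(fb.read, chunk_size)
            a, b = fut_a.result(), fut_b.result()
            if a != b:
                return False   # mismatch or different lengths
            if not a:
                return True    # both files reached the end together
```

If both paths turn out to live on the same physical drive, this should still give the right answer, just without the concurrency benefit.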
But your idea of having one thread reading from the beginning and another reading from the end will make things slower because seek time is going to kill you. The disk drive heads will continually be seeking from one end of the file to the other. Think of it this way: do you think it would be faster to read a file sequentially from the start, or would it be faster to read 64K from the front, then read 64K from the end, then seek back to the start of the file to read the next 64K, etc?
Fragmentation is an issue, to be sure, but excessive fragmentation is the exception, not the rule. Most files are going to be unfragmented, or only partially fragmented. Reading alternately from either end of the file would be like reading a file that's pathologically fragmented.
Remember, a typical disk drive can only satisfy one I/O request at a time.
Making single-sector reads will probably slow things down. In my tests of .NET I/O speed, reading 32K at a time was significantly faster (between 10 and 20 percent) than reading 4K at a time. As I recall (it's been some time since I did this), on my machine at the time, the optimum buffer size for sequential reads was 256K. That will undoubtedly differ for each machine, based on processor speed, disk controller, hard drive, and operating system version.
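If you want to find the best buffer size on your own machine, a rough timing harness along these lines may help. It is a Python sketch rather than the .NET test described above; the function names and the candidate sizes (4K, 32K, 256K) are assumptions chosen to mirror the figures mentioned:

```python
import time


def time_sequential_read(path, buffer_size):
    """Return (seconds, bytes_read) for one sequential pass over the file."""
    total = 0
    start = time.perf_counter()
    # buffering=0 bypasses Python's own buffer, so every read() asks the OS
    # for at most buffer_size bytes.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(buffer_size)
            if not chunk:
                break
            total += len(chunk)
    return time.perf_counter() - start, total


def compare_buffer_sizes(path, sizes=(4 * 1024, 32 * 1024, 256 * 1024)):
    """Time one pass per buffer size and return {size: seconds}."""
    results = {}
    for size in sizes:
        seconds, _ = time_sequential_read(path, size)
        results[size] = seconds
    return results
```

Keep in mind that after the first pass the operating system will likely serve the file from its cache, so use a file much larger than RAM, or a different file per size, to get numbers that reflect the disk.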