


I'm writing a directory tree scanning function in D that tries to combine tools such as grep and file: it should grep for things in a file only if the file doesn't match a set of magic bytes indicating filetypes such as ELF, images, etc.


What is the best approach to making such exclusion logic run as fast as possible with regard to minimizing file I/O? I typically don't want to read in the whole file if I only need to read some magic bytes at the beginning. However, to keep the code general for the future (some magics may lie at the end, or somewhere other than the beginning), it would be nice if I could use an mmap-like interface to lazily fetch data from the disk only when it is read. The array interface also simplifies my algorithms.


Is D's std.mmfile the best option in this case?


Update: According to this post, I guess mmap is advised: http://forum.dlang.org/thread/[email protected]


If I only need read access as an array (opIndex), are there any cons to using std.mmfile over std.stdio.File or std.file?



If you want to lazily read a file with Phobos, you pretty much have three options:

  1. Use std.stdio.File's byLine and read a line at a time.

  2. Use std.stdio.File's byChunk and read a particular number of bytes at a time.

  3. Use std.mmfile.MmFile and operate on the file as an array, taking advantage of mmap under the hood to avoid reading in the whole file.
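As a rough sketch of the byChunk option for the magic-bytes case, the following reads only the first chunk of a file and compares it to the ELF magic. The helper name startsWithElfMagic is made up for illustration and is not part of Phobos; the ELF magic (0x7F followed by "ELF") is the only file-format fact assumed.

```d
import std.stdio : File;

/// Hypothetical helper (not part of Phobos): returns true if the file at
/// `path` starts with the 4-byte ELF magic, 0x7F 'E' 'L' 'F'.
bool startsWithElfMagic(string path)
{
    static immutable ubyte[4] elfMagic = [0x7F, 0x45, 0x4C, 0x46];

    // byChunk(4) yields the file in pieces of at most 4 bytes;
    // only the first piece is inspected here.
    auto chunks = File(path, "rb").byChunk(elfMagic.length);
    if (chunks.empty)
        return false; // empty file, nothing to compare

    // A file shorter than 4 bytes yields a shorter chunk and compares unequal.
    return chunks.front == elfMagic[];
}
```

Note that File sits on top of C stdio, so the underlying read from disk is likely a full stdio buffer rather than exactly four bytes; the sketch limits what your code touches, not necessarily what the OS reads.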


I fully expect that #3 is going to be the fastest (profiling could prove otherwise, but I'd be very surprised given how fantastic mmap is). It's also probably the easiest to use, because you get an array to operate on. The only problem with MmFile that I'm aware of is that it's a class when it should arguably be a ref-counted struct, so that it would clean itself up when you were done. Right now, if you don't want to wait for the GC to clean it up, you'd have to manually call unmap on it or use destroy to destroy it without freeing its memory (though destroy should be used with caution). There may be some sort of downside to using mmap (which would then naturally mean that there was a downside to using MmFile), but I'm not aware of any.
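A minimal sketch of the MmFile approach, assuming a read-only mapping of the whole file is acceptable. The helper hasMagicAt and its parameters are hypothetical names for illustration; it uses destroy for deterministic cleanup, as discussed above, and checks the file size first because mapping an empty file typically fails.

```d
import std.file : getSize;
import std.mmfile : MmFile;

/// Hypothetical helper (not part of Phobos): returns true if the bytes in
/// `magic` occur at byte `offset` of the file at `path`.
bool hasMagicAt(string path, size_t offset, const(ubyte)[] magic)
{
    // Guard against slicing past the end, and against empty files,
    // which generally cannot be memory-mapped.
    if (getSize(path) < offset + magic.length)
        return false;

    auto mmf = new MmFile(path);  // read-only mapping by default
    scope (exit) destroy(mmf);    // unmap now instead of waiting for the GC

    // Slicing returns void[]; pages are faulted in only when touched.
    auto data = cast(const(ubyte)[]) mmf[offset .. offset + magic.length];
    return data == magic;
}
```

Because the offset is a parameter, the same function covers magics that sit somewhere other than the start of the file. The slice must not be used after destroy runs, which is why the comparison happens inside the function.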


In the future, we're going to end up with some range-based streaming I/O stuff, which might be closer to what you need without actually using mmap, but that hasn't been completed yet, and mmap is so incredibly cool that there's a good chance that it would still be better to use MmFile.

