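Thinking in Java (《Java编程思想》) covers memory-mapped files in detail; this post only records and summarizes the essentials. Memory-mapped files let you create and modify files that are too large to bring entirely into memory.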

1. A simple memory-mapped file example

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

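public class LargeMappedFiles {

    // 0x0000FFF = 4095 bytes; Thinking in Java uses a much larger value (0x8FFFFFF)
    private static final int LENGTH = 0x0000FFF;

    public static void main(String[] args) throws IOException {
        // try-with-resources closes the file; the mapping itself stays valid afterwards
        try (RandomAccessFile file = new RandomAccessFile("test.dat", "rw")) {
            MappedByteBuffer out = file.getChannel()
                    .map(FileChannel.MapMode.READ_WRITE, 0, LENGTH);
            // Relative put(): fill the whole mapped region with 'x'
            for (int i = 0; i < LENGTH; i++) {
                out.put((byte) 'x');
            }
            // Absolute get(int): read 6 bytes starting at the middle of the region
            for (int i = LENGTH / 2; i < LENGTH / 2 + 6; i++) {
                System.out.print((char) out.get(i));
            }
        }
    }
}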

Output:

xxxxxx
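  • A FileChannel is obtained from a RandomAccessFile opened in "rw" mode, so the channel can both read and write.
  • FileChannel.map() returns a MappedByteBuffer. It takes three parameters: the MapMode, the start position in the file, and the number of bytes to map, which means you can map a small region of a large file.
  • MappedByteBuffer is a special kind of direct buffer: modifications made to it are reflected in the underlying file. It also extends ByteBuffer, so it has all of ByteBuffer's methods.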

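This example first creates a MappedByteBuffer in read/write mode, then fills it with the character x, and finally reads 6 characters starting from the middle of the file.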
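Because map() takes a start position, you do not have to map the whole file. The following is a minimal sketch, assuming a hypothetical scratch file `region.dat` and an arbitrary 1 MB offset. It sizes the file with setLength() first (the Javadoc leaves mapping a region outside the file unspecified), maps only 16 bytes at that offset, calls force() to ask the OS to write the change to storage, and reads the bytes back with an ordinary positional FileChannel.read() to show that the change reached the file.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;

public class MapRegion {

    // Hypothetical values chosen only for illustration
    static final long OFFSET = 1 << 20; // start the mapping 1 MB into the file
    static final int SIZE = 16;         // map only 16 bytes

    static String writeAndReadBack(String path) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "rw");
             FileChannel fc = file.getChannel()) {
            // Make the file long enough that the mapped region lies inside it
            file.setLength(OFFSET + SIZE);
            // Map only [OFFSET, OFFSET + SIZE) instead of the whole file
            MappedByteBuffer region = fc.map(FileChannel.MapMode.READ_WRITE, OFFSET, SIZE);
            // Buffer index 0 corresponds to file offset OFFSET
            region.put("hello".getBytes(StandardCharsets.US_ASCII));
            // Ask the OS to write the modified pages to the storage device
            region.force();

            // Read the same bytes back through an ordinary positional channel read
            ByteBuffer check = ByteBuffer.allocate(5);
            fc.read(check, OFFSET);
            return new String(check.array(), StandardCharsets.US_ASCII);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeAndReadBack("region.dat"));
    }
}
```

Run on its own, it should print hello. Note that the buffer's indices are relative to the mapped region, not to the start of the file.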

2. Memory-mapped files in the JDK source

Here is the Javadoc for FileChannel.map():

 /**
     * Maps a region of this channel's file directly into memory.
     *
     * <p> A region of a file may be mapped into memory in one of three modes:
     * </p>
     *
     * <ul>
     *
     *   <li><p> <i>Read-only:</i> Any attempt to modify the resulting buffer
     *   will cause a {@link java.nio.ReadOnlyBufferException} to be thrown.
     *   ({@link MapMode#READ_ONLY MapMode.READ_ONLY}) </p></li>
     *
     *   <li><p> <i>Read/write:</i> Changes made to the resulting buffer will
     *   eventually be propagated to the file; they may or may not be made
     *   visible to other programs that have mapped the same file.  ({@link
     *   MapMode#READ_WRITE MapMode.READ_WRITE}) </p></li>
     *
     *   <li><p> <i>Private:</i> Changes made to the resulting buffer will not
     *   be propagated to the file and will not be visible to other programs
     *   that have mapped the same file; instead, they will cause private
     *   copies of the modified portions of the buffer to be created.  ({@link
     *   MapMode#PRIVATE MapMode.PRIVATE}) </p></li>
     *
     * </ul>
     *
     * <p> For a read-only mapping, this channel must have been opened for
     * reading; for a read/write or private mapping, this channel must have
     * been opened for both reading and writing.
     *
     * <p> The {@link MappedByteBuffer <i>mapped byte buffer</i>}
     * returned by this method will have a position of zero and a limit and
     * capacity of <tt>size</tt>; its mark will be undefined.  The buffer and
     * the mapping that it represents will remain valid until the buffer itself
     * is garbage-collected.
     *
     * <p> A mapping, once established, is not dependent upon the file channel
     * that was used to create it.  Closing the channel, in particular, has no
     * effect upon the validity of the mapping.
     *
     * <p> Many of the details of memory-mapped files are inherently dependent
     * upon the underlying operating system and are therefore unspecified.  The
     * behavior of this method when the requested region is not completely
     * contained within this channel's file is unspecified.  Whether changes
     * made to the content or size of the underlying file, by this program or
     * another, are propagated to the buffer is unspecified.  The rate at which
     * changes to the buffer are propagated to the file is unspecified.
     *
     * <p> For most operating systems, mapping a file into memory is more
     * expensive than reading or writing a few tens of kilobytes of data via
     * the usual {@link #read read} and {@link #write write} methods.  From the
     * standpoint of performance it is generally only worth mapping relatively
     * large files into memory.  </p>
     *
     * @param  mode
     *         One of the constants {@link MapMode#READ_ONLY READ_ONLY}, {@link
     *         MapMode#READ_WRITE READ_WRITE}, or {@link MapMode#PRIVATE
     *         PRIVATE} defined in the {@link MapMode} class, according to
     *         whether the file is to be mapped read-only, read/write, or
     *         privately (copy-on-write), respectively
     *
     * @param  position
     *         The position within the file at which the mapped region
     *         is to start; must be non-negative
     *
     * @param  size
     *         The size of the region to be mapped; must be non-negative and
     *         no greater than {@link java.lang.Integer#MAX_VALUE}
     *
     * @return  The mapped byte buffer
     *
     * @throws NonReadableChannelException
     *         If the <tt>mode</tt> is {@link MapMode#READ_ONLY READ_ONLY} but
     *         this channel was not opened for reading
     *
     * @throws NonWritableChannelException
     *         If the <tt>mode</tt> is {@link MapMode#READ_WRITE READ_WRITE} or
     *         {@link MapMode#PRIVATE PRIVATE} but this channel was not opened
     *         for both reading and writing
     *
     * @throws IllegalArgumentException
     *         If the preconditions on the parameters do not hold
     *
     * @throws IOException
     *         If some other I/O error occurs
     *
     * @see java.nio.channels.FileChannel.MapMode
     * @see java.nio.MappedByteBuffer
     */
    public abstract MappedByteBuffer map(MapMode mode,
                                         long position, long size)
        throws IOException;
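  • The method maps a region of the channel's file directly into memory and returns a MappedByteBuffer.
  • There are three modes: READ_ONLY (read-only), READ_WRITE (read/write), and PRIVATE (private, copy-on-write).
  • Once established, a mapping does not depend on the channel that created it: closing the channel does not invalidate the buffer. The mapping stays valid until the buffer itself is garbage-collected.
  • Many details of memory mapping depend on the underlying operating system. Moreover, on most operating systems mapping a file is more expensive than reading or writing a few tens of kilobytes with the ordinary read and write methods, so mapping generally pays off only for relatively large files.
  • The parameters are the mode (defined by FileChannel's nested class MapMode, shown below), the start position, and the size of the region.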
/**
     * A typesafe enumeration for file-mapping modes.
     *
     * @since 1.4
     *
     * @see java.nio.channels.FileChannel#map
     */
    public static class MapMode {

        /**
         * Mode for a read-only mapping.
         */
        public static final MapMode READ_ONLY
            = new MapMode("READ_ONLY");

        /**
         * Mode for a read/write mapping.
         */
        public static final MapMode READ_WRITE
            = new MapMode("READ_WRITE");

        /**
         * Mode for a private (copy-on-write) mapping.
         */
        public static final MapMode PRIVATE
            = new MapMode("PRIVATE");

        private final String name;

        private MapMode(String name) {
            this.name = name;
        }

        /**
         * Returns a string describing this file-mapping mode.
         *
         * @return  A descriptive string
         */
        public String toString() {
            return name;
        }

    }
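The three modes can be told apart with a small experiment. This is a sketch, assuming a hypothetical one-byte scratch file `modes.dat` opened in "rw" mode (READ_WRITE and PRIVATE require a channel opened for both reading and writing). A write to a READ_ONLY mapping throws ReadOnlyBufferException, a write through a PRIVATE mapping only changes a private copy and leaves the file untouched, and a write through a READ_WRITE mapping reaches the file. The helper method names are illustrative, not JDK API.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.ReadOnlyBufferException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;

public class MapModes {

    /** Returns true if writing to a READ_ONLY mapping throws ReadOnlyBufferException. */
    static boolean readOnlyRejectsWrites(FileChannel fc) throws IOException {
        MappedByteBuffer ro = fc.map(MapMode.READ_ONLY, 0, 1);
        try {
            ro.put(0, (byte) 'z');
            return false;
        } catch (ReadOnlyBufferException e) {
            return true;
        }
    }

    /** Writes through a PRIVATE mapping, then returns the byte actually stored in the file. */
    static byte privateWriteThenReadFile(FileChannel fc) throws IOException {
        MappedByteBuffer priv = fc.map(MapMode.PRIVATE, 0, 1);
        priv.put(0, (byte) 'p');               // copy-on-write: only a private copy changes
        ByteBuffer fromFile = ByteBuffer.allocate(1);
        fc.read(fromFile, 0);                  // ordinary read of the file's content
        return fromFile.get(0);
    }

    /** Writes through a READ_WRITE mapping, then returns the byte actually stored in the file. */
    static byte readWriteThenReadFile(FileChannel fc) throws IOException {
        MappedByteBuffer rw = fc.map(MapMode.READ_WRITE, 0, 1);
        rw.put(0, (byte) 'w');                 // propagated to the file
        rw.force();
        ByteBuffer fromFile = ByteBuffer.allocate(1);
        fc.read(fromFile, 0);
        return fromFile.get(0);
    }

    public static void main(String[] args) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile("modes.dat", "rw");
             FileChannel fc = file.getChannel()) {
            file.setLength(0);
            file.write('a');                                          // file content: "a"
            System.out.println(readOnlyRejectsWrites(fc));            // true
            System.out.println((char) privateWriteThenReadFile(fc));  // a
            System.out.println((char) readWriteThenReadFile(fc));     // w
        }
    }
}
```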

3. File locking

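JDK 1.4 introduced file locking, which lets you synchronize access to a file as a shared resource. File locks are visible to other operating system processes because Java file locking maps directly onto the native operating system's locking facility.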

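FileChannel's tryLock() and lock() methods acquire a FileLock on the entire file; their signatures are shown below. tryLock() is non-blocking: if the lock cannot be acquired (because another program holds an overlapping lock), it returns null. lock() blocks until the lock is acquired, the channel is closed, or the invoking thread is interrupted. The no-argument versions lock the whole file by calling the (position, size, shared) overloads with (0L, Long.MAX_VALUE, false). SocketChannel, DatagramChannel, and ServerSocketChannel do not need locking because they are inherently single-process entities; a network socket is generally not shared between two processes.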

public abstract FileLock lock(long position, long size, boolean shared)
        throws IOException;

public final FileLock lock() throws IOException {
        return lock(0L, Long.MAX_VALUE, false);
    }

public abstract FileLock tryLock(long position, long size, boolean shared)
        throws IOException;

public final FileLock tryLock() throws IOException {
        return tryLock(0L, Long.MAX_VALUE, false);
    }
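A common non-blocking pattern with tryLock() is sketched below, assuming a hypothetical file `lock.dat`. Since Java 7, FileLock implements AutoCloseable, so try-with-resources releases the lock; a null resource is simply not closed, which fits tryLock() returning null when another process holds the lock. If the same JVM already holds an overlapping lock, tryLock() throws OverlappingFileLockException instead of returning null.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;

public class TryLockDemo {

    /** Tries a non-blocking exclusive lock on the whole file; returns whether it was acquired. */
    static boolean tryExclusive(FileChannel fc) throws IOException {
        // tryLock() returns null instead of blocking if another process holds an overlapping lock
        try (FileLock lock = fc.tryLock()) {
            if (lock == null) {
                return false;
            }
            // ... work on the file while holding the lock ...
            return lock.isValid() && !lock.isShared();
        } // close() releases the lock when the block exits
    }

    public static void main(String[] args) throws IOException {
        try (FileChannel fc = new RandomAccessFile("lock.dat", "rw").getChannel()) {
            System.out.println(tryExclusive(fc) ? "locked and released" : "held by another process");
        }
    }
}
```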

A FileLock is "a token representing a lock on a region of a file". It is created by the locking methods of FileChannel and AsynchronousFileChannel and has four fields:

public abstract class FileLock implements AutoCloseable {

    private final Channel channel;
    private final long position;
    private final long size;
    private final boolean shared;

    // ... methods omitted
}

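The locked region starts at position and covers size bytes, i.e. [position, position + size). The region is fixed: it does not grow or shrink when the file's size changes. shared indicates whether the lock is shared (true) or exclusive (false).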
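To make the position/size semantics concrete, here is a sketch in which the scratch file and the byte ranges are arbitrary assumptions. It locks [100, 150), inspects the lock with position(), size(), isShared() and overlaps(position, size), takes a second lock on the adjacent region [150, 160), and shows that an overlapping request from the same JVM is rejected with OverlappingFileLockException. Locked regions need not lie inside the file, so an empty scratch file is enough.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;

public class LockRegions {

    static String describe(FileChannel fc) throws IOException {
        StringBuilder sb = new StringBuilder();
        // Lock bytes [100, 150): position = 100, size = 50 (the region ends at position + size)
        try (FileLock a = fc.lock(100, 50, false)) {
            sb.append(a.position()).append('+').append(a.size())
              .append(" shared=").append(a.isShared());
            // overlaps(position, size) tests intersection with another region
            sb.append(" overlaps[140,160)=").append(a.overlaps(140, 20));
            sb.append(" overlaps[150,160)=").append(a.overlaps(150, 10));
            // A disjoint region can be locked at the same time
            try (FileLock b = fc.lock(150, 10, false)) {
                sb.append(" second=").append(b.position());
            }
            // An overlapping request from the same JVM throws instead of blocking
            try (FileLock c = fc.lock(120, 10, false)) {
                sb.append(" overlapping-acquired");
            } catch (OverlappingFileLockException e) {
                sb.append(" overlapping-rejected");
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        try (FileChannel fc = new RandomAccessFile("regions.dat", "rw").getChannel()) {
            System.out.println(describe(fc));
        }
    }
}
```

The expected output is `100+50 shared=false overlaps[140,160)=true overlaps[150,160)=false second=150 overlapping-rejected`. The last part reflects the Javadoc note that file locks are held on behalf of the entire JVM and are not meant for coordinating threads within it.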

Locking portions of a mapped file

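File mapping is typically used for very large files, and you may want to lock only part of such a file so that other processes can work on the other parts; this is how a database lets multiple users access it at the same time. The example below starts two threads, each locking a different part of the file. Because file locks are held on behalf of the whole JVM, they keep out other processes; the two threads themselves do not interfere only because their regions do not overlap.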

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;
import java.nio.channels.FileLock;

/**
 * Locking portions of a mapped file.
 * Mapping: MappedByteBuffer out = fc.map(MapMode.READ_WRITE, 0, LENGTH);
 * Locking: FileLock fl = fc.lock(start, end - start, false);    fl.release();
 * @author bob
 *
 */
public class LockingMappedFiles {

    private static final int LENGTH = 0x0000FFF; // 4095 bytes
    static FileChannel fc;

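    public static void main(String[] args) throws IOException, InterruptedException {
        //1. Get a read/write FileChannel
        fc = new RandomAccessFile("data.dat", "rw").getChannel();
        //2. Map the whole region [0, LENGTH) in read/write mode
        MappedByteBuffer out = fc.map(MapMode.READ_WRITE, 0, LENGTH);
        //3. Fill it with the character 'x'
        for(int i = 0; i < LENGTH; i++) {
            out.put((byte)'x');
        }
        //4. Thread 1 locks the first third of the file and modifies it through the buffer
        Thread thread1 = new LockAndModify(out, 0, LENGTH/3);
        thread1.start();
        //5. Thread 2 locks the last third of the file and modifies it through the buffer
        Thread thread2 = new LockAndModify(out, LENGTH*2/3, LENGTH);
        thread2.start();
        //6. Wait for both threads to finish, then close the channel
        thread1.join();
        thread2.join();
        fc.close();
    }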

    static class LockAndModify extends Thread{
        private ByteBuffer byteBuffer;
        private int start, end;

        public LockAndModify(ByteBuffer byteBuffer, int start, int end) {
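            // Record the start and end of the region to lock
            this.start = start;
            this.end = end;
            /**
             * 1. Set the limit and position of the MappedByteBuffer.
             * 2. slice() creates a new ByteBuffer sharing the original's content from position to limit;
             *       its position is 0 and its limit and capacity equal the number of bytes remaining.
             *       A slice is direct only if the original is direct, and read-only only if the original
             *       is read-only; here it is a direct, writable view whose changes show up in the original.
             * 3. Set limit before position; otherwise the new position could exceed the current limit
             *       and position() would throw IllegalArgumentException.
             */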
            byteBuffer.limit(end);
            byteBuffer.position(start);
            this.byteBuffer = byteBuffer.slice();
        }

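        public void run() {
            try {
                // Exclusive lock on [start, end); the second argument is the size, not the end position
                FileLock fl = fc.lock(start, end - start, false);
                System.out.println("Locked: " + start + " to " + end);
                // Increment every byte ('x' becomes 'y'). Arguments are evaluated left to right, so
                // position() is read before get() advances it, and put() writes back to the same index.
                while(byteBuffer.hasRemaining()) {
                    byteBuffer.put(byteBuffer.position(), (byte)(byteBuffer.get()+1));
                }
                fl.release();
                System.out.println("Released: " + start + " to " + end);
            } catch (IOException e) {
                e.printStackTrace();
            }
        }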
    }
}

Result: in data.dat, the characters in the first third and the last third of the file have changed from x to y.

4. Why memory-mapped files perform better than ordinary I/O

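(1) The biggest difference between memory-mapped files and standard I/O is this: the data must still ultimately be read from disk, but it is not copied from the kernel's buffer (the page cache) into a separate user-space buffer. Instead, part of the process's virtual address space is mapped directly onto the file, so reading and writing the file looks just like reading and writing memory, which is why it is faster.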

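(2) MappedByteBuffer is a special kind of direct buffer. Compared with basic I/O operations, it saves the cost of copying data through an intermediate buffer. It also lives outside the JVM heap, so its size is not limited by the JVM heap size.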

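(3) ByteBuffer.allocateDirect() creates a direct buffer in native (DirectMemory) memory rather than on the heap. The total size of such buffers is capped by -XX:MaxDirectMemorySize (which, when not set, defaults to a value tied to -Xmx), and allocations beyond that cap fail.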
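The difference between heap, direct, and mapped buffers can be observed with isDirect() and hasArray(). This sketch assumes a hypothetical scratch file `kinds.dat` and an arbitrary 1024-byte size; it does not measure performance, it only shows which buffers live outside the heap. Exceeding -XX:MaxDirectMemorySize with allocateDirect() fails with OutOfMemoryError ("Direct buffer memory").

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.Arrays;

public class BufferKinds {

    /** Reports isDirect()/hasArray() for a heap buffer, a direct buffer and a mapped buffer. */
    static boolean[] kinds(FileChannel fc) throws IOException {
        ByteBuffer heap = ByteBuffer.allocate(1024);          // backed by a byte[] on the Java heap
        ByteBuffer direct = ByteBuffer.allocateDirect(1024);  // native memory outside the heap
        MappedByteBuffer mapped = fc.map(FileChannel.MapMode.READ_WRITE, 0, 1024); // backed by file pages
        return new boolean[] {
            heap.isDirect(),    // false
            direct.isDirect(),  // true
            mapped.isDirect(),  // true: a MappedByteBuffer is a direct buffer
            heap.hasArray(),    // true: heap.array() exposes the byte[]
            direct.hasArray()   // false: no accessible backing array
        };
    }

    public static void main(String[] args) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile("kinds.dat", "rw");
             FileChannel fc = file.getChannel()) {
            file.setLength(1024); // make sure the mapped region lies inside the file
            System.out.println(Arrays.toString(kinds(fc)));
        }
    }
}
```

It should print [false, true, true, true, false].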

For details, see the article "JAVA NIO之浅谈内存映射文件原理与DirectMemory" (Java NIO: memory-mapped file internals and DirectMemory).

The article "内存映射文件原理探索" (Exploring how memory-mapped files work) describes how data is copied from disk into memory.

Summary

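1. Memory-mapped files are mainly used to access and manipulate very large files, and they can improve performance.

2. A memory-mapped buffer is created from a channel; you choose the mapping mode and can restrict the mapped region.

3. Locking regions of a file lets multiple processes modify different regions of a shared file concurrently. Because file locks are held on behalf of the whole JVM, threads within one JVM must additionally keep to non-overlapping regions or use other synchronization.

4. MappedByteBuffer is a special direct buffer; changes made to it are reflected in the file.

5. Memory mapping can outperform stream I/O because mmap() maps the file directly into the process's address space, eliminating the copy from kernel space into a user-space buffer.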

References

《Java核心编程》

01-30 14:50