Linux AIO: Poor Scaling

Question

I am writing a library that uses the Linux asynchronous I/O system calls, and would like to know why the io_submit function is exhibiting poor scaling on the ext4 file system. If possible, what can I do to get io_submit not to block for large IO request sizes? I already do the following (as described here):

Use O_DIRECT.
Align the IO buffer to a 512-byte boundary.
Set the buffer size to a multiple of the page size.
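The three preconditions above can be sketched as a small helper. This is only an illustration: the names `round_up`, `FreeDeleter`, `AlignedBuffer`, and `make_direct_io_buffer` are hypothetical and do not appear in the test program, and the sketch assumes the page size reported by `sysconf(_SC_PAGESIZE)` is a power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>
#include <unistd.h>

// Round `n` up to the next multiple of `align` (a power of two).
inline std::size_t round_up(std::size_t n, std::size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

// Deleter for memory obtained from posix_memalign, which must be freed with free().
struct FreeDeleter {
    void operator()(char* p) const { std::free(p); }
};
using AlignedBuffer = std::unique_ptr<char[], FreeDeleter>;

// Allocate a buffer of at least `bytes` bytes, rounded up to a whole number
// of pages and aligned to `align` bytes (512 in the question).
// Returns an empty pointer on failure; the actual size is stored in *out_size.
inline AlignedBuffer make_direct_io_buffer(std::size_t bytes, std::size_t align,
                                           std::size_t* out_size)
{
    const auto page = static_cast<std::size_t>(::sysconf(_SC_PAGESIZE));
    const auto size = round_up(bytes, page);
    void* p = nullptr;
    if (::posix_memalign(&p, align, size) != 0) {
        return AlignedBuffer{};
    }
    if (out_size != nullptr) {
        *out_size = size;
    }
    return AlignedBuffer{static_cast<char*>(p)};
}
```

The file itself still has to be opened with `O_DIRECT`; the helper only covers the buffer alignment and size requirements.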

To observe how long the kernel spends in io_submit, I ran a test in which I created a 1 GB test file using dd and /dev/urandom, then repeatedly dropped the system cache (sync; echo 1 > /proc/sys/vm/drop_caches) and read increasingly large portions of the file. At each iteration, I printed the time taken by io_submit and the time spent waiting for the read request to finish. I ran the experiment on an x86-64 system running Arch Linux with kernel version 3.11. The machine has an SSD and a Core i7 CPU. The first graph plots the number of pages read against the time spent waiting for io_submit to finish. The second graph displays the time spent waiting for the read request to finish. The times are measured in seconds.

For comparison, I created a similar test that uses synchronous IO by means of pread. Here are the results:
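The synchronous test itself is not listed in the question. A minimal sketch of what such a measurement might look like follows; `ReadResult` and `timed_pread` are hypothetical names, and the loop over short reads is an assumption about how a full request would be completed, not the author's code.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <sys/types.h>
#include <unistd.h>

struct ReadResult {
    ssize_t bytes;   // total bytes read, or -1 on error
    double seconds;  // wall-clock time spent inside the read loop
};

// Read up to `n` bytes from `fd` starting at `offset` using blocking pread,
// and report how long the whole read took.
inline ReadResult timed_pread(int fd, char* buf, std::size_t n, off_t offset)
{
    using namespace std::chrono;
    assert(buf != nullptr || n == 0);

    const auto t1 = steady_clock::now();
    std::size_t done = 0;
    ssize_t r = 0;
    while (done < n) {
        // pread may return fewer bytes than requested, so keep going.
        r = ::pread(fd, buf + done, n - done, offset + static_cast<off_t>(done));
        if (r <= 0) {
            break;  // -1 signals an error, 0 signals end of file
        }
        done += static_cast<std::size_t>(r);
    }
    const auto t2 = steady_clock::now();

    const double secs = duration_cast<duration<double>>(t2 - t1).count();
    return {r < 0 ? -1 : static_cast<ssize_t>(done), secs};
}
```

For a comparison with the asynchronous test, the descriptor would be opened with the same flags (`O_RDONLY | O_DIRECT | O_NOATIME`) and the buffer aligned in the same way.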

It seems that the asynchronous IO works as expected up to request sizes of around 20,000 pages. After that, io_submit blocks. These observations lead to the following questions:

Why isn't the execution time of io_submit constant?
What is causing this poor scaling behavior?
Do I need to split up all read requests on ext4 file systems into multiple requests, each of size less than 20,000 pages?
Where does this "magic" value of 20,000 come from? If I run my program on another Linux system, how can I determine the largest IO request size to use without experiencing poor scaling behavior?
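To make the third question concrete, here is a sketch of how one large read could be split into several control blocks of bounded size. Whether this actually avoids the blocking is exactly what the question asks, so treat it as an experiment rather than a fix. The names `make_chunked_reads` and `pointers_to` are hypothetical, and `max_chunk` is a parameter to tune, not a known-good limit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>
#include <linux/aio_abi.h>

// Build one IOCB_CMD_PREAD control block per chunk of at most `max_chunk`
// bytes, together covering `total` bytes of the file starting at offset 0.
// For O_DIRECT, `max_chunk` should keep every chunk suitably aligned.
inline std::vector<iocb> make_chunked_reads(int fd, char* buf,
                                            std::size_t total,
                                            std::size_t max_chunk)
{
    assert(max_chunk > 0);
    std::vector<iocb> blocks;
    for (std::size_t off = 0; off < total; off += max_chunk) {
        iocb b;
        std::memset(&b, 0, sizeof(b));
        b.aio_fildes = static_cast<std::uint32_t>(fd);
        b.aio_lio_opcode = IOCB_CMD_PREAD;
        b.aio_buf = reinterpret_cast<std::uintptr_t>(buf + off);
        b.aio_offset = static_cast<std::int64_t>(off);
        b.aio_nbytes = std::min(max_chunk, total - off);
        blocks.push_back(b);
    }
    return blocks;
}

// io_submit expects an array of pointers to control blocks.
inline std::vector<iocb*> pointers_to(std::vector<iocb>& blocks)
{
    std::vector<iocb*> ptrs;
    ptrs.reserve(blocks.size());
    for (auto& b : blocks) {
        ptrs.push_back(&b);
    }
    return ptrs;
}
```

The resulting pointer array would be passed to `io_submit(c, ptrs.size(), ptrs.data())`, and the context would need to be created by `io_setup` with room for at least that many events.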

The code used to test the asynchronous IO follows below. I can add other source listings if you think they are relevant, but I tried to post only the details that I thought might be relevant.

#include <cstddef>
#include <cstdint>
// For `std::atoi`, `std::free`, `posix_memalign`, and `EXIT_FAILURE`.
#include <cstdlib>
#include <cstring>
#include <chrono>
#include <iostream>
#include <memory>
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>
// For `__NR_*` system call definitions.
#include <sys/syscall.h>
#include <linux/aio_abi.h>

static int
io_setup(unsigned n, aio_context_t* c)
{
    return syscall(__NR_io_setup, n, c);
}

static int
io_destroy(aio_context_t c)
{
    return syscall(__NR_io_destroy, c);
}

static int
io_submit(aio_context_t c, long n, iocb** b)
{
    return syscall(__NR_io_submit, c, n, b);
}

static int
io_getevents(aio_context_t c, long min, long max, io_event* e, timespec* t)
{
    return syscall(__NR_io_getevents, c, min, max, e, t);
}

int main(int argc, char** argv)
{
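    using namespace std::chrono;
    if (argc < 2) {
        std::cerr << "Usage: " << argv[0] << " <pages>" << std::endl;
        return EXIT_FAILURE;
    }
    // Request size in bytes: the page count from argv[1] times 4096.
    const auto n = 4096 * size_t(std::atoi(argv[1]));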

    // Initialize the file descriptor. If O_DIRECT is not used, the kernel
    // will block on `io_submit` until the job finishes, because non-direct
    // IO via the `aio` interface is not implemented (to my knowledge).
    auto fd = ::open("dat/test.dat", O_RDONLY | O_DIRECT | O_NOATIME);
    if (fd < 0) {
        ::perror("Error opening file");
        return EXIT_FAILURE;
    }

    char* p;
    auto r = ::posix_memalign((void**)&p, 512, n);
    if (r != 0) {
        std::cerr << "posix_memalign failed." << std::endl;
        return EXIT_FAILURE;
    }
    auto del = [](char* p) { std::free(p); };
    std::unique_ptr<char[], decltype(del)> buf{p, del};

    // Initialize the IO context.
    aio_context_t c{0};
    r = io_setup(4, &c);
    if (r < 0) {
        ::perror("Error invoking io_setup");
        return EXIT_FAILURE;
    }

    // Setup I/O control block.
    iocb b;
    std::memset(&b, 0, sizeof(b));
    b.aio_fildes = fd;
    b.aio_lio_opcode = IOCB_CMD_PREAD;

    // Command-specific options for `pread`.
    b.aio_buf = (uint64_t)buf.get();
    b.aio_offset = 0;
    b.aio_nbytes = n;
    iocb* bs[1] = {&b};

    auto t1 = high_resolution_clock::now();
    // Reuse `r` declared above; redeclaring it with `auto` does not compile.
    r = io_submit(c, 1, bs);
    if (r != 1) {
        if (r == -1) {
            ::perror("Error invoking io_submit");
        }
        else {
            std::cerr << "Could not submit request." << std::endl;
        }
        return EXIT_FAILURE;
    }
    auto t2 = high_resolution_clock::now();
    auto count = duration_cast<duration<double>>(t2 - t1).count();
    // Print the wait time.
    std::cout << count << " ";

    io_event e[1];
    t1 = high_resolution_clock::now();
    r = io_getevents(c, 1, 1, e, NULL);
    t2 = high_resolution_clock::now();
    count = duration_cast<duration<double>>(t2 - t1).count();
    // Print the read time.
    std::cout << count << std::endl;

    r = io_destroy(c);
    if (r < 0) {
        ::perror("Error invoking io_destroy");
        return EXIT_FAILURE;
    }
}

Recommended answer