Asynchronous IO io_submit latency in Ubuntu Linux


I am looking for advice on how to get efficient and high performance asynchronous IO working for my application that runs on Ubuntu Linux 14.04.

My app processes transactions and creates a file on disk/flash. As the app progresses through transactions, additional blocks are created that must be appended to the file on disk/flash. The app also needs to frequently read blocks of this file as it processes new transactions. Each transaction might need to read a different block from this file, in addition to creating a new block that has to be appended to it. There is an incoming queue of transactions, and the app can continue to process transactions from the queue to create a deep enough pipeline of IO ops to hide the latency of read accesses or write completions on disk or flash. For a read of a block that has not yet been written to disk/flash (one that a previous transaction put in the write queue), the app will stall until the corresponding write completes.

I have an important performance objective – the app should incur the lowest possible latency to issue the IO operation. My app takes approximately 10 microseconds to process each transaction and be ready to issue a write to or a read from the file on disk/flash. The additional latency to issue an asynchronous read or write should be as small as possible so that the app can complete processing each transaction at a rate as close to 10 usecs per transaction as possible, when only a file write is needed.

We are experimenting with an implementation that uses io_submit to issue write and read requests. I would appreciate any suggestions or feedback on the best approach for our requirement. Is io_submit going to give us the best performance to meet our objective? What should I expect for the latency of each write io_submit and the latency of each read io_submit?

Using our experimental code (running on a 2.3 GHz Haswell MacBook Pro, Ubuntu Linux 14.04), we are measuring about 50 usecs for a write io_submit when extending the output file. This is too long and we aren't even close to our performance requirements. Any guidance to help me launch a write request with the least latency will be greatly appreciated.
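For anyone wanting to reproduce a measurement like the one above, here is a minimal sketch of timing a single write io_submit(). It is not the asker's code. It assumes raw syscalls (so libaio is not needed), a 4 KiB block, and a file opened with O_DIRECT; the names `elapsed_ns` and `time_one_submit` are made up for illustration.

```c
#define _GNU_SOURCE            /* for O_DIRECT and syscall() */
#include <assert.h>
#include <fcntl.h>
#include <linux/aio_abi.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

#define BLOCK_SIZE 4096        /* assumed block size, also used for alignment */

/* Nanoseconds from *start to *end. */
long long elapsed_ns(const struct timespec *start, const struct timespec *end)
{
    return (end->tv_sec - start->tv_sec) * 1000000000LL
         + (end->tv_nsec - start->tv_nsec);
}

/* Submits one BLOCK_SIZE write at `offset` and returns the nanoseconds
 * spent inside io_submit() itself, or -1 if any step failed. */
long long time_one_submit(const char *path, long long offset)
{
    assert(path != NULL);
    aio_context_t ctx = 0;     /* must be zero before io_setup */
    void *buf = NULL;
    long long ns = -1;

    int fd = open(path, O_WRONLY | O_CREAT | O_DIRECT, 0644);
    if (fd < 0)
        return -1;
    if (posix_memalign(&buf, BLOCK_SIZE, BLOCK_SIZE) != 0) {
        close(fd);
        return -1;
    }
    memset(buf, 'x', BLOCK_SIZE);

    if (syscall(SYS_io_setup, 8, &ctx) == 0) {
        struct iocb cb;
        struct iocb *list[1] = { &cb };
        struct timespec t0, t1;

        memset(&cb, 0, sizeof cb);
        cb.aio_fildes = fd;
        cb.aio_lio_opcode = IOCB_CMD_PWRITE;
        cb.aio_buf = (uint64_t)(uintptr_t)buf;
        cb.aio_nbytes = BLOCK_SIZE;
        cb.aio_offset = offset;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        long rc = syscall(SYS_io_submit, ctx, 1, list);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        if (rc == 1) {
            struct io_event ev;
            /* Reap the completion so buf is no longer in use when freed. */
            syscall(SYS_io_getevents, ctx, 1, 1, &ev, NULL);
            ns = elapsed_ns(&t0, &t1);
        }
        syscall(SYS_io_destroy, ctx);
    }
    free(buf);
    close(fd);
    return ns;
}
```

Calling `time_one_submit(path, offset)` once with an offset at the current end of the file and once with an offset inside already-written data gives a rough comparison of extending writes against overwrites. A single sample is noisy, so repeating the call and looking at the distribution is probably more useful than any one number.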


Linux AIO (sometimes known as KAIO or libaio) is something of a black art where experienced practitioners know the pitfalls but for some reason it's taboo to tell someone about gotchas they don't already know. From scratching around on the web and experience I've come up with a few examples where Linux's asynchronous I/O submission via io_submit() may become (silently) synchronous, thereby turning it into a blocking (i.e. no longer fast) call:

  1. You're submitting buffered (aka non-direct) I/O. You're at the mercy of Linux's caching and your submission can go synchronous when:
    • What you're reading isn't already in the "read cache".
    • The "write cache" is full and a new write request can't be accepted until some existing writeback has completed.
  • If you submit I/Os that are "too large" (e.g. bigger than /sys/block/[disk]/queue/max_sectors_kb, although the true limit may be something smaller like 512 KiB), they will be split up within the block layer and go on to chew up more than one request.
  • The system global maximum number of concurrent AIO requests (see the /proc/sys/fs/aio-max-nr documentation) can also have an impact, but the result will be seen in io_setup() rather than io_submit().
  • The submission needs to take a particular lock that is in use (e.g. i_rwsem).
  • The submission needs to allocate some extra memory or pages.
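The two limits named in the list can be read straight out of sysfs and procfs. Below is a small sketch of doing that. The helper names `read_long_from_file` and `io_would_be_split` are invented here, and the split check only compares against max_sectors_kb, so it ignores any smaller limit that may actually apply.

```c
#include <assert.h>
#include <stdio.h>

/* Reads one integer from a sysfs/procfs style file.
 * Returns -1 if the file cannot be opened or parsed. */
long read_long_from_file(const char *path)
{
    assert(path != NULL);
    long value = -1;
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Nonzero if an I/O of io_bytes is larger than max_sectors_kb (given in
 * KiB) and would therefore be split into several requests.
 * An unknown limit (<= 0) is treated as "no split". */
int io_would_be_split(long io_bytes, long max_sectors_kb)
{
    return max_sectors_kb > 0 && io_bytes > max_sectors_kb * 1024L;
}
```

For example, `read_long_from_file("/sys/block/sda/queue/max_sectors_kb")` (where `sda` is only a placeholder disk name) gives the value to pass as the second argument of `io_would_be_split`, and `read_long_from_file("/proc/sys/fs/aio-max-nr")` gives the system-wide ceiling that io_setup() is checked against.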


With >= 4.14 kernels the RWF_NOWAIT flag can be used to make some of the blocking scenarios above noisy. For example, when using buffering and trying to read data not yet in the page cache, the RWF_NOWAIT flag will cause submission to fail with EAGAIN when blocking would otherwise occur. Obviously you still a) need a 4.14 (or later) kernel that supports this flag and b) have to be aware of the cases it doesn't cover. I notice there are patches that have been accepted or are being proposed to return EAGAIN in more scenarios that would otherwise block, but at the time of writing (2019) RWF_NOWAIT is not supported for buffered filesystem writes.
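As a rough illustration of the flag described above (spelled `RWF_NOWAIT` in the kernel's `<linux/fs.h>`), the sketch below sets it in the `aio_rw_flags` field of the iocb for a buffered read. It assumes kernel headers recent enough to have that field, and `nowait_read` is a made-up helper name. Depending on where the kernel notices that it would block, the EAGAIN may come back from io_submit() or as a negative `res` in the completion event, so the sketch checks both.

```c
#define _GNU_SOURCE            /* for syscall() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/aio_abi.h>
#include <linux/fs.h>          /* RWF_NOWAIT */
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Buffered AIO read that refuses to block.
 * Returns the number of bytes read, or a negative errno value:
 * -EAGAIN means the read would have blocked (e.g. data not in the page
 * cache); other values mean the flag or AIO itself is unsupported here. */
long nowait_read(const char *path, void *buf, size_t len, long long offset)
{
    assert(path != NULL && buf != NULL);
    aio_context_t ctx = 0;     /* must be zero before io_setup */
    struct iocb cb;
    struct iocb *list[1] = { &cb };
    struct io_event ev;
    long result;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -errno;
    if (syscall(SYS_io_setup, 1, &ctx) != 0) {
        result = -errno;
        close(fd);
        return result;
    }

    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_lio_opcode = IOCB_CMD_PREAD;
    cb.aio_buf = (uint64_t)(uintptr_t)buf;
    cb.aio_nbytes = len;
    cb.aio_offset = offset;
    cb.aio_rw_flags = RWF_NOWAIT;   /* fail instead of blocking */

    if (syscall(SYS_io_submit, ctx, 1, list) != 1)
        result = -errno;            /* rejected at submission time */
    else if (syscall(SYS_io_getevents, ctx, 1, 1, &ev, NULL) != 1)
        result = -errno;
    else
        result = (long)ev.res;      /* bytes read, or -EAGAIN etc. */

    syscall(SYS_io_destroy, ctx);
    close(fd);
    return result;
}
```

A caller that gets `-EAGAIN` back could hand that particular read to a helper thread or retry it later, keeping the latency-sensitive thread from stalling. Which errors show up in practice depends on the kernel version and the filesystem.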

If your kernel is >=5.1, you could try using io_uring, which does far better at not blocking on submission (it's an entirely different interface that first shipped with the 5.1 kernel in 2019).
