Problem description
I am looking for advice on how to get efficient, high-performance asynchronous I/O working for my application, which runs on Ubuntu Linux 14.04.
My app processes transactions and creates a file on disk/flash. As the app progresses through transactions, additional blocks are created that must be appended to the file on disk/flash. The app also needs to frequently read blocks of this file while processing new transactions. Each transaction might need to read a different block from this file, in addition to creating a new block that has to be appended to it. There is an incoming queue of transactions, and the app can continue to process transactions from the queue to build a deep enough pipeline of I/O ops to hide the latency of read accesses or write completions on disk or flash. For a read of a block that has not yet been written to disk/flash (because it was put in the write queue by a previous transaction), the app will stall until the corresponding write completes.
I have an important performance objective – the app should incur the lowest possible latency to issue the IO operation. My app takes approximately 10 microseconds to process each transaction and be ready to issue a write to or a read from the file on disk/flash. The additional latency to issue an asynchronous read or write should be as small as possible so that the app can complete processing each transaction at a rate as close to 10 usecs per transaction as possible, when only a file write is needed.
We are experimenting with an implementation that uses io_submit to issue write and read requests. I would appreciate any suggestions or feedback on the best approach for our requirement. Is io_submit going to give us the best performance to meet our objective? What should I expect for the latency of each write io_submit and the latency of each read io_submit?
Using our experimental code (running on a 2.3 GHz Haswell MacBook Pro under Ubuntu Linux 14.04), we are measuring about 50 usecs for a write io_submit when extending the output file. This is far too long, and we aren't even close to our performance requirements. Any guidance to help me launch a write request with the least latency will be greatly appreciated.
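For reference, here is a minimal sketch of the kind of libaio submission path being timed (this is not the actual experimental code; the file name, block size, and queue depth are illustrative placeholders):

```c
/* Minimal sketch: one O_DIRECT write submitted via Linux AIO.
 * Build: gcc -O2 demo.c -laio */
#define _GNU_SOURCE
#include <fcntl.h>
#include <libaio.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define BLOCK_SIZE 4096 /* must be a multiple of the device's logical block size */

int main(void)
{
    /* O_DIRECT bypasses the page cache (see the buffered-I/O pitfalls in the answer below). */
    int fd = open("testfile.bin", O_WRONLY | O_CREAT | O_DIRECT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    io_context_t ctx = 0;
    int rc = io_setup(128, &ctx); /* queue depth: placeholder */
    if (rc < 0) { fprintf(stderr, "io_setup: %s\n", strerror(-rc)); return 1; }

    void *buf; /* O_DIRECT requires aligned buffers */
    if (posix_memalign(&buf, BLOCK_SIZE, BLOCK_SIZE)) { fprintf(stderr, "posix_memalign failed\n"); return 1; }
    memset(buf, 'x', BLOCK_SIZE);

    struct iocb cb, *cbs[1] = { &cb };
    io_prep_pwrite(&cb, fd, buf, BLOCK_SIZE, 0);

    /* Time only the submission: this isolates the ~50 usec being measured. */
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    rc = io_submit(ctx, 1, cbs);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    if (rc != 1) { fprintf(stderr, "io_submit: %s\n", strerror(-rc)); return 1; }
    printf("io_submit took %ld ns\n",
           (t1.tv_sec - t0.tv_sec) * 1000000000L + (t1.tv_nsec - t0.tv_nsec));

    struct io_event ev; /* reap the completion (would normally be batched) */
    rc = io_getevents(ctx, 1, 1, &ev, NULL);
    if (rc != 1) { fprintf(stderr, "io_getevents: %s\n", strerror(-rc)); return 1; }

    io_destroy(ctx);
    free(buf);
    close(fd);
    return 0;
}
```

Timing only the io_submit() call, as above, separates submission latency from completion latency, which are the two different costs discussed below.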
Answer
Linux AIO (sometimes known as KAIO or libaio) is something of a black art where experienced practitioners know the pitfalls but for some reason it's taboo to tell someone about gotchas they don't already know. From scratching around on the web and experience I've come up with a few examples where Linux's asynchronous I/O submission via io_submit() may become (silently) synchronous, thereby turning it into a blocking (i.e. no longer fast) call:
- You are submitting buffered (a.k.a. non-direct) I/O. You are at the mercy of Linux's caching, and your submission can go synchronous when:
  - What you are reading isn't in the "read cache".
  - The "write cache" is full and a new write request can't be accepted until some existing writeback completes.
- If you submit I/Os that are "too large" (e.g. bigger than /sys/block/[disk]/queue/max_sectors_kb, though the true limit may be something smaller like 512 KiB) they will be split up within the block layer and go on to chew up more than one request (a helper for reading this limit is sketched after this list).
- The system-global maximum number of concurrent AIO requests (see the /proc/sys/fs/aio-max-nr documentation) can also have an impact, but the result will be seen in io_setup() rather than io_submit().
- It needs to take a particular lock that is already in use (e.g. i_rwsem).
- It needs to allocate some extra memory or pages.
The above list is not exhaustive.
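To make the "too large" point above concrete, here is a hypothetical helper (the function name and device string are mine, not from any library) that reads the block layer's per-request cap so submissions can be sized below it:

```c
/* Hypothetical helper: read /sys/block/<disk>/queue/max_sectors_kb and
 * return the block layer's per-request cap in bytes (-1 on error).
 * Submissions larger than this get split into multiple requests. */
#include <stdio.h>

long max_request_bytes(const char *disk) /* e.g. "sda" (placeholder) */
{
    char path[256];
    long kb = -1;

    snprintf(path, sizeof path, "/sys/block/%s/queue/max_sectors_kb", disk);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    if (fscanf(f, "%ld", &kb) != 1)
        kb = -1;
    fclose(f);
    return kb > 0 ? kb * 1024 : -1;
}
```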
With >= 4.14 kernels the RWF_NOWAIT flag can be used to make some of the blocking scenarios above noisy. For example, when using buffering and trying to read data not yet in the page cache, the RWF_NOWAIT flag will cause submission to fail with EAGAIN when blocking would otherwise occur. Obviously you still a) need a 4.14 (or later) kernel that supports this flag and b) have to be aware of the cases it doesn't cover. I notice there are patches that have been accepted or are being proposed to return EAGAIN in more scenarios that would otherwise block, but at the time of writing (2019) RWF_NOWAIT is not supported for buffered filesystem writes.
If your kernel is >= 5.1, you could try using io_uring, which does far better at not blocking on submission (it's an entirely different interface and was new in 2019).
- The AIOUserGuide has a "Performance considerations" section that warns about some io_submit() blocking/slowness situations.
- A good list of Linux AIO pitfalls is given in the "Performance issues" section of the README for the ggaoed AoE target.
- The "sleeps and waits during io_submit" XFS mailing list thread hints at some AIO queue constraints.
- The "io_submit() blocks for writes for substantial amount of time" XFS mailing list thread has a warning from Dave Chinner that when an XFS filesystem becomes more than 85-90% full, the chances of unpredictable filesystem delays increase the closer you get to ENOSPC, due to the lack of large amounts of contiguous free space.
- The "[PATCH 1/1 linux-next] ext4: add compatibility flag check to the patch" LKML thread has a reply from Ext4 lead dev Ted Ts'o talking about how filesystems can fall back to buffered I/O for O_DIRECT rather than failing the open() call.
- In the "ubifs: Allow O_DIRECT" LKML thread Btrfs lead developer Chris Mason states Btrfs resorts to buffered I/O when O_DIRECT is requested on compressed files.
- ZFS on Linux 0.8.0 changed ZoL's behaviour from erroring on O_DIRECT to "accepting" it by falling back to buffered I/O (see point 3 in the commit message). There's further discussion from the lead-up to the commit in the ZFS on Linux "Direct IO" GitHub issue. In the "NVMe Read Performance Issues with ZFS (submit_bio to io_schedule)" issue someone suggests they are getting closer to submitting a change that enables proper zero-copy O_DIRECT. If such a change were accepted, it would end up in some future version of ZoL greater than 0.8.2.
- The Ext4 wiki has a warning that certain Linux implementations fall back to buffered I/O when doing O_DIRECT allocating writes (a preallocation sketch follows this list).
Related:
- Linux AIO: Poor Scaling
- io_submit() blocks until a previous operation will be completed
- buffered asynchronous file I/O on linux (but stick to the bits explicitly talking about Linux kernel AIO)
Hopefully this post helps someone (and if it does help you, could you upvote it? Thanks!).