Problem description
I am looking for the most efficient way to do asynchronous file I/O on Linux.
The POSIX AIO implementation in glibc uses threads in userland.
The native AIO kernel API only works with unbuffered operations. Patches that add support for buffered operations exist, but they are more than 3 years old and no one seems to care about integrating them into the mainline.
I found plenty of other ideas, concepts, and patches that would allow asynchronous I/O, though most of them are in articles that are also more than 3 years old. What of all this is really available in today's kernel? I've read about servlets, acalls, stuff with kernel threads, and more things I don't even remember right now.
What is the most efficient way to do buffered asynchronous file input/output in today's kernel?
Recommended answer
Unless you want to write your own IO thread pool, the glibc implementation is an acceptable solution. It actually works surprisingly well for something that runs entirely in userland.
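For readers who have not used the glibc route, here is a minimal sketch of a buffered read through POSIX AIO. The helper name posix_aio_pread is made up for illustration, and the function waits for completion right away only to keep the example short; a real program would do other work between submission and the wait.

```c
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to len bytes at offset off via glibc POSIX AIO.
 * Returns the number of bytes read, or -1 on error. */
ssize_t posix_aio_pread(const char *path, void *buf, size_t len, off_t off)
{
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_RDONLY);          /* ordinary buffered descriptor */
    if (fd < 0)
        return -1;

    struct aiocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf = buf;
    cb.aio_nbytes = len;
    cb.aio_offset = off;
    cb.aio_sigevent.sigev_notify = SIGEV_NONE;  /* no signal, we poll below */

    if (aio_read(&cb) != 0) {               /* queue the request */
        close(fd);
        return -1;
    }

    /* Other work could run here while a glibc worker thread does the read. */

    const struct aiocb *const pending[1] = { &cb };
    while (aio_error(&cb) == EINPROGRESS)
        aio_suspend(pending, 1, NULL);      /* may wake early on a signal */

    int err = aio_error(&cb);
    ssize_t n = aio_return(&cb);            /* call exactly once per request */
    close(fd);
    return err == 0 ? n : -1;
}
```

On glibc older than 2.34 this needs to be linked with -lrt; newer versions ship the AIO functions in libc itself.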
The kernel implementation does not work with buffered IO at all in my experience (though I've seen other people say the opposite!). Which is fine if you want to read huge amounts of data via DMA, but of course it sucks big time if you plan to take advantage of the buffer cache.
Also note that the kernel AIO calls may actually block. There is a limited-size command buffer, and large reads are broken up into several smaller ones. Once the queue is full, asynchronous commands run synchronously. Surprise. I ran into this problem a year or two ago and could not find an explanation. Asking around gave me the "yeah of course, that's how it works" answer.
From what I've understood, the "official" interest in supporting buffered AIO is not terribly great either, even though several working solutions seem to have been available for years. Some of the arguments that I've read were along the lines of "you don't want to use the buffers anyway" and "nobody needs that" and "most people don't even use epoll yet". So, well... meh.
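To make the kernel AIO discussion above concrete, here is a rough sketch that submits a single read through the native interface using raw system calls, since glibc does not wrap them. The helper name kernel_aio_pread is invented for this example. The file is opened without O_DIRECT, so on kernels that behave as described above the read is expected to be carried out inside io_submit itself rather than in the background; getting truly asynchronous behaviour would require O_DIRECT plus a suitably aligned buffer, which this sketch leaves out.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <linux/aio_abi.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: one read via native kernel AIO (raw syscalls).
 * Returns the number of bytes read, or -1 on error. */
ssize_t kernel_aio_pread(const char *path, void *buf, size_t len, off_t off)
{
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_RDONLY);      /* buffered: no O_DIRECT here */
    if (fd < 0)
        return -1;

    aio_context_t ctx = 0;              /* must be zero before io_setup */
    if (syscall(SYS_io_setup, 1, &ctx) < 0) {
        close(fd);
        return -1;
    }

    struct iocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_lio_opcode = IOCB_CMD_PREAD;
    cb.aio_fildes = (uint32_t)fd;
    cb.aio_buf = (uint64_t)(uintptr_t)buf;
    cb.aio_nbytes = len;
    cb.aio_offset = off;

    struct iocb *list[1] = { &cb };
    ssize_t result = -1;

    /* With a buffered descriptor this call may already do all the work. */
    if (syscall(SYS_io_submit, ctx, 1, list) == 1) {
        struct io_event ev;
        /* Wait for exactly one completion event, no timeout. */
        if (syscall(SYS_io_getevents, ctx, 1, 1, &ev, NULL) == 1 && ev.res >= 0)
            result = (ssize_t)ev.res;
    }

    syscall(SYS_io_destroy, ctx);
    close(fd);
    return result;
}
```

Timing the io_submit call separately from io_getevents is a simple way to see where the time actually goes on a given kernel and file system.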
Being able to get an epoll signalled by a completed async operation was another issue until recently, but in the meantime this works really fine via eventfd.
Note that the glibc implementation will actually spawn threads on demand inside __aio_enqueue_request. It is probably no big deal, since spawning threads is not that terribly expensive any more, but one should be aware of it. If your understanding of starting an asynchronous operation is "returns immediately", then that assumption may not be true, because it may be spawning some threads first.
EDIT:
As a side note, under Windows there exists a situation very similar to the one in the glibc AIO implementation, where the "returns immediately" assumption of queuing an asynchronous operation is not true.
If all data that you wanted to read is in the buffer cache, Windows will decide that it will instead run the request synchronously, because it will finish immediately anyway. This is well-documented, and admittedly sounds great, too. However, if there are a few megabytes to copy, or if another thread has page faults or does IO concurrently (thus competing for the lock), "immediately" can be a surprisingly long time -- I've seen "immediate" times of 2-5 milliseconds. This is no problem in most situations, but for example under the constraint of a 16.66ms frame time, you probably don't want to risk blocking for 5ms at random times. Thus, the naive assumption of "can do async IO from my render thread no problem, because async doesn't block" is flawed.