Problem Description
I am working on an OS-independent file manager, and I am looking at the most efficient way to copy a file on Linux.

Windows has a built-in function, CopyFileEx(), but from what I've noticed, there is no such standard function for Linux. So I guess I will have to implement my own.

The obvious way is fopen/fread/fwrite, but is there a better (faster) way of doing it? I must also have the ability to stop every once in a while so that I can update the "copied so far" count for the file progress menu.
Recommended Answer
Unfortunately, you cannot use sendfile() here because the destination is not a socket. (The name sendfile() comes from send() + "file").
For zero-copy, you can use splice() as suggested by @Dave. (Except it will not be zero-copy; it will be "one copy" from the source file's page cache to the destination file's page cache.)
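For reference, a minimal sketch of what a splice()-based copy might look like (the helper name copy_with_splice is hypothetical, and error handling is reduced to asserts). Note that splice() requires one end of each call to be a pipe, so the data is routed file → pipe → file:

/* Sketch of a Linux-only copy loop using splice(); one end of each
   splice() call must be a pipe, so we shuttle data through one. */
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

static void copy_with_splice(int in_fd, int out_fd)
{
    int pipefd[2];
    assert(pipe(pipefd) == 0);

    while (1) {
        /* Move up to 64K from the source file into the pipe. */
        ssize_t n = splice(in_fd, NULL, pipefd[1], NULL, 65536, SPLICE_F_MOVE);
        assert(n >= 0);
        if (n == 0) break;                  /* end of file */

        /* Drain the pipe into the destination file. */
        while (n > 0) {
            ssize_t w = splice(pipefd[0], NULL, out_fd, NULL, n, SPLICE_F_MOVE);
            assert(w > 0);
            n -= w;
        }
    }
    close(pipefd[0]);
    close(pipefd[1]);
}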
However... (a) splice() is Linux-specific; and (b) you can almost certainly do just as well using portable interfaces, provided you use them correctly.
In short, use open() + read() + write() with a small temporary buffer. I suggest 8K. So your code would look something like this:
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

int in_fd = open("source", O_RDONLY);
assert(in_fd >= 0);
/* Create the destination if it does not already exist, truncating any old contents. */
int out_fd = open("dest", O_WRONLY | O_CREAT | O_TRUNC, 0644);
assert(out_fd >= 0);
char buf[8192];
while (1) {
    ssize_t result = read(in_fd, &buf[0], sizeof(buf));
    if (!result) break;        /* read() returns 0 at end of file */
    assert(result > 0);
    assert(write(out_fd, &buf[0], result) == result);
}
With this loop, you will be copying 8K from the in_fd page cache into the CPU L1 cache, then writing it from the L1 cache into the out_fd page cache. Then you will overwrite that part of the L1 cache with the next 8K chunk from the file, and so on. The net result is that the data in buf will never actually be stored in main memory at all (except maybe once at the end); from the system RAM's point of view, this is just as good as using "zero-copy" splice(). Plus it is perfectly portable to any POSIX system.
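Since the question also needs a running "copied so far" count for the progress display, one simple variant is to accumulate the byte count inside the same loop. This is only a sketch; update_progress() is a hypothetical callback supplied by the file manager's UI, not part of any standard API:

/* Same copy loop, but accumulating a byte count for the progress display. */
off_t copied = 0;
while (1) {
    ssize_t result = read(in_fd, &buf[0], sizeof(buf));
    if (!result) break;
    assert(result > 0);
    assert(write(out_fd, &buf[0], result) == result);
    copied += result;
    update_progress(copied);    /* hypothetical: refresh the "copied so far" counter */
}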
Note that the small buffer is key here. Typical modern CPUs have 32K or so for the L1 data cache, so if you make the buffer too big, this approach will be slower. Possibly much, much slower. So keep the buffer in the "few kilobytes" range.
Of course, unless your disk subsystem is very very fast, memory bandwidth is probably not your limiting factor. So I would recommend posix_fadvise to let the kernel know what you are up to:
posix_fadvise(in_fd, 0, 0, POSIX_FADV_SEQUENTIAL);
This will give a hint to the Linux kernel that its read-ahead machinery should be very aggressive.
I would also suggest using posix_fallocate to preallocate the storage for the destination file. This will tell you ahead of time whether you will run out of disk. And for a modern kernel with a modern file system (like XFS), it will help to reduce fragmentation in the destination file.
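A minimal sketch of how that might be wired in, assuming the source and destination are already open as in_fd and out_fd from the snippet above (note that posix_fallocate() returns an error number rather than setting errno):

/* Sketch: preallocate the destination to the source file's size. */
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>

struct stat st;
assert(fstat(in_fd, &st) == 0);
int err = posix_fallocate(out_fd, 0, st.st_size);
assert(err == 0);   /* ENOSPC here would mean we are going to run out of disk */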
The last thing I would recommend is mmap. It is usually the slowest approach of all thanks to TLB thrashing. (Very recent kernels with "transparent hugepages" might mitigate this; I have not tried recently. But it certainly used to be very bad. So I would only bother testing mmap if you have lots of time to benchmark and a very recent kernel.)
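If you do want to benchmark it, a rough sketch of one mmap()-based variant might look like this (mapping the source read-only and writing the mapping out to the destination; the helper name copy_with_mmap is hypothetical, and this is only one of several possible mmap approaches):

/* Sketch: copy by mapping the source file and writing the mapping out.
   For benchmarking only; usually slower than read()/write() with a small buffer. */
#include <assert.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

static void copy_with_mmap(int in_fd, int out_fd)
{
    struct stat st;
    assert(fstat(in_fd, &st) == 0);
    if (st.st_size == 0) return;            /* mmap() cannot map a zero-length file */

    void *src = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, in_fd, 0);
    assert(src != MAP_FAILED);

    assert(write(out_fd, src, st.st_size) == st.st_size);
    assert(munmap(src, st.st_size) == 0);
}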
[Update]
There is some question in the comments about whether splice from one file to another is zero-copy. The Linux kernel developers call this "page stealing". Both the man page for splice and the comments in the kernel source say that the SPLICE_F_MOVE flag should provide this functionality.
Unfortunately, the support for SPLICE_F_MOVE was yanked in 2.6.21 (back in 2007) and never replaced. (The comments in the kernel sources never got updated.) If you search the kernel sources, you will find SPLICE_F_MOVE is not actually referenced anywhere. The last message I can find (from 2008) says it is "waiting for a replacement".
The bottom line is that splice from one file to another calls memcpy to move the data; it is not zero-copy. This is not much better than you can do in userspace using read/write with small buffers, so you might as well stick to the standard, portable interfaces.
If "page stealing" is ever added back into the Linux kernel, then the benefits of splice
would be much greater. (And even today, when the destination is a socket, you get true zero-copy, making splice
more attractive.) But for the purpose of this question, splice
does not buy you very much.