Problem description
This looks like a simple question, but I didn't find anything similar here.
Since there is no file copy function in C, we have to implement file copying ourselves, but I don't like reinventing the wheel even for trivial stuff like that, so I'd like to ask the cloud:
- What code would you recommend for copying a file using fopen()/fread()/fwrite()?
- What code would you recommend for copying a file using open()/read()/write()?
This code should be portable (windows/mac/linux/bsd/qnx/younameit), stable, time tested, fast, memory efficient, etc. Getting into a specific system's internals to squeeze out some more performance is welcome (like getting the filesystem cluster size).
This seems like a trivial question but, for example, the source code for the cp command isn't 10 lines of C code.
Recommended answer
As far as the actual I/O goes, the code I've written a million times in various guises for copying data from one stream to another goes something like this. It returns 0 on success, or -1 with errno set on error (in which case any number of bytes might have been copied).
Note that for copying regular files, you can skip the EAGAIN stuff, since regular files are always blocking I/O. But inevitably if you write this code, someone will use it on other types of file descriptors, so consider it a freebie.
There's a file-specific optimisation that GNU cp does, which I haven't bothered with here: for long blocks of 0 bytes, instead of writing you just extend the output file by seeking off the end.
#include <errno.h>
#include <poll.h>
#include <unistd.h>

// wait until fd is ready for the given event (POLLIN or POLLOUT)
void block(int fd, int event) {
    struct pollfd topoll;
    topoll.fd = fd;
    topoll.events = event;
    poll(&topoll, 1, -1);
    // no need to check errors - if the stream is bust then the
    // next read/write will tell us
}

int copy_data_buffer(int fdin, int fdout, void *buf, size_t bufsize) {
    for(;;) {
        char *pos; // char*, since arithmetic on void* is not standard C
        // read data to buffer
        ssize_t bytestowrite = read(fdin, buf, bufsize);
        if (bytestowrite == 0) break; // end of input
        if (bytestowrite == -1) {
            if (errno == EINTR) continue; // signal handled
            if (errno == EAGAIN) {
                block(fdin, POLLIN);
                continue;
            }
            return -1; // error
        }

        // write data from buffer
        pos = buf;
        while (bytestowrite > 0) {
            ssize_t bytes_written = write(fdout, pos, bytestowrite);
            if (bytes_written == -1) {
                if (errno == EINTR) continue; // signal handled
                if (errno == EAGAIN) {
                    block(fdout, POLLOUT);
                    continue;
                }
                return -1; // error
            }
            bytestowrite -= bytes_written;
            pos += bytes_written;
        }
    }
    return 0; // success
}

#include <stdlib.h> // malloc, free

// Default value. I think it will get close to maximum speed on most
// systems, short of using mmap etc. But porters / integrators
// might want to set it smaller, if the system is very memory
// constrained and they don't want this routine to starve
// concurrent ops of memory. And they might want to set it larger
// if I'm completely wrong and larger buffers improve performance.
// It's worth trying several MB at least once, although with huge
// allocations you have to watch for the linux
// "crash on access instead of returning 0" behaviour for failed malloc.
#ifndef FILECOPY_BUFFER_SIZE
    #define FILECOPY_BUFFER_SIZE (64*1024)
#endif

int copy_data(int fdin, int fdout) {
    // optional exercise for reader: take the file size as a parameter,
    // and don't use a buffer any bigger than that. This prevents
    // memory-hogging if FILECOPY_BUFFER_SIZE is very large and the file
    // is small.
    for (size_t bufsize = FILECOPY_BUFFER_SIZE; bufsize >= 256; bufsize /= 2) {
        void *buffer = malloc(bufsize);
        if (buffer != NULL) {
            int result = copy_data_buffer(fdin, fdout, buffer, bufsize);
            free(buffer);
            return result;
        }
    }
    // could use a stack buffer here instead of failing, if desired.
    // 128 bytes ought to fit on any stack worth having, but again
    // this could be made configurable.
    return -1; // errno is ENOMEM
}

To open the input file:
int fdin = open(infile, O_RDONLY|O_BINARY, 0);
if (fdin == -1) return -1;
Opening the output file is tricksy. As a basis, you want:
int fdout = open(outfile, O_WRONLY|O_BINARY|O_CREAT|O_TRUNC, 0x1ff); // 0x1ff == 0777
if (fdout == -1) {
    close(fdin);
    return -1;
}
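As a rough sketch of how the open calls and the copy loop might be wired together, assuming a POSIX system: copy_file() is a hypothetical wrapper, and copy_fd() is a cut-down blocking stand-in for the copy routine, included only so the example compiles on its own. O_BINARY is not part of POSIX, so the sketch defines it as 0 where it is missing.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

#ifndef O_BINARY
#define O_BINARY 0 /* not defined by POSIX; assumed unnecessary there */
#endif

/* Stand-in for the copy routine: plain blocking copy, small stack buffer. */
static int copy_fd(int fdin, int fdout) {
    char buf[4096];
    for (;;) {
        ssize_t n = read(fdin, buf, sizeof buf);
        if (n == 0) return 0; /* end of input */
        if (n == -1) {
            if (errno == EINTR) continue;
            return -1;
        }
        char *pos = buf;
        while (n > 0) {
            ssize_t w = write(fdout, pos, (size_t)n);
            if (w == -1) {
                if (errno == EINTR) continue;
                return -1;
            }
            pos += w;
            n -= w;
        }
    }
}

/* Hypothetical wrapper: returns 0 on success, -1 with errno set on error. */
int copy_file(const char *infile, const char *outfile) {
    int fdin = open(infile, O_RDONLY | O_BINARY);
    if (fdin == -1) return -1;

    int fdout = open(outfile, O_WRONLY | O_BINARY | O_CREAT | O_TRUNC, 0777);
    if (fdout == -1) {
        int saved = errno; /* close() may overwrite errno */
        close(fdin);
        errno = saved;
        return -1;
    }

    int result = copy_fd(fdin, fdout);
    int saved = errno;
    close(fdin);
    /* close() can report a deferred write error, so check it too */
    if (close(fdout) == -1 && result == 0) {
        result = -1;
        saved = errno;
    }
    if (result == -1) errno = saved;
    return result;
}
```

A call such as copy_file("in.dat", "out.dat") then either returns 0 or leaves the reason in errno. The sketch handles none of the special cases a real copy has to deal with.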
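But there are confounding factors:
- you need to special-case when the files are the same, and I can't remember how to do that portably.
- if the output filename is a directory, you might want to copy the file into the directory.
- if the output file already exists (open with O_EXCL to determine this and check for EEXIST on error), you might want to do something different, as cp -i does.
- you might want the permissions of the output file to reflect those of the input file.
- you might want other platform-specific metadata to be copied.
- you may or may not wish to unlink the output file on error.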
Obviously the answers to all these questions could be "do the same as cp". In which case the answer to the original question is "ignore everything I or anyone else has said, and use the source of cp".
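For a few of the confounding factors above, a POSIX-only sketch might look like this. It is not portable to Windows and not taken from cp: same_file() and create_output_like() are hypothetical helpers. The first compares device and inode numbers from fstat(); the second uses O_EXCL so that an existing output file shows up as EEXIST, and takes the permission bits from the input file.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if both descriptors refer to the same
 * file, 0 if they do not, -1 if fstat() fails. POSIX only. */
int same_file(int fd1, int fd2) {
    struct stat a, b;
    if (fstat(fd1, &a) == -1 || fstat(fd2, &b) == -1) return -1;
    return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}

/* Hypothetical helper: create outfile only if it does not exist yet,
 * with permission bits copied from the open input file (the process
 * umask still applies). Returns a descriptor, or -1 with errno set;
 * errno == EEXIST means the file was already there, and the caller
 * decides what to do about it (prompt, skip, overwrite, ...). */
int create_output_like(int fdin, const char *outfile) {
    struct stat st;
    if (fstat(fdin, &st) == -1) return -1;
    return open(outfile, O_WRONLY | O_CREAT | O_EXCL, st.st_mode & 0777);
}
```

To use same_file() on an output file that already exists, it would have to be opened without O_TRUNC first; otherwise a copy onto itself has already emptied the input by the time the check runs.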
Btw, getting the filesystem's cluster size is next to useless. You'll almost always see speed increasing with buffer size long after you've passed the size of a disk block.