This is the gist of how cat detects whether it would endlessly fill the disk (some error checks have been removed for brevity; the full source code is linked above):

```c
struct stat stat_buf;
fstat(STDOUT_FILENO, &stat_buf);
out_dev = stat_buf.st_dev;
out_ino = stat_buf.st_ino;
out_isreg = S_ISREG (stat_buf.st_mode) != 0;

// ...

// for <infile> in inputs {
    input_desc = open (infile, file_open_mode); // or STDIN_FILENO
    fstat(input_desc, &stat_buf);

    /* Don't copy a nonempty regular file to itself, as that would
       merely exhaust the output device.  It's better to catch this
       error earlier rather than later.  */
    if (out_isreg
        && stat_buf.st_dev == out_dev && stat_buf.st_ino == out_ino
        && lseek (input_desc, 0, SEEK_CUR) < stat_buf.st_size) // <--- This is the important line
    {
        // ...
    }
// } (end of for)
```

I have two possible explanations, but both seem kind of weird.

1. A file could be "empty" according to some standard (POSIX) even though it still contains some information (which is counted in st_size), and lseek or open respects that by starting at some default offset. I wouldn't know why this would be the case, because empty means empty, right?
2. This comparison is really a "clever" composition of two conditions. This made sense to me at first: if input_desc were STDIN_FILENO and no file were redirected to stdin, lseek would fail with ESPIPE (according to the man page) and return -1. The whole expression would then amount to lseek(...) == -1 || stat_buf.st_size > 0.
But this cannot be true, because the check only happens if the device and inode are the same, and that can only happen if a) stdin and stdout point to the same pty, in which case out_isreg would be false, or b) stdin and stdout point to the same file, in which case lseek cannot return -1, right?

I have also put together a small program that prints the return values and errno for the important parts, but nothing stood out to me:

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv) {
  struct stat out_stat;
  struct stat in_stat;

  if (fstat(STDOUT_FILENO, &out_stat) < 0)
    exit(1);

  printf("this is written to stdout / into the file\n");

  /* Use the file named on the command line, or fall back to stdin. */
  int fd;
  if (argc > 1)
    fd = open(argv[1], O_RDONLY);
  else
    fd = STDIN_FILENO;

  fstat(fd, &in_stat);

  /* lseek returns off_t, not int; clear errno so a stale value is not reported. */
  errno = 0;
  off_t res = lseek(fd, 0, SEEK_CUR);

  fprintf(stderr,
          "errno after lseek = %d, EBADF = %d, EINVAL = %d, EOVERFLOW = %d, "
          "ESPIPE = %d\n",
          errno, EBADF, EINVAL, EOVERFLOW, ESPIPE);

  /* Cast off_t values explicitly so the format specifiers match on every platform. */
  fprintf(stderr, "input:\n\tlseek(...) = %lld\n\tst_size = %lld\n",
          (long long)res, (long long)in_stat.st_size);

  printf("outsize is %lld", (long long)out_stat.st_size);
}
```

```
$ touch empty
$ ./a.out < empty > empty
errno after lseek = 0, EBADF = 9, EINVAL = 22, EOVERFLOW = 75, ESPIPE = 29
input:
	lseek(...) = 0
	st_size = 0

$ echo x > empty
$ ./a.out < empty > empty
errno after lseek = 0, EBADF = 9, EINVAL = 22, EOVERFLOW = 75, ESPIPE = 29
input:
	lseek(...) = 0
	st_size = 0
```

So my research has left my ultimate question untouched: how does lseek help determine whether a file is empty in this example from the cat source code?

Recommended answer

This is my attempt at reverse-engineering this. I could not find any public discussion that explains why lseek() was put there (no other place in GNU coreutils does that).

The guiding question is: when is the condition lseek (input_desc, 0, SEEK_CUR) < stat_buf.st_size false?

Test case:

```shell
#!/bin/bash
# (edited based on comments)
set -x

# arrange for cat to start off past the end of a non-empty file
echo abcdefghi > /tmp/so/catseek/input

# get the shell to open the input file for reading & writing as file descriptor 7
exec 7<>/tmp/so/catseek/input

# read the whole file via that descriptor (but leave it open)
dd <&7

# ask linux what the current file position of file descriptor 7 is
# should be everything dd read, namely 10 bytes, the size of the file
grep ^pos: /proc/self/fdinfo/7

# run cat, with pre and post content so that we know how to locate the interesting part
# "-" will cause cat to reuse its file descriptor 0 rather than creating a new file descriptor
# the redirections tell the shell to redirect file descriptors 1 and 0 to/from our open file descriptor 7
# which, as you'll remember, already has a file position of 10 bytes
strace -e lseek ./src/cat /tmp/so/catseek/pre - /tmp/so/catseek/post <&7 >&7

# now let's see what's in the file
cat /tmp/so/catseek/input
```

With:

```
$ cat /tmp/so/catseek/pre
pre
$ cat /tmp/so/catseek/post
post
```

Result with cat using lseek (input_desc, 0, SEEK_CUR) < stat_buf.st_size:

```
+ test.sh:8:echo abcdefghi
+ test.sh:10:exec
+ test.sh:12:dd
abcdefghi
0+1 records in
0+1 records out
10 bytes copied, 2.0641e-05 s, 484 kB/s
+ test.sh:15:grep '^pos:' /proc/self/fdinfo/7
pos: 10
+ test.sh:20:strace -e lseek ./src/cat /tmp/so/catseek/pre - /tmp/so/catseek/post
lseek(0, 0, SEEK_CUR) = 14
+++ exited with 0 +++
+ test.sh:22:cat /tmp/so/catseek/input
abcdefghi
pre
post
```

Result with cat using 0 < stat_buf.st_size instead:

```
+ test.sh:8:echo abcdefghi
+ test.sh:10:exec
+ test.sh:12:dd
abcdefghi
0+1 records in
0+1 records out
10 bytes copied, 3.6415e-05 s, 275 kB/s
+ test.sh:15:grep '^pos:' /proc/self/fdinfo/7
pos: 10
+ test.sh:20:strace -e lseek ./src/cat /tmp/so/catseek/pre - /tmp/so/catseek/post
./src/cat: -: input file is output file
+++ exited with 1 +++
+ test.sh:22:cat /tmp/so/catseek/input
abcdefghi
pre
post
```

As you can see, when cat starts, the file position may already be at the end of the file. Checking only the file size makes cat skip the file, just as the lseek() version effectively does, but it also reports a failure, because the code inside the if statement is:

```c
error (0, 0, _("%s: input file is output file"), infile);
ok = false;
goto contin;
```

Using lseek() allows cat to say: "Oh, the file is the same, and it is not empty, BUT our reads will still turn up empty, because that's how reading at EOF works, so we can allow this case."
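The same effect can be reproduced without strace or a patched cat. The following is only a minimal sketch under stated assumptions, not code from coreutils: the helpers would_copy_onto_itself, demo_check and demo_self_test, and the scratch path under /tmp, are hypothetical names made up for illustration. The sketch evaluates the condition quoted from cat once with the file offset at 0 and once with the offset at end-of-file, using one descriptor as both input and output.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from coreutils): mirrors the quoted condition.
   True when out_fd is a regular file, in_fd refers to the same file, and
   in_fd's current offset is still before the end of that file. */
static bool would_copy_onto_itself(int in_fd, int out_fd)
{
    struct stat in_st, out_st;
    if (fstat(in_fd, &in_st) != 0 || fstat(out_fd, &out_st) != 0)
        return false;
    return S_ISREG(out_st.st_mode)
        && in_st.st_dev == out_st.st_dev
        && in_st.st_ino == out_st.st_ino
        && lseek(in_fd, 0, SEEK_CUR) < in_st.st_size;
}

/* Hypothetical demo: writes 10 bytes to a scratch file, leaves the offset
   at EOF (at_eof == true) or rewinds it to 0, then evaluates the check
   with the same descriptor as input and output.
   Returns 1 or 0 for the check result, -1 if the setup failed. */
static int demo_check(bool at_eof)
{
    char path[64];
    /* The path is an assumption made for this sketch. */
    snprintf(path, sizeof path, "/tmp/catseek-demo-%ld", (long) getpid());

    int fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;
    unlink(path); /* the file stays usable until fd is closed */

    static const char data[] = "abcdefghi\n";
    int result = -1;
    if (write(fd, data, sizeof data - 1) == (ssize_t) (sizeof data - 1)
        && (at_eof || lseek(fd, 0, SEEK_SET) == 0))
        result = would_copy_onto_itself(fd, fd) ? 1 : 0;

    close(fd);
    return result;
}

/* Usage: with unread data the copy is refused, at EOF it is allowed. */
static void demo_self_test(void)
{
    assert(demo_check(false) == 1); /* offset 0 < size 10 */
    assert(demo_check(true) == 0);  /* offset 10 is not < size 10 */
}
```

With the offset at 0 the condition holds, which is the case cat rejects with "input file is output file". With the offset already at end-of-file the condition is false, matching the strace run above where cat exits with status 0.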