Reading a line under MS / Unix / Mac

Problem description

Someone recently asked about reading lines. Some time ago I wrote this
code (part of a BASIC-style interpreter based on the one in H. Schildt's
The Art of C) to read a file whose lines end in any format:
Microsoft-style CR/LF pair, Unix-style NL, or Mac-style CR. It also
allows for an EOF that is not preceded by a newline. I thought this would
make text-file sharing a bit easier.

Here it is:
/* Load a file, normalizing newlines to *nix standard (just NL). */
int load_file(FILE *fp, char *buf, int max_size)
{
    int i = 0;
    char c;

    do {
        c = getc(fp);            /* read the file into memory */
        i++;                     /* keep track of size of file */
        if (c == '\r') {         /* read a CR */
            c = getc(fp);        /* read another character */
            if (c != '\n') {     /* whoops, not an NL (Mac style) */
                *buf++ = '\n';   /* correct, store NL */
                i++;             /* and update size */
            }                    /* otherwise, c now holds the NL from the CR/NL pair */
        }                        /* c now holds character to put; NL, (CR/)LF, or (new) char */
        *buf++ = c;
    } while ( !feof(fp) && i < max_size );
    /* Null terminate the file, check for NL (LF) at end. */
    if (buf[-1] != '\n')         /* if file didn't end in new line */
        *buf++ = '\n', i++;      /* tack it on */
    *buf = '\0';                 /* put null past file */
    fclose(fp);
    return i;                    /* size of file loaded */
}

This allows the file to use a mix of different EOLs. Is that a bad
idea?

-- Marty (I still consider myself a newbie)

Solution

I believe the C Standard library is required to present all text streams
as being composed of zero or more lines, each line being terminated by
a newline character. The actual end-of-line marker of the file is
abstracted away.

Here is your first problem: c is declared as char. getc signals
end-of-file or error by returning EOF, an int value. So you should always
assign the return value of getc to an int and convert it to a char only
after making sure that it is indeed a valid character.
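
A minimal sketch of that advice, not taken from the original post: the helper name read_chars and its size_t interface are made up for illustration. The point is only that the result of getc is held in an int and compared against EOF before it is narrowed to char. With a plain char, a 0xFF byte can compare equal to EOF where char is signed, and EOF may never match where char is unsigned.

```c
#include <stdio.h>

/* Hypothetical helper: copy at most max_size - 1 characters from fp into
   buf, NUL-terminate, and return the number of characters stored. */
size_t read_chars(FILE *fp, char *buf, size_t max_size)
{
    size_t n = 0;
    int c;                      /* int, not char: it must be able to hold EOF */

    if (max_size == 0)
        return 0;               /* no room even for the terminator */

    while (n + 1 < max_size && (c = getc(fp)) != EOF)
        buf[n++] = (char)c;     /* narrow only after ruling out EOF */

    buf[n] = '\0';
    return n;
}
```

Testing the return value of getc directly also avoids the do/while-with-feof pattern, which stores one bogus character after the last real one.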

You like obfuscation, don't you? I'd write the two operations joined by
the comma operator (*buf++ = '\n', i++;) as separate statements to avoid
error.

It's taken care of for text files by the Standard library. You only need
to worry when operating on binary files.


[snip code]


> It's taken care of for text files by the Standard library. You
> only need to worry when operating on binary files.

Not true. On many (most? all?) Unix systems, when opening a DOS EOL
file (0x0D 0x0A line endings) using "r", not "rb", the 0x0D
characters are *not* removed from the stream when reading. On DOS
systems, they are removed since DOS recognizes the 2 character
sequence as meaningful. Unix systems don't recognize the sequence as
meaningful so they leave the 0x0Ds.

Dave

--
D.a.v.i.d T.i.k.t.i.n
t.i.k.t.i.n [at] a.d.v.a.n.c.e.d.r.e.l.a.y [dot] c.o.m
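
To illustrate the point about leftover 0x0D characters: if lines are read with fgets on a system that does not translate CR/LF, each line from a DOS file ends in "\r\n" rather than "\n". A small sketch of one way to cope follows; the helper name chomp is hypothetical and not from the thread.

```c
#include <string.h>

/* Hypothetical helper: strip one trailing line terminator ("\n", "\r\n"
   or a lone "\r") from line in place and return the new length. */
size_t chomp(char *line)
{
    size_t len = strlen(line);

    if (len > 0 && line[len - 1] == '\n')
        len--;                  /* drop the NL, if any */
    if (len > 0 && line[len - 1] == '\r')
        len--;                  /* drop the CR left over from a DOS file */

    line[len] = '\0';
    return len;
}
```

Called right after fgets(line, sizeof line, fp), this leaves the same text whether the file had Unix or DOS line endings. It does not help with old Mac files, where fgets never sees a '\n' to stop at.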



> I believe the C Standard library is required to present all text streams
> as being composed of zero or more lines, each line being terminated by
> a newline character. The actual end-of-line marker of the file is
> abstracted away.

Only if it is what the implementation considers to be a text stream.
Open an old style Mac text file on a Unix machine and the Unix machine
will not see any new lines.

<snip>


> It's taken care of for text files by the Standard library. You only need
> to worry when operating on binary files.

Sometimes you should leave it to the implementation, but sometimes you
have to cope with "text files" from a foreign system that have not been
translated, and then you have to deal with it yourself.
--
Flash Gordon
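
Pulling the replies together, here is one possible reworking of the loader. It is a sketch under stated assumptions, not the original poster's final code: the name load_normalized, the size_t interface, and the choice to leave fclose to the caller are all made up for illustration. It assumes the file was opened in binary mode ("rb") so that the implementation does no translation of its own.

```c
#include <stdio.h>

/* Sketch: read fp into buf, turning CR LF pairs and lone CRs into '\n'.
   Stores at most max_size - 1 characters plus a terminating NUL and
   returns the number of characters stored. A final '\n' is appended if
   the data did not end with one and there is still room for it. */
size_t load_normalized(FILE *fp, char *buf, size_t max_size)
{
    size_t n = 0;
    int c;                              /* int so that EOF is distinguishable */

    if (max_size == 0)
        return 0;

    while (n + 1 < max_size && (c = getc(fp)) != EOF) {
        if (c == '\r') {
            int next = getc(fp);        /* look at what follows the CR */
            if (next != '\n' && next != EOF)
                ungetc(next, fp);       /* lone CR: push the character back */
            c = '\n';                   /* CR LF and lone CR both become NL */
        }
        buf[n++] = (char)c;
    }

    if (n > 0 && buf[n - 1] != '\n' && n + 1 < max_size)
        buf[n++] = '\n';                /* make sure the last line is terminated */
    buf[n] = '\0';
    return n;
}
```

For the bytes "a\r\nb\rc\nd" this stores "a\nb\nc\nd\n" and returns 8. Unlike the original, an empty file yields an empty string rather than reading buf[-1], and the buffer is never written past max_size.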


