sscanf() seems well suited to pulling out matching data, for example:

sscanf ("abc,f,123,234", "%[a-z],%c,%d,%d", str, &chr, &i1, &i2);

But I need to assert that it has not encountered any white space:
sscanf ("abc,  f  , 123    , 234  ", "%[a-z],%c,%d,%d", str, &chr, &i1, &i2);
/* How to tell it to fail on whitespace?? */

I also need to assert that there is no trailing data:
sscanf ("abc,f,123,234__SOMERUBBISH", "%[a-z],%c,%d,%d", str, &chr, &i1, &i2);
/* How to detect trailing rubbish or make sscanf fail */

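How can I make sscanf parse the string more strictly?

This is a university assignment that is compiled as ANSI C, and I don't have the option of using regular expressions.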

Best answer

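In short, if you cannot allow white space, you cannot use the functions that read directly from a file, such as scanf() et al. Every %d conversion accepts any amount of white space before the value, newlines included. You have to use the string-based functions such as sscanf() instead.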

You would do better to read lines of data with fgets() or POSIX getline(), and then use %n to determine where the conversion finished.

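If you have not removed the newline that fgets() or getline() keeps, you can test whether the first character after the last match (that is, the first unmatched character) is a newline; otherwise, you can test for a null byte as the first unmatched character.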

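You still need to check that there is no white space before each of the two numbers; you use %n again, once before each of them. Note that %n conversion specifications are not counted in the value returned by scanf() et al.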
ws.c

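#define _POSIX_C_SOURCE 200809L   /* expose POSIX getline() */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char    str[10] = "QQQQQQQQQ";
    char    chr = 'Z';
    int     i1 = 77;
    int     i2 = 88;
    int     n1 = 0;     /* initialized: a failed scan may never set them */
    int     n2 = 0;
    int     n3 = 0;
    char   *line = 0;
    size_t  linelen = 0;
    ssize_t length;     /* getline() returns ssize_t, not int */

    while ((length = getline(&line, &linelen, stdin)) != -1)
    {
        /* Echo the line without its trailing newline */
        printf("Line: <<%.*s>>\n", (int)(length - 1), line);

        /* Each %n stores the number of characters consumed so far.
         * %[a-z] has no field width here, so an overlong first field
         * can overflow str (ws2.c below adds a width). */
        int rc = sscanf(line, "%[a-z],%c,%n%d,%n%d%n",
                        str, &chr, &n1, &i1, &n2, &i2, &n3);

        const char *tag = "success";
        if (rc <= 0)
            tag = "total failure";
        else if (rc < 4)
            tag = "partial failure";
        else if (rc > 4)
            tag = "WTF?";
        printf("rc = %d: %s\n", rc, tag);
        printf("n1 = %d [%c], n2 = %d [%c], n3 = %d [%c]\n",
               n1, line[n1], n2, line[n2], n3, line[n3]);
        printf("<<%s>>,<<%c>>,%d,%d\n", str, chr, i1, i2);
    }
    free(line);
    return 0;
}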

This lets you determine where any problem lies.
data
Using ☐ to mark the end of each line, consider the data file (data):
abc,f,123,234☐
abc,  f  , 123    , 234  ☐
abc,f,123,234__SOMERUBBISH☐
xyz,f, 123, 234☐
xyz,f,123 ,234 ☐

Sample run

The output from the program above is:
$ ./ws < data
Line: <<abc,f,123,234>>
rc = 4: success
n1 = 6 [1], n2 = 10 [2], n3 = 13 [
]
<<abc>>,<<f>>,123,234
Line: <<abc,  f  , 123    , 234  >>
rc = 2: partial failure
n1 = 6 [f], n2 = 10 [ ], n3 = 13 [3]
<<abc>>,<< >>,123,234
Line: <<abc,f,123,234__SOMERUBBISH>>
rc = 4: success
n1 = 6 [1], n2 = 10 [2], n3 = 13 [_]
<<abc>>,<<f>>,123,234
Line: <<xyz,f, 123, 234>>
rc = 4: success
n1 = 6 [ ], n2 = 11 [ ], n3 = 15 [
]
<<xyz>>,<<f>>,123,234
Line: <<xyz,f,123 ,234 >>
rc = 3: partial failure
n1 = 6 [1], n2 = 11 [2], n3 = 15 [
]
<<xyz>>,<<f>>,123,234
$

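Clearly, for lines marked "partial failure", you cannot rely on the data after the last successful conversion. But where the conversions succeed, you can see that problems can be detected by checking line[n1] and so on.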
ws2.c
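This small variation of the code does a slightly more extensive analysis of the problems. Note that this analysis is not valid for partially or totally unsuccessful scans. It would be better to simply report a problem when the return value from sscanf() is not 4, and only analyze the values when the scan succeeded. (The modification to do that is not complicated.) The %9[a-z] conversion also prevents a buffer overflow from an overlong string in the first field.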
#define _POSIX_C_SOURCE 200809L   /* expose POSIX getline() */
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

#undef isdecint
/* Can an integer conversion start with this character? */
static inline int isdecint(int c)
{
    return (isdigit(c) || c == '+' || c == '-');
}

int main(void)
{
    char    str[10] = "QQQQQQQQQ";
    char    chr = 'Z';
    int     i1 = 77;
    int     i2 = 88;
    int     n1 = 0;     /* initialized: a failed scan may never set them */
    int     n2 = 0;
    int     n3 = 0;
    char   *line = 0;
    size_t  linelen = 0;
    ssize_t length;     /* getline() returns ssize_t, not int */

    while ((length = getline(&line, &linelen, stdin)) != -1)
    {
        printf("Line: <<%.*s>>\n", (int)(length - 1), line);

        /* %9[a-z] stops str (10 bytes) from overflowing */
        int rc = sscanf(line, "%9[a-z],%c,%n%d,%n%d%n",
                        str, &chr, &n1, &i1, &n2, &i2, &n3);

        const char *tag = "success";
        if (rc <= 0)
            tag = "total failure";
        else if (rc < 4)
            tag = "partial failure";
        else if (rc > 4)
            tag = "WTF?";
        printf("rc = %d: %s\n", rc, tag);
        printf("n1 = %d [%c], n2 = %d [%c], n3 = %d [%c]\n",
               n1, line[n1], n2, line[n2], n3, line[n3]);
        /* ctype functions need an unsigned char value (or EOF) */
        if (!isdecint((unsigned char)line[n1]))
            printf("Invalid char for n1\n");
        if (!isdecint((unsigned char)line[n2]))
            printf("Invalid char for n2\n");
        if (line[n3] != '\n')
            printf("Invalid char for n3\n");
        printf("<<%s>>,<<%c>>,%d,%d\n", str, chr, i1, i2);
    }
    free(line);
    return 0;
}
data2
abc,f,123,234☐
abc,  f  , 345    , 456  ☐
abc,f,567,678__SOMERUBBISH☐
xyz,f, 1234, 2345☐
xyz,f,-3456 ,-4567 ☐
xyz,f,+5678,+6789☐
xyz,f,+ 5678,- 6789 X☐

Sample run
$ ./ws2 < data2
Line: <<abc,f,123,234>>
rc = 4: success
n1 = 6 [1], n2 = 10 [2], n3 = 13 [
]
<<abc>>,<<f>>,123,234
Line: <<abc,  f  , 345    , 456  >>
rc = 2: partial failure
n1 = 6 [f], n2 = 10 [ ], n3 = 13 [5]
Invalid char for n1
Invalid char for n2
Invalid char for n3
<<abc>>,<< >>,123,234
Line: <<abc,f,567,678__SOMERUBBISH>>
rc = 4: success
n1 = 6 [5], n2 = 10 [6], n3 = 13 [_]
Invalid char for n3
<<abc>>,<<f>>,567,678
Line: <<xyz,f, 1234, 2345>>
rc = 4: success
n1 = 6 [ ], n2 = 12 [ ], n3 = 17 [
]
Invalid char for n1
Invalid char for n2
<<xyz>>,<<f>>,1234,2345
Line: <<xyz,f,-3456 ,-4567 >>
rc = 3: partial failure
n1 = 6 [-], n2 = 12 [,], n3 = 17 [7]
Invalid char for n2
Invalid char for n3
<<xyz>>,<<f>>,-3456,2345
Line: <<xyz,f,+5678,+6789>>
rc = 4: success
n1 = 6 [+], n2 = 12 [+], n3 = 17 [
]
<<xyz>>,<<f>>,5678,6789
Line: <<xyz,f,+ 5678,- 6789 X>>
rc = 2: partial failure
n1 = 6 [+], n2 = 12 [,], n3 = 17 [8]
Invalid char for n2
Invalid char for n3
<<xyz>>,<<f>>,5678,6789

Regarding "c - How to use scanf/sscanf to confirm there is no white space or trailing data?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/30294032/

10-16 21:45