Question
I've been wondering what happens if I initialize a char array with a string literal that is too long for it. (I understand that when using a string literal as an initializer, I would normally leave out the size and let the compiler count the chars, or use strlen()+1 as the size.)
I have the following code:
#include <stdio.h>
int main() {
    char a[3] = "abc"; // with a[2], the compiler reports: initializer-string for array of chars is too long
    printf("%s\n", a);
    printf("%p\n", a);
}
I expected it to crash, but it actually compiles without warnings and prints output. Running it under valgrind, however, produces the following error messages.
==19195== Memcheck, a memory error detector
==19195== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==19195== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==19195== Command: ./a.out
==19195==
==19195== Conditional jump or move depends on uninitialised value(s)
==19195== at 0x4E88CC0: vfprintf (vfprintf.c:1632)
==19195== by 0x4E8F898: printf (printf.c:33)
==19195== by 0x4005CC: main (main.c:5)
==19195==
==19195== Conditional jump or move depends on uninitialised value(s)
==19195== at 0x4EB475D: _IO_file_overflow@@GLIBC_2.2.5 (fileops.c:850)
==19195== by 0x4EB56AF: _IO_default_xsputn (genops.c:455)
==19195== by 0x4EB32C6: _IO_file_xsputn@@GLIBC_2.2.5 (fileops.c:1352)
==19195== by 0x4E8850A: vfprintf (vfprintf.c:1632)
==19195== by 0x4E8F898: printf (printf.c:33)
==19195== by 0x4005CC: main (main.c:5)
==19195==
==19195== Conditional jump or move depends on uninitialised value(s)
==19195== at 0x4EB478A: _IO_file_overflow@@GLIBC_2.2.5 (fileops.c:858)
==19195== by 0x4EB56AF: _IO_default_xsputn (genops.c:455)
==19195== by 0x4EB32C6: _IO_file_xsputn@@GLIBC_2.2.5 (fileops.c:1352)
==19195== by 0x4E8850A: vfprintf (vfprintf.c:1632)
==19195== by 0x4E8F898: printf (printf.c:33)
==19195== by 0x4005CC: main (main.c:5)
==19195==
==19195== Conditional jump or move depends on uninitialised value(s)
==19195== at 0x4EB56B3: _IO_default_xsputn (genops.c:455)
==19195== by 0x4EB32C6: _IO_file_xsputn@@GLIBC_2.2.5 (fileops.c:1352)
==19195== by 0x4E8850A: vfprintf (vfprintf.c:1632)
==19195== by 0x4E8F898: printf (printf.c:33)
==19195== by 0x4005CC: main (main.c:5)
==19195==
==19195== Syscall param write(buf) points to uninitialised byte(s)
==19195== at 0x4F306E0: __write_nocancel (syscall-template.S:84)
==19195== by 0x4EB2BFE: _IO_file_write@@GLIBC_2.2.5 (fileops.c:1263)
==19195== by 0x4EB4408: new_do_write (fileops.c:518)
==19195== by 0x4EB4408: _IO_do_write@@GLIBC_2.2.5 (fileops.c:494)
==19195== by 0x4EB347C: _IO_file_xsputn@@GLIBC_2.2.5 (fileops.c:1331)
==19195== by 0x4E8792C: vfprintf (vfprintf.c:1663)
==19195== by 0x4E8F898: printf (printf.c:33)
==19195== by 0x4005CC: main (main.c:5)
==19195== Address 0x5203043 is 3 bytes inside a block of size 1,024 alloc'd
==19195== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==19195== by 0x4EA71D4: _IO_file_doallocate (filedoalloc.c:127)
==19195== by 0x4EB5593: _IO_doallocbuf (genops.c:398)
==19195== by 0x4EB48F7: _IO_file_overflow@@GLIBC_2.2.5 (fileops.c:820)
==19195== by 0x4EB328C: _IO_file_xsputn@@GLIBC_2.2.5 (fileops.c:1331)
==19195== by 0x4E8850A: vfprintf (vfprintf.c:1632)
==19195== by 0x4E8F898: printf (printf.c:33)
==19195== by 0x4005CC: main (main.c:5)
==19195==
abc?
0xfff0003f0
==19195==
==19195== HEAP SUMMARY:
==19195== in use at exit: 0 bytes in 0 blocks
==19195== total heap usage: 1 allocs, 1 frees, 1,024 bytes allocated
==19195==
==19195== All heap blocks were freed -- no leaks are possible
==19195==
==19195== For counts of detected and suppressed errors, rerun with: -v
==19195== Use --track-origins=yes to see where uninitialised values come from
==19195== ERROR SUMMARY: 10 errors from 5 contexts (suppressed: 0 from 0)
I think the uninitialized value/byte part makes sense, because there's no room in the array for the terminating character '\0', and when I print it out the last char is a garbage value.
But the last error message looks unfamiliar to me.
I'm aware that the buffer size is defined as 1024. I'm not sure whether this error appears because of inefficient use of memory.
I'm also wondering where the heap alloc and free come from. Are they caused by the string literal?
Thanks for your help!
(The previous title of this question may have been confusingly worded, so I changed it.)
Recommended answer
Here's my interpretation of what's going on:
You're writing to stdout, which is buffered by default. So all data goes into an internal buffer first and is then written ("flushed") to the actual underlying file descriptor.
Your a array is not a valid string, as it lacks a terminating NUL byte. The first few messages come from the printf internals, where it tries to compute the length of the argument string by finding the terminator and then copies the contents into stdout's buffer. As there is no terminator within a, the code runs out of bounds and reads uninitialized memory.
At this point the output buffer would look like this:
char *buf = malloc(1024), contents:
a b c ? ? ? ?
^^^^^ ^^^^^^^
The first part (abc) was legitimately copied from a. The next part is random garbage (uninitialized bytes after a, copied into the buffer). This goes on until a NUL byte happens to occur somewhere after a, which is then treated as the end of the string (this is where copying from a stops).
Finally there's the '\n' from the format string, which is also added to the buffer:
char *buf = malloc(1024), contents:
a b c ? ? ? ? \n
^^^^^ ^^^^^^^ ^^
Then (because we encountered a '\n' and stdout is line buffered, as it is when connected to a terminal) we flush the buffer, calling write(STDOUT_FILENO, buf, N), where N is however many bytes are in use in the output buffer (at least 4, but the exact number depends on how many garbage bytes were copied before a '\0' was found after a).
Now, the errors:
==19195== Syscall param write(buf) points to uninitialised byte(s)
This is saying that there are uninitialized bytes in the buffer passed to write (its buf parameter, the second argument).
Apparently valgrind treats parts of the output buffer as uninitialized because the source data was uninitialized. Copying garbage from A to B just means B is also garbage.
==19195== Address 0x5203043 is 3 bytes inside a block of size 1,024 alloc'd
So it's saying that there's a dynamically allocated buffer (of size 1024), and the uninitialised byte(s) from the previous error were found at offset 3. That makes sense, because offsets 0, 1, 2 contain "abc", which is perfectly valid data. The trouble begins after that.
The message also says that the block came from malloc, which was called (indirectly) from printf. This is because the output buffer of stdout is created on demand, the first time you write to it, which here is the first printf call in your main.