Problem description
I've distilled a larger program down to the code shown at the bottom. Running this program in valgrind will eventually report this error:
==7234== Invalid read of size 4
==7234==    at 0x34A7275FC8: _IO_file_write@@GLIBC_2.2.5 (in /usr/lib64/libc-2.15.so)
==7234==    by 0x34A7275EA1: new_do_write (in /usr/lib64/libc-2.15.so)
==7234==    by 0x34A7276D44: _IO_do_write@@GLIBC_2.2.5 (in /usr/lib64/libc-2.15.so)
==7234==    by 0x34A7278DB6: _IO_flush_all_lockp (in /usr/lib64/libc-2.15.so)
==7234==    by 0x34A7278F07: _IO_cleanup (in /usr/lib64/libc-2.15.so)
==7234==    by 0x34A7238BBF: __run_exit_handlers (in /usr/lib64/libc-2.15.so)
==7234==    by 0x34A7238BF4: exit (in /usr/lib64/libc-2.15.so)
==7234==    by 0x34A722173B: (below main) (in /usr/lib64/libc-2.15.so)
==7234==  Address 0x542f2e0 is 0 bytes inside a block of size 568 free'd
==7234==    at 0x4A079AE: free (vg_replace_malloc.c:427)
==7234==    by 0x34A726B11C: fclose@@GLIBC_2.2.5 (in /usr/lib64/libc-2.15.so)
==7234==    by 0x40087C: writer (t.c:22)
==7234==    by 0x34A7607D13: start_thread (in /usr/lib64/libpthread-2.15.so)
==7234==    by 0x34A72F167C: clone (in /usr/lib64/libc-2.15.so)
From the above output, this seems to be what's happening:
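- main() returns, and the exit handlers start running to close all FILE * streams
- the writer() thread is still running, wakes up, and closes its FILE *
- the exit handler tries to access the closed FILE *, which is now invalid/free()'d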
As far as I can tell, the test program doesn't do anything undefined, but I'd be happy to be wrong on that.
Valgrind hooks into various functions, so it is possible it is a valgrind bug and not glibc.
- Is this a glibc bug?
- Or is it a valgrind bug?
Any ideas on how to determine whether it's valgrind or glibc?
t.c:
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

void *test(void *arg)
{
    return NULL;
}

void *writer(void *arg)
{
    for(;;) {
        char a[100];
        FILE *f = fopen("out", "w");
        if(f == NULL)
            abort();

        fputs("Test", f);
        if(fgets(a, 100, stdin))
            fputs(a, f);

        fclose(f); //line 22
    }
    return NULL;
}

int main(int argc, char *argv[])
{
    pthread_t tid1,tid2;
    pthread_create(&tid1, NULL, writer, NULL);
    pthread_create(&tid2, NULL, test, NULL);
    pthread_join(tid2, NULL);
    //pthread_join(tid1, NULL); //no bug if we wait for writer()

    return 0;
}
//compile: gcc t.c -g -pthread
It may take several minutes to trigger the error from valgrind, with:
while [ true ]; do
    echo test | valgrind --error-exitcode=2 ./a.out || break
done
Environment: Fedora 17, glibc-2.15, gcc-4.7.0-5, kernel 3.5.3-1.fc17.x86_64, valgrind-3.7.0-4
Recommended answer
You have a race condition. You have a thread that calls exit, which is documented to close all open stdio streams. You then have another thread that, potentially after exit has closed it, accesses such a stream. You cannot access a FILE* after it's closed -- it is permitted to point to garbage.
If a thread does something that makes calling exit unsafe, you must ensure you don't call exit. It's really that simple.
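One way to follow this advice is to let the writer finish and join it before the process exits, which matches the commented-out pthread_join in the question. The following is only a minimal sketch, assuming the writer may be changed to stop at end of input instead of looping forever. The helper name run_writer_and_join and the choice to pass the input stream as the thread argument are hypothetical and not part of the original program.

```c
#include <pthread.h>
#include <stdio.h>

/* Writer thread: for each input line, rewrites "out" with "Test" plus
 * that line. Unlike the original, it returns at end of input
 * (assumption: a terminating writer is acceptable). */
static void *writer(void *arg)
{
    FILE *in = arg;
    char line[100];

    while (fgets(line, sizeof line, in) != NULL) {
        FILE *f = fopen("out", "w");
        if (f == NULL)
            return (void *)1;   /* report failure to the joiner */
        fputs("Test", f);
        fputs(line, f);
        fclose(f);
    }
    return NULL;
}

/* Hypothetical helper: start the writer and wait until it has returned.
 * Returns 0 on success, nonzero on failure. */
int run_writer_and_join(FILE *in)
{
    pthread_t tid;
    void *result = NULL;

    if (pthread_create(&tid, NULL, writer, in) != 0)
        return 1;
    if (pthread_join(tid, &result) != 0)
        return 1;
    return result == NULL ? 0 : 1;
}
```

With this shape, main can end with `return run_writer_and_join(stdin);`, so exit only runs after the writer has returned and no other thread is still opening or closing a FILE * while the exit handlers flush the stdio streams.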