Problem description
I have implemented a JNA bridge to FDK-AAC. The source code can be found here
When benchmarking my code, I get hundreds of successful runs on the same input, and then occasionally a C-level crash that kills the entire process and generates a core dump:
The core dump looks like this:
#1 0x00007f3e92e00f5d in __GI_abort () at abort.c:90
#2 0x00007f3e92e4928d in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7f3e92f70528 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/posix/libc_fatal.c:181
#3 0x00007f3e92e5064a in malloc_printerr (action=<optimized out>, str=0x7f3e92f6cdee "corrupted size vs. prev_size", ptr=<optimized out>, ar_ptr=<optimized out>) at malloc.c:5426
#4 0x00007f3e92e5304a in _int_free (av=0x7f3de0000020, p=<optimized out>, have_lock=0) at malloc.c:4337
#5 0x00007f3e92e5744e in __GI___libc_free (mem=<optimized out>) at malloc.c:3145
#6 0x00007f3e113921e9 in FDKfree (ptr=0x7f3de009df60) at libSYS/src/genericStds.cpp:233
#7 0x00007f3e1130d7d3 in Free_AacEncoder (p=0x7f3de0115740) at libAACenc/src/aacenc_lib.cpp:407
#8 0x00007f3e1130fbb3 in aacEncClose (phAacEncoder=0x7f3de0115740) at libAACenc/src/aacenc_lib.cpp:1395
This backtrace is reproducible if I repeat the benchmark enough times, though I'm having a hard time understanding what might cause such an error. The memory at pointer 0x7f3de009df60 is allocated inside the C/C++ code as well, and I can guarantee that the same instance that was allocated is the one being freed. The benchmark is, of course, single-threaded.
After reading these:
security checks && internal functions
I'm still having a hard time understanding: what might be a real (non-exploitation, but rather plain error) scenario that causes the above error? And why does it happen so rarely?
Current suspicion:
Running a detailed backtrace, I get this output:
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
set = {__val = {4, 6378670679680, 645636045657660056, 90523359816, 139904561311072, 292199584, 139903730612120, 139903730611784, 139904561311088, 1460617926600, 47573685816, 4119199860131166208,
139904593745464, 139904553224483, 139904561311136, 288245657}}
pid = <optimized out>
tid = <optimized out>
#1 0x00007f3e92e00f5d in __GI_abort () at abort.c:90
save_stage = 2
act = {__sigaction_handler = {sa_handler = 0x7f3de026db10, sa_sigaction = 0x7f3de026db10}, sa_mask = {__val = {139903730540556, 19, 30064771092, 812522497172832284, 139903728706672, 1887866374039011357,
139900298780168, 3775732748407067896, 763430436865, 35180077121538, 4119199860131166208, 139904561311552, 139904553065676, 1, 139904561311584, 139904561312192}}, sa_flags = 4096,
sa_restorer = 0x14}
sigs = {__val = {32, 0 <repeats 15 times>}}
#2 0x00007f3e92e4928d in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7f3e92f70528 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/posix/libc_fatal.c:181
ap = {{gp_offset = 40, fp_offset = 32574, overflow_arg_area = 0x7f3e11adf1d0, reg_save_area = 0x7f3e11adf160}}
fd = <optimized out>
list = <optimized out>
nlist = <optimized out>
cp = <optimized out>
written = <optimized out>
#3 0x00007f3e92e5064a in malloc_printerr (action=<optimized out>, str=0x7f3e92f6cdee "corrupted size vs. prev_size", ptr=<optimized out>, ar_ptr=<optimized out>) at malloc.c:5426
buf = "00007f3de009e9f0"
cp = <optimized out>
ar_ptr = <optimized out>
ptr = <optimized out>
str = 0x7f3e92f6cdee "corrupted size vs. prev_size"
action = <optimized out>
#4 0x00007f3e92e5304a in _int_free (av=0x7f3de0000020, p=<optimized out>, have_lock=0) at malloc.c:4337
size = 2720
fb = <optimized out>
nextchunk = 0x7f3de009e9f0
nextsize = 736
nextinuse = <optimized out>
prevsize = <optimized out>
bck = <optimized out>
fwd = <optimized out>
errstr = 0x0
locked = <optimized out>
#5 0x00007f3e92e5744e in __GI___libc_free (mem=<optimized out>) at malloc.c:3145
ar_ptr = <optimized out>
p = <optimized out>
hook = <optimized out>
#6 0x00007f3e113921e9 in FDKfree (ptr=0x7f3de009df60) at libSYS/src/genericStds.cpp:233
No locals.
#7 0x00007f3e1130d7d3 in Free_AacEncoder (p=0x7f3de0115740) at libAACenc/src/aacenc_lib.cpp:407
No locals.
#8 0x00007f3e1130fbb3 in aacEncClose (phAacEncoder=0x7f3de0115740) at libAACenc/src/aacenc_lib.cpp:1395
hAacEncoder = 0x7f3de009df60
err = AACENC_OK
- In frame #6, you can see the pointer in question is 0x7f3de009df60.
- In frame #4, you can see that the size is 2720, which is indeed the expected size of the structure being released.
- However, the address of nextchunk is 0x7f3de009e9f0, which is only 2704 bytes after the pointer being released.
- I can confirm this is always the case when the error reproduces.
- Could this be a strong indication of the error I'm facing?
Recommended answer
OK, so I've managed to overcome this issue.
First of all, the practical cause of "corrupted size vs. prev_size" is quite simple: the memory chunk control structure fields in the adjacent following chunk are being overwritten due to out-of-bounds access by the code. If you allocate x bytes for pointer p but wind up writing beyond x bytes through that same pointer, you might get this error, indicating that the current memory allocation (chunk) size is not the same as what's found in the next chunk's control structure (because it was overwritten).
As for the cause of this memory corruption: the structure mapping done in the Java/JNA layer implied different #pragma-related padding/alignment from what the dll/so was compiled with. This, in turn, caused data to be written beyond the allocated structure boundary. Disabling that alignment made the issue go away (thousands of executions without a single crash!).