The Project
I'm writing a Java command line interface to a C library of internal networking and network testing tools using the Java Native Interface. The C code (which I didn't write) is complex and low level, often manipulates memory at the bit level, and uses raw sockets exclusively. The application is multi-threaded from the C side (pthreads running in the background) as well as the Java side (ScheduledThreadPoolExecutors running threads that call native code). That said, the C library should be mostly stable. The Java and JNI interface code, as it turns out, is causing problems.
The Problem(s)
The application crashes with a segmentation fault upon entry into a native C function. This only happens when the program is in a specific state (i.e. successfully running a specific native function causes the next call to another specific native function to segfault). Additionally, the application crashes with a similar-looking segfault when the quit command is issued, but again, only after successfully running that same specific native function.
I'm an inexperienced C developer and an experienced Java developer -- I'm used to crashes giving me a specific reason and a specific line number. All I have to work from in this case is the hs_err_pid*.log output and the core dump. I've included what I could at the end of this question.
My Work So Far
- Naturally, I wanted to find the specific line of code where the crash happened. I placed a System.out.println() right before the native call on the Java side and a printf() as the first line of the native function where the program crashes, being sure to call fflush(stdout) directly after. The System.out call ran and the printf call didn't. This tells me that the segfault happened upon entry into the function -- something I've never seen before.
- I triple checked the parameters to the function, to ensure that they wouldn't act up. However, I only pass one parameter (of type jint). The other two (JNIEnv *env, jobject j_object) are JNI constructs and out of my control.
- I commented out every single line in the function, leaving only a return 0; at the end. The segfault still happened. This leads me to believe that the problem is not in this function.
- I ran the commands in different orders (effectively running the native functions in different orders). The segfaults only happen when one specific native function is run before the crashing function call. This specific function appears to behave properly when it is run.
- I printed the value of the env pointer and the value of &j_object near the end of this other function, to ensure that I didn't somehow corrupt them. I don't know if I corrupted them, but both have non-zero values upon exiting the function.
- Edit 1: Typically, the same function is run in many threads (not usually concurrently, but it should be thread safe). I ran the function from the main thread without any other threads active to ensure that multithreading on the Java side wasn't causing the issue. It wasn't, and I got the same segfault.
All of this perplexes me. Why does it still segfault if I comment out the whole function, except for the return statement? If the problem is in this other function, why doesn't it fail there? If it's a problem where the first function messes up the memory and the second function illegally accesses the corrupt memory, why doesn't it fail on the line with the illegal access, rather than on entry to the function?
If you see an internet article where someone explains a problem similar to mine, please comment it. There are so many segfault articles, and none seem to contain this specific problem. Ditto for SO questions. The problem may also be that I'm not experienced enough to apply an abstract solution to this problem.
My Question
What can cause a Java native function (in C) to segfault upon entry like this? What specific things can I look for that will help me squash this bug? How can I write code in the future that will help me avoid this problem?
Helpful Info
For the record, I can't actually post the code. If you think a description of the code would be helpful, comment and I'll edit it in.
Error Message
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00002aaaaaf6d9c3, pid=2185, tid=1086892352
#
# JRE version: 6.0_21-b06
# Java VM: Java HotSpot(TM) 64-Bit Server VM (17.0-b16 mixed mode linux-amd64 )
# Problematic frame:
# j path.to.my.Object.native_function_name(I)I+0
#
# An error report file with more information is saved as:
# /path/to/hs_err_pid2185.log
#
# If you would like to submit a bug report, please visit:
# http://java.sun.com/webapps/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
The Important Bits of the hs_err_pid*.log File
--------------- T H R E A D ---------------
Current thread (0x000000004fd13800): JavaThread "pool-1-thread-1" [_thread_in_native, id=2198, stack(0x0000000040b8a000,0x0000000040c8b000)]
siginfo:si_signo=SIGSEGV: si_errno=0, si_code=128 (), si_addr=0x0000000000000000
Registers:
RAX=0x34372e302e3095e1, RBX=0x00002aaaae39dcd0, RCX=0x0000000000000000, RDX=0x0000000000000000
RSP=0x0000000040c89870, RBP=0x0000000040c898c0, RSI=0x0000000040c898e8, RDI=0x000000004fd139c8
R8 =0x000000004fb631f0, R9 =0x000000004faf5d30, R10=0x00002aaaaaf6d999, R11=0x00002b1243b39580
R12=0x00002aaaae3706d0, R13=0x00002aaaae39dcd0, R14=0x0000000040c898e8, R15=0x000000004fd13800
RIP=0x00002aaaaaf6d9c3, EFL=0x0000000000010202, CSGSFS=0x0000000000000033, ERR=0x0000000000000000
TRAPNO=0x000000000000000d
Stack: [0x0000000040b8a000,0x0000000040c8b000], sp=0x0000000040c89870, free space=3fe0000000000000018k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
j path.to.my.Object.native_function_name(I)I+0
j path.to.my.Object$CustomThread.fire()V+18
j path.to.my.CustomThreadSuperClass.run()V+1
j java.util.concurrent.Executors$RunnableAdapter.call()Ljava/lang/Object;+4
j java.util.concurrent.FutureTask$Sync.innerRun()V+30
j java.util.concurrent.FutureTask.run()V+4
j java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Ljava/util/concurrent/ScheduledThreadPoolExecutor$ScheduledFutureTask;)V+1
j java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run()V+15
j java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Ljava/lang/Runnable;)V+59
j java.util.concurrent.ThreadPoolExecutor$Worker.run()V+28
j java.lang.Thread.run()V+11
v ~StubRoutines::call_stub
V [libjvm.so+0x3e756d]
V [libjvm.so+0x5f6f59]
V [libjvm.so+0x3e6e39]
V [libjvm.so+0x3e6eeb]
V [libjvm.so+0x476387]
V [libjvm.so+0x6ee452]
V [libjvm.so+0x5f80df]
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j path.to.my.Object.native_function_name(I)I+0
j path.to.my.Object$CustomThread.fire()V+18
j path.to.my.CustomThreadSuperClass.run()V+1
j java.util.concurrent.Executors$RunnableAdapter.call()Ljava/lang/Object;+4
j java.util.concurrent.FutureTask$Sync.innerRun()V+30
j java.util.concurrent.FutureTask.run()V+4
j java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Ljava/util/concurrent/ScheduledThreadPoolExecutor$ScheduledFutureTask;)V+1
j java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run()V+15
j java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Ljava/lang/Runnable;)V+59
j java.util.concurrent.ThreadPoolExecutor$Worker.run()V+28
j java.lang.Thread.run()V+11
v ~StubRoutines::call_stub
--------------- P R O C E S S ---------------
Java Threads: ( => current thread )
0x000000004fabc800 JavaThread "pool-1-thread-6" [_thread_new, id=2203, stack(0x0000000000000000,0x0000000000000000)]
0x000000004fbcb000 JavaThread "pool-1-thread-5" [_thread_blocked, id=2202, stack(0x0000000042c13000,0x0000000042d14000)]
0x000000004fbc9800 JavaThread "pool-1-thread-4" [_thread_blocked, id=2201, stack(0x0000000042b12000,0x0000000042c13000)]
0x000000004fbc7800 JavaThread "pool-1-thread-3" [_thread_blocked, id=2200, stack(0x0000000042a11000,0x0000000042b12000)]
0x000000004fc54800 JavaThread "pool-1-thread-2" [_thread_blocked, id=2199, stack(0x0000000042910000,0x0000000042a11000)]
=>0x000000004fd13800 JavaThread "pool-1-thread-1" [_thread_in_native, id=2198, stack(0x0000000040b8a000,0x0000000040c8b000)]
0x000000004fb04800 JavaThread "Low Memory Detector" daemon [_thread_blocked, id=2194, stack(0x0000000041d0d000,0x0000000041e0e000)]
0x000000004fb02000 JavaThread "CompilerThread1" daemon [_thread_blocked, id=2193, stack(0x0000000041c0c000,0x0000000041d0d000)]
0x000000004fafc800 JavaThread "CompilerThread0" daemon [_thread_blocked, id=2192, stack(0x0000000040572000,0x0000000040673000)]
0x000000004fafa800 JavaThread "Signal Dispatcher" daemon [_thread_blocked, id=2191, stack(0x0000000040471000,0x0000000040572000)]
0x000000004fad6000 JavaThread "Finalizer" daemon [_thread_blocked, id=2190, stack(0x0000000041119000,0x000000004121a000)]
0x000000004fad4000 JavaThread "Reference Handler" daemon [_thread_blocked, id=2189, stack(0x0000000041018000,0x0000000041119000)]
0x000000004fa51000 JavaThread "main" [_thread_in_vm, id=2186, stack(0x00000000418cc000,0x00000000419cd000)]
Other Threads:
0x000000004facf800 VMThread [stack: 0x0000000040f17000,0x0000000041018000] [id=2188]
0x000000004fb0f000 WatcherThread [stack: 0x0000000041e0e000,0x0000000041f0f000] [id=2195]
VM state:not at safepoint (normal execution)
VM Mutex/Monitor currently owned by a thread: None
Heap
PSYoungGen total 305856K, used 31465K [0x00002aaadded0000, 0x00002aaaf3420000, 0x00002aaaf3420000)
eden space 262208K, 12% used [0x00002aaadded0000,0x00002aaadfd8a6a8,0x00002aaaedee0000)
from space 43648K, 0% used [0x00002aaaf0980000,0x00002aaaf0980000,0x00002aaaf3420000)
to space 43648K, 0% used [0x00002aaaedee0000,0x00002aaaedee0000,0x00002aaaf0980000)
PSOldGen total 699072K, used 0K [0x00002aaab3420000, 0x00002aaadded0000, 0x00002aaadded0000)
object space 699072K, 0% used [0x00002aaab3420000,0x00002aaab3420000,0x00002aaadded0000)
PSPermGen total 21248K, used 3741K [0x00002aaaae020000, 0x00002aaaaf4e0000, 0x00002aaab3420000)
object space 21248K, 17% used [0x00002aaaae020000,0x00002aaaae3c77c0,0x00002aaaaf4e0000)
VM Arguments:
jvm_args: -Xms1024m -Xmx1024m -XX:+UseParallelGC
--------------- S Y S T E M ---------------
OS:Red Hat Enterprise Linux Client release 5.5 (Tikanga)
uname:Linux 2.6.18-194.8.1.el5 #1 SMP Wed Jun 23 10:52:51 EDT 2010 x86_64
libc:glibc 2.5 NPTL 2.5
rlimit: STACK 10240k, CORE 102400k, NPROC 10000, NOFILE 1024, AS infinity
load average:0.21 0.08 0.05
CPU:total 1 (1 cores per cpu, 1 threads per core) family 6 model 26 stepping 4, cmov, cx8, fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt
Memory: 4k page, physical 3913532k(1537020k free), swap 1494004k(1494004k free)
vm_info: Java HotSpot(TM) 64-Bit Server VM (17.0-b16) for linux-amd64 JRE (1.6.0_21-b06), built on Jun 22 2010 01:10:00 by "java_re" with gcc 3.2.2 (SuSE Linux)
time: Tue Oct 15 15:08:13 2013
elapsed time: 13 seconds
Valgrind Output
I don't really know how to use Valgrind properly. This is what came up when running valgrind app arg1
==2184==
==2184== HEAP SUMMARY:
==2184== in use at exit: 16,914 bytes in 444 blocks
==2184== total heap usage: 673 allocs, 229 frees, 32,931 bytes allocated
==2184==
==2184== LEAK SUMMARY:
==2184== definitely lost: 0 bytes in 0 blocks
==2184== indirectly lost: 0 bytes in 0 blocks
==2184== possibly lost: 0 bytes in 0 blocks
==2184== still reachable: 16,914 bytes in 444 blocks
==2184== suppressed: 0 bytes in 0 blocks
==2184== Rerun with --leak-check=full to see details of leaked memory
==2184==
==2184== For counts of detected and suppressed errors, rerun with: -v
==2184== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 7 from 7)
Edit 2:
GDB Output and Backtrace
I ran it through with GDB. I made sure that the C library was compiled with the -g flag.
$ gdb `which java`
GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-23.el5)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/bin/java...(no debugging symbols found)...done.
(gdb) run -jar /opt/scts/scts.jar test.config
Starting program: /usr/bin/java -jar /opt/scts/scts.jar test.config
[Thread debugging using libthread_db enabled]
Executing new program: /usr/lib/jvm/java-1.6.0-sun-1.6.0.21.x86_64/jre/bin/java
[Thread debugging using libthread_db enabled]
[New Thread 0x4022c940 (LWP 3241)]
[New Thread 0x4032d940 (LWP 3242)]
[New Thread 0x4042e940 (LWP 3243)]
[New Thread 0x4052f940 (LWP 3244)]
[New Thread 0x40630940 (LWP 3245)]
[New Thread 0x40731940 (LWP 3246)]
[New Thread 0x40832940 (LWP 3247)]
[New Thread 0x40933940 (LWP 3248)]
[New Thread 0x40a34940 (LWP 3249)]
... my program does some work, and starts a background thread ...
[New Thread 0x41435940 (LWP 3250)]
... I type the command that seems to cause the segfault on the next command; the new threads are expected ...
[New Thread 0x41536940 (LWP 3252)]
[New Thread 0x41637940 (LWP 3253)]
[New Thread 0x41738940 (LWP 3254)]
[New Thread 0x41839940 (LWP 3255)]
[New Thread 0x4193a940 (LWP 3256)]
... I type the command that actually triggers the segfault. The new thread is expected, since the function is run in its own thread. If it did not segfault, it would have created the same number of threads as the previous command ...
[New Thread 0x41a3b940 (LWP 3257)]
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x41839940 (LWP 3255)]
0x00002aaaabcaec45 in ?? ()
... I furiously read through the gdb help, then run the backtrace ...
(gdb) bt
#0 0x00002aaaabcaec45 in ?? ()
#1 0x00002aaaf3ad7800 in ?? ()
#2 0x00002aaaf3ad81e8 in ?? ()
#3 0x0000000041838600 in ?? ()
#4 0x00002aaaeacddcd0 in ?? ()
#5 0x0000000041838668 in ?? ()
#6 0x00002aaaeace23f0 in ?? ()
#7 0x0000000000000000 in ?? ()
... Shouldn't that have symbols if I compiled with -g? I did, according to the lines from the output of make:
gcc -g -Wall -fPIC -c -I ...
gcc -g -shared -W1,soname, ...
Looks like I've solved the issue, which I'll outline here for the benefit of others.
What Happened
The cause of the segmentation fault was that I used sprintf() to write through a char * pointer which had never been assigned a value. Here is the bad code:
char* ip_to_string(uint32_t ip)
{
unsigned char bytes[4];
bytes[0] = ip & 0xFF;
bytes[1] = (ip >> 8) & 0xFF;
bytes[2] = (ip >> 16) & 0xFF;
bytes[3] = (ip >> 24) & 0xFF;
char *ip_string;
sprintf(ip_string, "%d.%d.%d.%d", bytes[0], bytes[1], bytes[2], bytes[3]);
return ip_string;
}
The pointer ip_string is never initialized here. That does not mean it points to nothing: its value is indeterminate, so it could point anywhere. By writing through it with sprintf(), I inadvertently overwrote a random piece of memory. I believe (though I never confirmed this) that the odd behaviour arose because the uninitialized pointer happened to point somewhere on the stack, so the write corrupted data that later function calls depended on.
One way to fix this is to allocate memory and then point the pointer to that memory, which can be accomplished with malloc(). That solution would look similar to this:
char* ip_to_string(uint32_t ip)
{
unsigned char bytes[4];
bytes[0] = ip & 0xFF;
bytes[1] = (ip >> 8) & 0xFF;
bytes[2] = (ip >> 16) & 0xFF;
bytes[3] = (ip >> 24) & 0xFF;
char *ip_string = malloc(16);
sprintf(ip_string, "%d.%d.%d.%d", bytes[0], bytes[1], bytes[2], bytes[3]);
return ip_string;
}
The problem with this is that every malloc() needs to be matched by a call to free(), or you have a memory leak. If I call free(ip_string) inside this function, the returned pointer will be useless, and if I don't, then I have to rely on the code that's calling this function to release the memory, which is pretty dangerous.
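To make that ownership burden concrete, here is a minimal sketch of the malloc() variant together with a caller that releases the result. The names ip_to_string_alloc and ip_to_string_alloc_example are hypothetical, and I have swapped sprintf() for snprintf() and added a NULL check, neither of which is in the snippet above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* "255.255.255.255" is 15 characters, plus one for the terminating NUL. */
#define IP_STRING_SIZE 16

/*
 * Hypothetical name. Returns a heap-allocated dotted-quad string that the
 * CALLER must free(), or NULL if the allocation fails.
 */
char *ip_to_string_alloc(uint32_t ip)
{
    char *ip_string = malloc(IP_STRING_SIZE);
    if (ip_string == NULL) {
        return NULL;
    }

    /* Same byte order as the original function: lowest byte first. */
    snprintf(ip_string, IP_STRING_SIZE, "%u.%u.%u.%u",
             (unsigned)(ip & 0xFF),
             (unsigned)((ip >> 8) & 0xFF),
             (unsigned)((ip >> 16) & 0xFF),
             (unsigned)((ip >> 24) & 0xFF));
    return ip_string;
}

/* Caller side: the matching free() has to live here, far from the malloc(). */
void ip_to_string_alloc_example(void)
{
    char *text = ip_to_string_alloc(0x0100007FU);
    if (text == NULL) {
        return;                 /* out of memory: nothing to free */
    }
    assert(strcmp(text, "127.0.0.1") == 0);
    free(text);                 /* forgetting this line leaks 16 bytes per call */
}
```

Nothing in the function's signature tells the caller that free() is required, which is exactly why this version is easy to misuse.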
As far as I can tell, the "right" solution to this is to pass an already allocated pointer to the function, so that the function is only responsible for filling the pointed-to memory. That way, the calls to malloc() and free() can be made in the same block of code. Much safer. Here's the new function:
char* ip_to_string(uint32_t ip, char *ip_string)
{
unsigned char bytes[4];
bytes[0] = ip & 0xFF;
bytes[1] = (ip >> 8) & 0xFF;
bytes[2] = (ip >> 16) & 0xFF;
bytes[3] = (ip >> 24) & 0xFF;
sprintf(ip_string, "%d.%d.%d.%d", bytes[0], bytes[1], bytes[2], bytes[3]);
return ip_string;
}
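One thing the version above still trusts is that the caller's buffer is big enough. A possible refinement, sketched here under my own assumptions rather than taken from the real code, is to pass the buffer size as well and let snprintf() enforce it. The names ip_to_string_n, IP_STRING_SIZE and ip_to_string_n_example are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* "255.255.255.255" is 15 characters, plus one for the terminating NUL. */
#define IP_STRING_SIZE 16

/*
 * Hypothetical variant of ip_to_string() that also takes the buffer size.
 * Returns buf on success, or NULL if buf is NULL or too small.
 */
char *ip_to_string_n(uint32_t ip, char *buf, size_t size)
{
    if (buf == NULL || size == 0) {
        return NULL;
    }

    /* Same byte order as the original function: lowest byte first. */
    int written = snprintf(buf, size, "%u.%u.%u.%u",
                           (unsigned)(ip & 0xFF),
                           (unsigned)((ip >> 8) & 0xFF),
                           (unsigned)((ip >> 16) & 0xFF),
                           (unsigned)((ip >> 24) & 0xFF));

    /* snprintf() reports the length it needed; >= size means truncation. */
    if (written < 0 || (size_t)written >= size) {
        return NULL;
    }
    return buf;
}

/* Usage sketch: the buffer lives in the caller's frame, so nothing is freed. */
void ip_to_string_n_example(void)
{
    char text[IP_STRING_SIZE];
    char tiny[4];

    assert(ip_to_string_n(0x0100007FU, text, sizeof text) == text);
    assert(strcmp(text, "127.0.0.1") == 0);

    /* A buffer that is too small is rejected instead of being overflowed. */
    assert(ip_to_string_n(0x0100007FU, tiny, sizeof tiny) == NULL);
}
```

With a stack buffer like this there is no malloc() or free() at all, and a wrong size becomes a NULL return rather than silent memory corruption.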
Answers to the Questions
What can cause a Java native function (in C) to segfault upon entry like this?
If you write through a pointer that was never pointed at allocated memory, you may accidentally overwrite memory on the stack. This may not cause an immediate failure, but will probably cause problems when you call other functions later.
What specific things can I look for that will help me squash this bug?
Look for the cause as you would for any other segmentation fault: things like writing to unallocated memory or dereferencing a null pointer. I'm not an expert on this, but I'm willing to bet that there are many web resources for this.
How can I write code in the future that will help me avoid this problem?
Be careful with pointers, especially when you are responsible for creating them. If you see a line of code that looks like this:
type *variable;
... then look for a line that looks like ...
variable = ...;
... and make sure that this line comes before any write to the pointed-to memory.
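As a rough illustration of that habit, here is a small sketch in which the pointer is given a value at its declaration, is pointed at real memory before anything is written, and is checked before use. The function make_port_label and its "port N" format are made up for this example and have nothing to do with the real library.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: builds the string "port <n>" on the heap.
 * The caller owns the result and must free() it. Returns NULL on failure.
 */
char *make_port_label(unsigned port)
{
    char *label = NULL;     /* declared with a known value, never indeterminate */

    /* With a NULL buffer and size 0, snprintf() only reports the length needed. */
    int needed = snprintf(NULL, 0, "port %u", port);
    if (needed < 0) {
        return NULL;
    }

    label = malloc((size_t)needed + 1);     /* "variable = ...;" comes first */
    if (label == NULL) {
        return NULL;
    }

    /* Only now is it safe to write through the pointer. */
    snprintf(label, (size_t)needed + 1, "port %u", port);
    return label;
}

void make_port_label_example(void)
{
    char *label = make_port_label(8080);
    if (label != NULL) {
        assert(strcmp(label, "port 8080") == 0);
        free(label);
    }
}
```

Initializing to NULL does not prevent every mistake, but a write through a NULL pointer usually crashes right at the faulty line, which is far easier to debug than a corrupted stack that only shows up in a later call.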