Category: mobile development, Android framework. Posted 2012-12-28 12:00.

Android debuggerd: Analysis and Usage

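Android ships with a handy daemon for diagnosing abnormal program exits: debuggerd. It detects a crashing process and writes the state of that process at the time of the crash to a file and to the serial console, so that developers can analyze and debug the problem.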

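debuggerd stores its data under /data/tombstones/ (an apt name for a crash record). At most 10 files are kept; once that limit is reached, the oldest file is overwritten. On the serial console, the same information is emitted as logcat output with the tag DEBUG.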
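The rotation rule above can be sketched as a small pure function. This is only an illustration under stated assumptions, not debuggerd's actual code: pick_tombstone_slot and the mtimes array are hypothetical names, and a modification time of 0 is assumed to mean "this slot has no file yet".

```c
#include <time.h>

#define MAX_TOMBSTONES 10

/* Hypothetical helper: choose which of the 10 slots to write next.
 * mtimes[i] is the modification time of slot i, or 0 if the file is absent. */
int pick_tombstone_slot(const time_t mtimes[MAX_TOMBSTONES])
{
    int oldest = 0;
    for (int i = 0; i < MAX_TOMBSTONES; i++) {
        if (mtimes[i] == 0) {
            return i;              /* first free slot wins */
        }
        if (mtimes[i] < mtimes[oldest]) {
            oldest = i;            /* remember the oldest file seen so far */
        }
    }
    return oldest;                 /* every slot is in use: overwrite the oldest */
}
```

With two files present the function returns slot 2; with all ten present it returns the slot holding the smallest modification time.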

Reading the output

The output of debuggerd looks roughly like this:

I/DEBUG ( 9114): *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***

I/DEBUG ( 9114): Build fingerprint: 'generic/gs701b/gs701b:4.0.3/IML74K/eng.andy.xia.20120827.120650:user/test-keys'

I/DEBUG ( 9114): pid: 11053, tid: 11065 >>> net.osaris.turbofly <<<

I/DEBUG ( 9114): signal 11 (SIGSEGV), code 2 (SEGV_ACCERR), fault addr 42771108

I/DEBUG ( 9114): zr 00000000 at 00000000 v0 5596dbd8 v1 41c65990

I/DEBUG ( 9114): a0 42771014 a1 5596dbd8 a2 42771014 a3 5596dbd8

I/DEBUG ( 9114): t0 5596dbd8 t1 53f6b000 t2 00000001 t3 000000a8

I/DEBUG ( 9114): t4 5596dc58 t5 00000080 t6 0000001c t7 000000e4

I/DEBUG ( 9114): s0 540d96f0 s1 41c65990 s2 00000015 s3 00000015

I/DEBUG ( 9114): s4 2c143d40 s5 4cf8bdf4 s6 2b8d0178 s7 2c13cdd8

I/DEBUG ( 9114): t8 0000000f t9 53f70748 k0 000000d8 k1 00000000

I/DEBUG ( 9114): gp 53f93d50 sp 4f7feaf8 s8 4f7feb68 ra 53f7334c

I/DEBUG ( 9114): hi 00000000 lo 01910000 bva 42771108 epc 53f73374

I/DEBUG ( 9114): #00 pc 53f73374 sp 4f7feaf8 /system/lib/egl/libGLESv1_CM_VIVANTE.so

I/DEBUG ( 9114): #01 pc 53f73570 sp 4f7feb28 /system/lib/egl/libGLESv1_CM_VIVANTE.so:glBindTexture+468

I/DEBUG ( 9114): #02 pc 2b76c16c sp 4f7feb58 /system/lib/libdvm.so:dvmPlatformInvoke+220

I/DEBUG ( 9114): #03 pc 41a59c00 sp 4f7feb70 /dev/ashmem/dalvik-LinearAlloc (deleted)

I/DEBUG ( 9114):

I/DEBUG ( 9114): code around pc:

I/DEBUG ( 9114): 53f73354 8fa70018 acf100f4 8e2600f8 8fa50018 ..........&.....

I/DEBUG ( 9114): 53f73364 aca600f8 8fa20018 8c4300f4 8c4400f8 ..........C...D.

I/DEBUG ( 9114): 53f73374 ac8200f4 ac6200f8 8fa30018 8fbf002c ......b.....,...

I/DEBUG ( 9114): 53f73384 00601021 8fb20028 8fb10024 8fb00020 !.`.(...$......

I/DEBUG ( 9114): 53f73394 03e00008 27bd0030 3c1c0002 279c09b4 ....0..'...<...'

I/DEBUG ( 9114):

I/DEBUG ( 9114): code around ra:

I/DEBUG ( 9114): 53f7332c 8fbc0010 04400013 00001821 8f8981b4 ......@.!.......

I/DEBUG ( 9114): 53f7333c 8fa50018 25396d30 0320f809 02002021 ....0m9%...! ..

I/DEBUG ( 9114): 53f7334c 8fa80018 ad120000 8fa70018 acf100f4 ................

I/DEBUG ( 9114): 53f7335c 8e2600f8 8fa50018 aca600f8 8fa20018 ..&.............

I/DEBUG ( 9114): 53f7336c 8c4300f4 8c4400f8 ac8200f4 ac6200f8 ..C...D.......b.

I/DEBUG ( 9114):

I/DEBUG ( 9114): memory map around addr 42771108:

I/DEBUG ( 9114): 4265e000-4266d000 /system/framework/ext.jar

I/DEBUG ( 9114): 4266d000-427da000 /system/framework/ext.odex

I/DEBUG ( 9114): 427da000-431d7000 /system/framework/framework.odex

I/DEBUG ( 9114):

I/DEBUG ( 9114): stack:

I/DEBUG ( 9114): 4f7feab8 00000002

I/DEBUG ( 9114): 4f7feabc 002a3bc0 [heap]

I/DEBUG ( 9114): 4f7feac0 4f7fe0a8

I/DEBUG ( 9114): 4f7feac4 002a52a0 [heap]

I/DEBUG ( 9114): 4f7feac8 00009004

I/DEBUG ( 9114): 4f7feacc 00000000

I/DEBUG ( 9114): 4f7fead0 00000000

I/DEBUG ( 9114): 4f7fead4 00000000

I/DEBUG ( 9114): 4f7fead8 540d96f0

I/DEBUG ( 9114): 4f7feadc 41c65990 /dev/ashmem/dalvik-LinearAlloc (deleted)

I/DEBUG ( 9114): 4f7feae0 53f93d50 /system/lib/egl/libGLESv2_VIVANTE.so

I/DEBUG ( 9114): 4f7feae4 00000015

I/DEBUG ( 9114): 4f7feae8 2c143d40 /dev/ashmem/dalvik-heap (deleted)

I/DEBUG ( 9114): 4f7feaec 540d96f0

I/DEBUG ( 9114): 4f7feaf0 41c65990 /dev/ashmem/dalvik-LinearAlloc (deleted)

I/DEBUG ( 9114): 4f7feaf4 53f7334c /system/lib/egl/libGLESv1_CM_VIVANTE.so

I/DEBUG ( 9114): #00 4f7feaf8 00000000

I/DEBUG ( 9114): 4f7feafc 0026aed0 [heap]

I/DEBUG ( 9114): 4f7feb00 00000de1

I/DEBUG ( 9114): 4f7feb04 4fedbc78 /system/lib/egl/libEGL_VIVANTE.so:veglGetCurrentAPIContext+36

I/DEBUG ( 9114): 4f7feb08 53f93d50 /system/lib/egl/libGLESv2_VIVANTE.so

I/DEBUG ( 9114): 4f7feb0c 4fedbc78 /system/lib/egl/libEGL_VIVANTE.so:veglGetCurrentAPIContext+36

I/DEBUG ( 9114): 4f7feb10 5596dbd8

I/DEBUG ( 9114): 4f7feb14 53f5467c /system/lib/egl/libGLESv1_CM_VIVANTE.so:glBindBuffer+56

I/DEBUG ( 9114): 4f7feb18 00000de1

I/DEBUG ( 9114): 4f7feb1c 00000000

I/DEBUG ( 9114): 4f7feb20 540d8f18

I/DEBUG ( 9114): 4f7feb24 53f73570 /system/lib/egl/libGLESv1_CM_VIVANTE.so:glBindTexture+468

I/DEBUG ( 9114): #01 4f7feb28 53f93d50 /system/lib/egl/libGLESv2_VIVANTE.so

I/DEBUG ( 9114): 4f7feb2c 0026ef70 [heap]

I/DEBUG ( 9114): 4f7feb30 00000001

I/DEBUG ( 9114): 4f7feb34 00000000

I/DEBUG ( 9114): 4f7feb38 53f93d50 /system/lib/egl/libGLESv2_VIVANTE.so

I/DEBUG ( 9114): 4f7feb3c 2c143d40 /dev/ashmem/dalvik-heap (deleted)

I/DEBUG ( 9114): 4f7feb40 4cf8be2c

I/DEBUG ( 9114): 4f7feb44 0026ef70 [heap]

I/DEBUG ( 9114): 4f7feb48 00000001

I/DEBUG ( 9114): 4f7feb4c 00000000

I/DEBUG ( 9114): 4f7feb50 0026ef60 [heap]

I/DEBUG ( 9114): 4f7feb54 2b76c16c /system/lib/libdvm.so:dvmPlatformInvoke+220

I/DEBUG ( 9114): #02 4f7feb58 02320000

I/DEBUG ( 9114): 4f7feb5c 01910000

I/DEBUG ( 9114): 4f7feb60 00000de1

I/DEBUG ( 9114): 4f7feb64 00000015

I/DEBUG ( 9114): 4f7feb68 2b8d6530 /system/lib/libGLESv1_CM.so

I/DEBUG ( 9114): 4f7feb6c 41a59c00 /dev/ashmem/dalvik-LinearAlloc (deleted)

I/DEBUG ( 9114): 4f7feb70 41a59c00 /dev/ashmem/dalvik-LinearAlloc (deleted)

I/DEBUG ( 9114): 4f7feb74 00000001

I/DEBUG ( 9114): 4f7feb78 00000014

I/DEBUG ( 9114): 4f7feb7c 2b7d5328 /system/lib/libdvm.so

I/DEBUG ( 9114): 4f7feb80 2b8d6530 /system/lib/libGLESv1_CM.so

I/DEBUG ( 9114): 4f7feb84 2ab2126c /system/lib/libc.so

I/DEBUG ( 9114): 4f7feb88 00000002

I/DEBUG ( 9114): 4f7feb8c 00000033

I/DEBUG ( 9114): 4f7feb90 4cf8bdf4

I/DEBUG ( 9114): 4f7feb94 42e38a35 /system/framework/framework.odex

I/DEBUG ( 9114): 4f7feb98 2ac78884 /system/lib/libandroid_runtime.so

I/DEBUG ( 9114): 4f7feb9c 0026ef70 [heap]

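From this data we can read off the following:

Build version: the Build fingerprint line (Android 4.0.3, IML74K)

Crashing process and thread: pid 11053, tid 11065, process net.osaris.turbofly

Cause of the crash: signal 11 (SIGSEGV), code 2 (SEGV_ACCERR), fault address 42771108

Register contents

Call stack

Memory dumps around key locations (code around pc and ra, memory map around the fault address)

Stack frame contents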
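One way to use the "memory map around addr" section is to check which mapping contains the fault address. The sketch below hard-codes the three ranges from the log above; struct map_entry and find_mapping are illustrative names, not part of debuggerd.

```c
#include <stddef.h>
#include <stdint.h>

struct map_entry {
    uint32_t start;      /* inclusive */
    uint32_t end;        /* exclusive */
    const char *name;
};

/* Ranges copied from the "memory map around addr 42771108" lines. */
static const struct map_entry k_maps[] = {
    { 0x4265e000u, 0x4266d000u, "/system/framework/ext.jar" },
    { 0x4266d000u, 0x427da000u, "/system/framework/ext.odex" },
    { 0x427da000u, 0x431d7000u, "/system/framework/framework.odex" },
};

/* Return the name of the mapping containing addr, or NULL if none does. */
const char *find_mapping(uint32_t addr)
{
    for (size_t i = 0; i < sizeof(k_maps) / sizeof(k_maps[0]); i++) {
        if (addr >= k_maps[i].start && addr < k_maps[i].end) {
            return k_maps[i].name;
        }
    }
    return NULL;
}
```

Calling find_mapping(0x42771108) yields /system/framework/ext.odex. The word at epc, ac8200f4, decodes as the MIPS store sw v0, 0xf4(a0), and a0 + 0xf4 = 42771014 + f4 = 42771108, so the crash is a write into that file-backed mapping. That fits code 2 (SEGV_ACCERR), which reports a permission fault on a mapped page.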

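debuggerd implementation details

The Linux kernel has its own signal mechanism. When an application crashes, the kernel normally sends a signal to the offending process to tell it what went wrong, and the process may catch that signal and handle it. For fatal signals, the usual handling is simply to exit.

Android builds something more useful on top of this mechanism: it intercepts these signals and dumps the process state for debugging.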

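Catching the fault

When a new process starts, Android inserts a call to debugger_init, which installs a handler for several fatal signals: SIGILL, SIGABRT, SIGBUS, SIGFPE, SIGSEGV and SIGPIPE (plus SIGSTKFLT where it is defined). The code lives in bionic/linker/debugger.c: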

    void debugger_init()
    {
        struct sigaction act;
        memset(&act, 0, sizeof(act));
        act.sa_sigaction = debugger_signal_handler;
        act.sa_flags = SA_RESTART | SA_SIGINFO;
        sigemptyset(&act.sa_mask);

        sigaction(SIGILL, &act, NULL);
        sigaction(SIGABRT, &act, NULL);
        sigaction(SIGBUS, &act, NULL);
        sigaction(SIGFPE, &act, NULL);
        sigaction(SIGSEGV, &act, NULL);
    #if defined(SIGSTKFLT)
        sigaction(SIGSTKFLT, &act, NULL);
    #endif
        sigaction(SIGPIPE, &act, NULL);
    }

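debugger_init is called from __linker_init, which runs right after the program entry point __start. Because this is part of bionic, it applies to every dynamically linked Android program. (Android differs here from traditional glibc-based Linux: the glibc interpreter is /lib/ld-linux-xx.so.2, while the Android interpreter is /system/bin/linker.)

The handler for the caught signals is: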

    /*
     * Catches fatal signals so we can ask debuggerd to ptrace us before we crash.
     */
    void debugger_signal_handler(int n, siginfo_t* info, void* unused __attribute__((unused)))
    {
        char msgbuf[128];
        unsigned tid;
        int s;

        /*
         * It's possible somebody cleared the SA_SIGINFO flag, which would mean
         * our "info" arg holds an undefined value.
         */
        if (!haveSiginfo(n)) {
            info = NULL;
        }

        logSignalSummary(n, info);

        tid = gettid();
        s = socket_abstract_client(DEBUGGER_SOCKET_NAME, SOCK_STREAM);
        // #define DEBUGGER_SOCKET_NAME "android:debuggerd"

        if (s >= 0) {
            /* debugger knows our pid from the credentials on the
             * local socket but we need to tell it our tid. It
             * is paranoid and will verify that we are giving a tid
             * that's actually in our process
             */
            int ret;
            debugger_msg_t msg;
            msg.action = DEBUGGER_ACTION_CRASH;
            msg.tid = tid;
            RETRY_ON_EINTR(ret, write(s, &msg, sizeof(msg)));
            if (ret == sizeof(msg)) {
                /* if the write failed, there is no point to read on
                 * the file descriptor. */
                RETRY_ON_EINTR(ret, read(s, &tid, 1));
                int savedErrno = errno;
                notify_gdb_of_libraries();
                errno = savedErrno;
            }

            if (ret < 0) {
                /* read or write failed -- broken connection? */
                format_buffer(msgbuf, sizeof(msgbuf),
                    "Failed while talking to debuggerd: %s", strerror(errno));
                __libc_android_log_write(ANDROID_LOG_FATAL, "libc", msgbuf);
            }

            close(s);
        } else {
            /* socket failed; maybe process ran out of fds */
            format_buffer(msgbuf, sizeof(msgbuf),
                "Unable to open connection to debuggerd: %s", strerror(errno));
            __libc_android_log_write(ANDROID_LOG_FATAL, "libc", msgbuf);
        }

        /* remove our net so we fault for real when we return */
        signal(n, SIG_DFL);

        /*
         * These signals are not re-thrown when we resume. This means that
         * crashing due to (say) SIGPIPE doesn't work the way you'd expect it
         * to. We work around this by throwing them manually. We don't want
         * to do this for *all* signals because it'll screw up the address for
         * faults like SIGSEGV.
         */
        switch (n) {
            case SIGABRT:
            case SIGFPE:
            case SIGPIPE:
    #ifdef SIGSTKFLT
            case SIGSTKFLT:
    #endif
                (void) tgkill(getpid(), gettid(), n);
                break;
            default:    // SIGILL, SIGBUS, SIGSEGV
                break;
        }
    }

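As the code shows, the handler is the client side of a socket: it sends a message to the socket named android:debuggerd, and the payload is the tid, that is, the ID of the thread that faulted.

At this point the process blocks in read(), waiting for the server side of the socket, debuggerd, to deal with the event.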
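The send-then-block handshake can be reproduced outside Android. In the sketch below a socketpair stands in for the abstract android:debuggerd socket, a forked child plays the crashing process, and the parent plays debuggerd. struct demo_msg and handshake_demo are hypothetical; the real debugger_msg_t layout may differ.

```c
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Illustrative message layout; the real debugger_msg_t may differ. */
struct demo_msg {
    int32_t action;
    int32_t tid;
};

/* Returns the tid received by the "server" side, or -1 on failure. */
int handshake_demo(int32_t fake_tid)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }

    pid_t child = fork();
    if (child < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    if (child == 0) {
        /* "Crashing process": send the request, then block until acked. */
        close(sv[0]);
        struct demo_msg msg = { 0, fake_tid };
        char ack;
        if (write(sv[1], &msg, sizeof(msg)) != (ssize_t)sizeof(msg)) {
            _exit(1);
        }
        if (read(sv[1], &ack, 1) != 1) {   /* blocks here, like the handler */
            _exit(2);
        }
        _exit(0);
    }

    /* "debuggerd": read the request, then write one byte to release the client. */
    close(sv[1]);
    struct demo_msg got;
    int result = -1;
    if (read(sv[0], &got, sizeof(got)) == (ssize_t)sizeof(got) &&
        write(sv[0], "\0", 1) == 1) {
        result = got.tid;
    }
    close(sv[0]);

    int status = 0;
    if (waitpid(child, &status, 0) < 0 ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        return -1;
    }
    return result;
}
```

The child only gets past its read() once the parent has written the single acknowledgement byte, which mirrors how the crashing process stays parked until debuggerd has attached to it.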

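How debuggerd handles the crash request

The debuggerd daemon is the service that actually produces the tombstone when a process dies. Its code lives in system/core/debuggerd/debuggerd.c. The main function is the server side of android:debuggerd: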

        s = socket_local_server(DEBUGGER_SOCKET_NAME,
                ANDROID_SOCKET_NAMESPACE_ABSTRACT, SOCK_STREAM);
        if (s < 0) return 1;
        fcntl(s, F_SETFD, FD_CLOEXEC);

        LOG("debuggerd: " __DATE__ " " __TIME__ "\n");

        for (;;) {
            struct sockaddr addr;
            socklen_t alen;
            int fd;

            alen = sizeof(addr);
            XLOG("waiting for connection\n");
            fd = accept(s, &addr, &alen);
            if (fd < 0) {
                XLOG("accept failed: %s\n", strerror(errno));
                continue;
            }

            fcntl(fd, F_SETFD, FD_CLOEXEC);

            handle_request(fd);
        }
        return 0;
    }

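When a process hits a fatal signal, the debugger_signal_handler described above connects to debuggerd over the socket and sends its message. The server accepts the connection and handles the crash in handle_request(fd). handle_request first calls read_request(fd, &request), which obtains the credentials of the peer on the other end of the socket (pid, uid and gid) and then reads the tid sent by debugger_signal_handler. From that point on debuggerd knows which process and thread it has to examine.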
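On Linux, the credentials of the peer on a local socket are typically obtained with getsockopt and SO_PEERCRED. The sketch below assumes that this is the mechanism behind the pid/uid/gid lookup described above; get_peer_cred is a hypothetical helper, not a function from debuggerd.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE           /* exposes struct ucred on glibc */
#endif
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Fetch the peer's pid/uid/gid from a connected AF_UNIX socket (Linux only).
 * Returns 0 on success, -1 on failure. */
int get_peer_cred(int fd, pid_t *pid, uid_t *uid, gid_t *gid)
{
    struct ucred cr;
    socklen_t len = sizeof(cr);
    if (getsockopt(fd, SOL_SOCKET, SO_PEERCRED, &cr, &len) < 0) {
        return -1;
    }
    *pid = cr.pid;
    *uid = cr.uid;
    *gid = cr.gid;
    return 0;
}
```

Because the kernel fills in these values, the client cannot forge them. That is why the handler only needs to send the tid, which the server can then check against the pid it already trusts.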

    for (;;) {
        int signal = wait_for_signal(request.tid, &total_sleep_time_usec);
        if (signal < 0) {
            break;
        }

        switch (signal) {
        case SIGSTOP:
            if (request.action == DEBUGGER_ACTION_DUMP_TOMBSTONE) {
                XLOG("stopped -- dumping to tombstone\n");
                tombstone_path = engrave_tombstone(request.pid, request.tid,
                        signal, true, true, &detach_failed,
                        &total_sleep_time_usec);
            } else if (request.action == DEBUGGER_ACTION_DUMP_BACKTRACE) {
                XLOG("stopped -- dumping to fd\n");
                dump_backtrace(fd, request.pid, request.tid, &detach_failed,
                        &total_sleep_time_usec);
            } else {
                XLOG("stopped -- continuing\n");
                status = ptrace(PTRACE_CONT, request.tid, 0, 0);
                if (status) {
                    LOG("ptrace continue failed: %s\n", strerror(errno));
                }
                continue; /* loop again */
            }
            break;

        case SIGILL:
        case SIGABRT:
        case SIGBUS:
        case SIGFPE:
        case SIGSEGV:
        case SIGPIPE:
    #ifdef SIGSTKFLT
        case SIGSTKFLT:
    #endif
            {
            XLOG("stopped -- fatal signal\n");
            /*
             * Send a SIGSTOP to the process to make all of
             * the non-signaled threads stop moving. Without
             * this we get a lot of "ptrace detach failed:
             * No such process".
             */
            kill(request.pid, SIGSTOP);
            /* don't dump sibling threads when attaching to GDB because it
             * makes the process less reliable, apparently... */
            tombstone_path = engrave_tombstone(request.pid, request.tid,
                    signal, !attach_gdb, false, &detach_failed,
                    &total_sleep_time_usec);
            break;
            }

        default:
            XLOG("stopped -- unexpected signal\n");
            LOG("process stopped due to unexpected signal %d\n", signal);
            break;
        }
        break;
    }

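The tid read from the client identifies the thread that hit the error, and debuggerd dumps the registers, backtrace and stack of that specific thread for debugging. With ptrace(PTRACE_ATTACH, request.tid, 0, 0), debuggerd attaches to the faulting thread and can control it from then on. Once attached, debuggerd is the tracer of the target, which for most purposes then behaves like its child, and PTRACE_ATTACH sends SIGSTOP to the target. At this moment the target's signal handler is still blocked in read() on the socket, waiting for debuggerd to respond. debuggerd therefore writes a single byte with TEMP_FAILURE_RETRY(write(fd, "\0", 1)), so the read returns and the wait ends. After that, ptrace(PTRACE_CONT, request.tid, 0, 0) lets the traced thread continue running.

signal = wait_for_signal(request.tid, &total_sleep_time_usec); then reports which signal stopped the traced thread, and the switch statement dispatches on it.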

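The engrave_tombstone call in the switch statement is the core of debuggerd: it produces the debugging information stored in the tombstone.

How the tombstone is generated

[Figure: overall debuggerd workflow]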

  1. A brief introduction to the MIPS stack frame layout

  2. Analyzing and locating the problem

  3. Useful information

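Besides producing a tombstone when a process crashes, debuggerd can also help us debug that process. It is used as follows:

On the serial console, set the uid of the application to be debugged. If that program then crashes, gdb can be attached to it.

The Android uid rule is that every app launched by zygote gets a uid starting from 10000. For example, if ps shows an app running as app_23, run setprop debug.db.uid 10023 to debug that process.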
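The name-to-uid arithmetic is simple enough to write down. The helper below is a hypothetical sketch of the rule just described (uid = 10000 + the number after "app_"); the upper bound on the app number is an assumption made only to keep the result in a sane range.

```c
#include <stdlib.h>
#include <string.h>

#define FIRST_APP_UID 10000   /* first uid given to zygote-launched apps */

/* Map a ps-style user name such as "app_23" to its uid.
 * Returns -1 if the name does not have the expected form. */
int app_name_to_uid(const char *name)
{
    static const char prefix[] = "app_";
    size_t plen = sizeof(prefix) - 1;
    if (name == NULL || strncmp(name, prefix, plen) != 0) {
        return -1;
    }
    const char *digits = name + plen;
    if (*digits < '0' || *digits > '9') {
        return -1;                       /* need at least one digit */
    }
    char *end = NULL;
    long n = strtol(digits, &end, 10);
    if (*end != '\0' || n > 9999) {      /* 9999 is an assumed upper bound */
        return -1;
    }
    return FIRST_APP_UID + (int)n;
}
```

For app_23 this returns 10023, the value passed to setprop debug.db.uid in the example above.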

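Some libraries may carry no symbol information, so the trace printed in the tombstone makes it hard to tell which function actually failed. In that case, comment out the symbol-stripping setting in that library module's build options and rebuild; the result is a library with symbol information.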
