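For the Linux kernel, an Oops means the kernel has hit an exception. The kernel then prints the CPU state at the moment of the fault: the faulting instruction address, the data address and the other registers, the function call chain, and even the contents of the stack. It then decides what to do next based on how severe the exception is, either killing the process that caused it or halting the system.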

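The most typical exception is a kernel-mode dereference of an invalid address, usually a NULL pointer or an uninitialized (wild) pointer. This causes a page fault, which ultimately produces an Oops.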

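Linux is robust enough to cope with most of these exceptions. An exception usually kills the current process while the rest of the system keeps running, but the system is then in an unstable state and may fail at any time. An exception in interrupt context, or one that corrupts critical system resources, usually hangs the kernel so that it no longer responds to any event.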

2 Kernel exception levels
2.1 Bug
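A Bug is a problem that violates the kernel's design rules but that the kernel can detect and that does not by itself stop the system from running, for example sleeping in atomic context. For example: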
BUG: scheduling while atomic: insmod/826/0x00000002
Call Trace:
[ef12f700] [c00081e0] show_stack+0x3c/0x194 (unreliable)
[ef12f730] [c0019b2c] __schedule_bug+0x64/0x78
[ef12f750] [c0350f50] schedule+0x324/0x34c
[ef12f7a0] [c03515c0] schedule_timeout+0x68/0xe4
[ef12f7e0] [c027938c] fsl_elbc_run_command+0x138/0x1c0
[ef12f820] [c0275820] nand_do_read_ops+0x130/0x3dc
[ef12f880] [c0275ebc] nand_read+0xac/0xe0
[ef12f8b0] [c0262d98] part_read+0x5c/0xe4
[ef12f8c0] [c017bcac] jffs2_flash_read+0x68/0x254
[ef12f8f0] [c0170550] jffs2_read_dnode+0x60/0x304
[ef12f940] [c017088c] jffs2_read_inode_range+0x98/0x180
[ef12f970] [c016e610] jffs2_do_readpage_nolock+0x94/0x1ac
[ef12f990] [c016ee04] jffs2_write_begin+0x2b0/0x330
[ef12fa10] [c005144c] generic_file_buffered_write+0x11c/0x8d0
[ef12fab0] [c0051e48] __generic_file_aio_write_nolock+0x248/0x500
[ef12fb20] [c0052168] generic_file_aio_write+0x68/0x10c
[ef12fb50] [c007ca80] do_sync_write+0xc4/0x138
[ef12fc10] [f107c0dc] oops_log+0xdc/0x1e8 [oopslog]
[ef12fe70] [f3087058] oops_log_init+0x58/0xa0 [oopslog]
[ef12fe80] [c00477bc] sys_init_module+0x130/0x17dc
[ef12ff40] [c00104b0] ret_from_syscall+0x0/0x38
--- Exception: c01 at 0xff29658
    LR = 0x10031300

2.2 Oops
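When code running in kernel mode hits an exceptional condition, such as a data abort caused by dereferencing an invalid pointer, or an instruction-fetch abort caused by an out-of-bounds array access (for example one that overwrites a return address), the exception handler catches it and prints the key system information to the serial console. Under normal circumstances the Oops message is also recorded in the system log.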

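When an Oops occurs, the process is in kernel mode and may well be accessing critical system resources while holding locks. When the Oops kills it, it cannot release those resources, so other processes that need them hang, disrupting normal operation. In this situation the system is usually unstable and quite likely to crash.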

2.3 Panic
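If the Oops occurs in interrupt context, or in process 0 (idle) or process 1 (init), the system hangs completely, because there is no way to recover once an interrupt service routine (or one of these essential processes) has failed; this is called a kernel panic. In addition, if the system has the panic flag set (for example /proc/sys/kernel/panic_on_oops), any Oops causes a panic, whether it occurs in interrupt or process context. Once the kernel has panicked in an interrupt service routine, it no longer schedules, so syslogd never runs again; in this case the Oops message is only printed to the serial console and is not recorded in the system log.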

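While debugging an IC card driver, frequently inserting and removing the card triggered the exception BUG: scheduling while atomic: events/0/4/0x00010004, which crashed the system. The details are as follows: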

[ 1947.900000] BUG: scheduling while atomic: events/0/4/0x00010004
<4>[ 1947.900000] @@@@cardslot_iso7816_uart_interrupt,line:1384
<3>[ 1947.900000] BUG: scheduling while atomic: events/0/4/0x00010004
<4>[ 1947.900000] Modules linked in: iccard iso7816_uart bcm589x_pm(P) bar_scanner bcm589x_ped idtechencmag magstripe cx930xx modem slnsp ftp101 printer bcm589x_spi touch_screen bcm5892_adc_driver matrix_keys beeper leds fusion bcm589x_otg bcm589x_dwccom [last unloaded: iso7816_uart]
<4>[ 1947.900000]
<4>[ 1947.900000] Pid: 4, comm:             events/0
<4>[ 1947.900000] CPU: 0    Tainted: P            (2.6.32.9-bcm5892 #8)
<4>[ 1947.900000] PC is at memcpy+0x16c/0x330
<4>[ 1947.900000] LR is at 0xffffff
<4>[ 1947.900000] pc : [<c017c5cc>]    lr : [<00ffffff>]    psr: 20000013
<4>[ 1947.900000] sp : c3823ecc  ip : 00ffffff  fp : c3823f60
<4>[ 1947.900000] r10: 00000000  r9 : 00ffffff  r8 : 00ffffff
<4>[ 1947.900000] r7 : 00ffffff  r6 : 00ffffff  r5 : ff000000  r4 : 0000ffff
<4>[ 1947.900000] r3 : ff00ffff  r2 : 0000003f  r1 : c483ad44  r0 : c4854d40
<4>[ 1947.900000] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
<4>[ 1947.900000] Control: 00c5387d  Table: 43b14008  DAC: 00000017
<4>[ 1947.900000] [<c0032fb4>] (show_regs+0x0/0x4c) from [<c004d85c>] (__schedule_bug+0x48/0x5c)
<4>[ 1947.900000]  r4:c3823e84
<4>[ 1947.900000] [<c004d814>] (__schedule_bug+0x0/0x5c) from [<c032b734>] (schedule+0x78/0x45c)
<4>[ 1947.900000]  r4:c3823ce8
<4>[ 1947.900000] [<c032b6bc>] (schedule+0x0/0x45c) from [<c032c138>] (schedule_timeout+0x20/0x1dc)
<4>[ 1947.900000] [<c032c118>] (schedule_timeout+0x0/0x1dc) from [<c032bfa0>] (wait_for_common+0xf8/0x1b4)
<4>[ 1947.900000]  r6:c3822000 r5:c3815340 r4:c3823ce8
<4>[ 1947.900000] [<c032bea8>] (wait_for_common+0x0/0x1b4) from [<c032c0ec>] (wait_for_completion+0x18/0x1c)
<4>[ 1947.900000] [<c032c0d4>] (wait_for_completion+0x0/0x1c) from [<c0065ba4>] (__cancel_work_timer+0x140/0x194)
<4>[ 1947.900000] [<c0065a64>] (__cancel_work_timer+0x0/0x194) from [<c0065c0c>] (cancel_delayed_work_sync+0x14/0x18)
<4>[ 1947.900000] [<c0065bf8>] (cancel_delayed_work_sync+0x0/0x18) from [<c0198044>] (fb_deferred_io_fsync_delay+0x1c/0x2c)
<4>[ 1947.900000] [<c0198028>] (fb_deferred_io_fsync_delay+0x0/0x2c) from [<c01999dc>] (new8110_lcd_icon_control+0xb0/0xbc)
<4>[ 1947.900000]  r5:c38daa80 r4:00008040
<4>[ 1947.900000] [<c019992c>] (new8110_lcd_icon_control+0x0/0xbc) from [<c003ad64>] (lcd_icon_set+0x20/0x28)
<4>[ 1947.900000]  r7:bf08e6b0 r6:000000ff r5:c3bee70c r4:c3b0aa1c
<4>[ 1947.900000] [<c003ad44>] (lcd_icon_set+0x0/0x28) from [<c023d8c0>] (led_trigger_event+0x64/0xa0)
<4>[ 1947.900000] [<c023d85c>] (led_trigger_event+0x0/0xa0) from [<bf088520>] (card_insert_interrupt+0x70/0x98 [iccard])
<4>[ 1947.900000]  r6:bf08cacc r5:c041cf4c r4:00000040
<4>[ 1947.900000] [<bf0884b0>] (card_insert_interrupt+0x0/0x98 [iccard]) from [<bf084b10>] (cardslot_iso7816_uart_interrupt+0xd4/0xb68 [iccard])
<4>[ 1947.900000]  r7:0000002e r6:c041cf4c r5:0000002e r4:bf08cacc
<4>[ 1947.900000] [<bf084a3c>] (cardslot_iso7816_uart_interrupt+0x0/0xb68 [iccard]) from [<c007e76c>] (handle_IRQ_event+0x3c/0xfc)
<4>[ 1947.900000]  r6:00000000 r5:00000000 r4:c3932da0
<4>[ 1947.900000] [<c007e730>] (handle_IRQ_event+0x0/0xfc) from [<c0080734>] (handle_level_irq+0xcc/0x168)
<4>[ 1947.900000]  r7:00000002 r6:0000002e r5:c3932da0 r4:c0420810
<4>[ 1947.900000] [<c0080668>] (handle_level_irq+0x0/0x168) from [<c0031070>] (asm_do_IRQ+0x70/0x8c)
<4>[ 1947.900000]  r6:00004000 r5:00000000 r4:0000002e
<4>[ 1947.900000] [<c0031000>] (asm_do_IRQ+0x0/0x8c) from [<c0031b78>] (__irq_svc+0x38/0xd4)
<4>[ 1947.900000] Exception stack(0xc3823e84 to 0xc3823ecc)
<4>[ 1947.900000] 3e80:          c4854d40 c483ad44 0000003f ff00ffff 0000ffff ff000000 00ffffff
<4>[ 1947.900000] 3ea0: 00ffffff 00ffffff 00ffffff 00000000 c3823f60 00ffffff c3823ecc 00ffffff
<4>[ 1947.900000] 3ec0: c017c5cc 20000013 ffffffff
<4>[ 1947.900000]  r5:d102a000 r4:ffffffff
<4>[ 1947.900000] [<c0198f84>] (new8110fb_deferred_io+0x0/0x5ac) from [<c0197f40>] (fb_deferred_io_work+0x90/0xe0)
<4>[ 1947.900000] [<c0197eb0>] (fb_deferred_io_work+0x0/0xe0) from [<c00657b8>] (worker_thread+0x178/0x22c)
<4>[ 1947.900000]  r8:c0197eb0 r7:c3801900 r6:c38daa1c r5:c3822000 r4:c38daa20
<4>[ 1947.900000] [<c0065640>] (worker_thread+0x0/0x22c) from [<c00692b8>] (kthread+0x84/0x8c)
<4>[ 1947.900000] [<c0069234>] (kthread+0x0/0x8c) from [<c0056b80>] (do_exit+0x0/0x62c)
<4>[ 1947.900000]  r7:00000000 r6:00000000 r5:00000000 r4:00000000

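After repeated debugging, the problem was traced to the card insert/remove interrupt handler, which called led_trigger_event(card_insert_led_trigger, LED_OFF). This function takes locks and can block, which intermittently leads to deadlock or blocking. Blocking calls are never allowed in interrupt context, so the kernel reports a BUG. The call trace shows the exact path: led_trigger_event() -> lcd_icon_set() -> new8110_lcd_icon_control() -> fb_deferred_io_fsync_delay() -> cancel_delayed_work_sync() -> wait_for_completion() -> schedule(), i.e. the handler ends up sleeping. cancel_delayed_work_sync() only has to wait when the work item is currently executing, and here the interrupt arrived while the events/0 worker was inside fb_deferred_io_work(). That would explain why the failure appeared only intermittently, during frequent card insertion and removal.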

Removing this call fixed the problem.

04-30 12:19