Crash Analysis

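I recently fixed a number of crashes in an iOS project, and I want to summarize what caused them and how to prevent them. Crashes generally have one of the following causes:

  • Memory access errors (the most common kind, with many different root causes)

  • Execution of an illegal instruction (an instruction outside the permitted privilege level)

  • Illegal I/O access

  • Invalid system call arguments

  • Invalid instruction operands (division by zero and the like)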
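
On iOS, as on other POSIX-like systems, most of these categories reach the process as a signal. The following is a rough sketch of that correspondence, not an exhaustive table: the enum, the function name `crash_category` and the grouping are my own, and illegal I/O access is left out because it has no single dedicated signal.

```c
#include <signal.h>

/* Hypothetical labels for the crash categories in the list above. */
enum crash_kind {
    CRASH_MEMORY_ACCESS,       /* memory access error */
    CRASH_ILLEGAL_INSTRUCTION, /* illegal or privileged instruction */
    CRASH_BAD_SYSCALL,         /* invalid system call */
    CRASH_BAD_OPERANDS,        /* arithmetic error such as integer division by zero */
    CRASH_OTHER
};

/* Hypothetical helper: maps a POSIX signal number to a category.
 * Simplified; Apple crash reports additionally carry a Mach exception
 * type such as EXC_BAD_ACCESS. */
enum crash_kind crash_category(int signo)
{
    switch (signo) {
    case SIGSEGV: /* invalid memory reference */
    case SIGBUS:  /* bus error, e.g. a misaligned access */
        return CRASH_MEMORY_ACCESS;
    case SIGILL:
        return CRASH_ILLEGAL_INSTRUCTION;
    case SIGSYS:
        return CRASH_BAD_SYSCALL;
    case SIGFPE:
        return CRASH_BAD_OPERANDS;
    default:
        return CRASH_OTHER;
    }
}
```

The crash analyzed in this post falls into the first category.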

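To analyze crashes that happen on users' devices, collecting crash logs is essential. Our project uses Twitter's Crashlytics, now called Fabric, which collects fairly detailed crash information: the call stack of every thread plus some information about the device. One small problem is that it does not capture the register values (or perhaps I just have not found where they are shown).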

I picked the crash with the most occurrences to analyze:

# OS Version: 13.1.2 (17A860)
# Device: iPhone 8
# RAM Free: 1.9%
# Disk Free: 15.7%

#24. Crashed: NSOperationQueue 0x107964a70 (QOS: UNSPECIFIED)
0  libobjc.A.dylib                0x1b394f150 objc_release + 16
1  _appstore                      0x10184b694 -[YNP_VRHomeCoreViewModel voiceRoomDidChangeSpeakingUser:] + 373 (YNP_VRHomeCoreViewModel.m:373)
2  Aipai_appstore                 0x1015a6144 __63-[YNP_VoiceRoomManager makeDelegatesPerformSelector:obj:async:]_block_invoke + 1633 (YNP_VoiceRoomManager.m:1633)
3  Foundation                     0x1b3fd161c __NSBLOCKOPERATION_IS_CALLING_OUT_TO_A_BLOCK__ + 16
4  Foundation                     0x1b3edb3d8 -[NSBlockOperation main] + 100
5  Foundation                     0x1b3fd38a4 __NSOPERATION_IS_INVOKING_MAIN__ + 20
6  Foundation                     0x1b3edb070 -[NSOperation start] + 732
7  Foundation                     0x1b3fd429c __NSOPERATIONQUEUE_IS_STARTING_AN_OPERATION__ + 20
8  Foundation                     0x1b3fd3d68 __NSOQSchedule_f + 180
9  libdispatch.dylib              0x1b38bd9a8 _dispatch_block_async_invoke2 + 104
10 libdispatch.dylib              0x1b38da184 _dispatch_client_callout + 16
11 libdispatch.dylib              0x1b38b3eb8 _dispatch_continuation_pop$VARIANT$armv81 + 404
12 libdispatch.dylib              0x1b38b362c _dispatch_async_redirect_invoke + 592
13 libdispatch.dylib              0x1b38c0110 _dispatch_root_queue_drain + 344
14 libdispatch.dylib              0x1b38c08b0 _dispatch_worker_thread2 + 116
15 libsystem_pthread.dylib        0x1b3929f64 _pthread_wqthread + 212
16 libsystem_pthread.dylib        0x1b392cae0 start_wqthread + 8

The crash reason is EXC_BAD_ACCESS KERN_INVALID_ADDRESS 0x00000009d32f8c80, which is a memory access error. The crashing line is 373, and the code at that location is:

- (void)voiceRoomDidChangeSpeakingUser:(NSArray<NSString *> *)bids
{
    @synchronized (self.seatInfos) {
        if (!self.seatInfos || ![self.seatInfos isKindOfClass:[NSArray class]]) {
            return;
        }
        [self.seatInfos enumerateObjectsUsingBlock:^(YNP_VRSeatInfoModel *obj, NSUInteger idx, BOOL * _Nonnull stop) {
            if (!bids || !bids.count || !obj.user || ![bids containsObject:obj.user.bid]) {
                obj.user.native_isTalking = NO;
            } else {
                obj.user.native_isTalking = YES;
            }
        }];
    } //>>>>>>>>>>>line 373
    [self didChangeSeatInfos];
}

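At first glance it is hard to see why this code crashes, or why it ends up in objc_release at all. Under ARC in Objective-C, that function is called whenever an object's reference count is decremented by 1.
Note that the crash happens off the main thread. self.seatInfos is an array, and the surrounding code shows that it is modified on different threads. It may have been released on another thread and then released once more here, causing the memory error and the crash. Let's take that as a working hypothesis and verify it later. The best way to handle a crash is to reproduce it in Xcode and debug it, but this project's business logic is complex and multithreading problems are hard to reproduce, so instead we can write a small demo that imitates this code and use it to test the idea. The demo: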

- (void)testFun
{
    dispatch_queue_t queue1 = dispatch_queue_create("queue1", 0);
    dispatch_queue_t queue2 = dispatch_queue_create("queue2", 0);

    __block NSMutableArray* array = [NSMutableArray array];

    dispatch_async(queue1, ^{
        while (true) {
            array = [NSMutableArray array];
        }
    });
    dispatch_async(queue2, ^{
        while (true) {
            @synchronized (array) {
                [array enumerateObjectsUsingBlock:^(id  _Nonnull obj, NSUInteger idx, BOOL * _Nonnull stop) {
                    NSLog(@"obj=%@",obj);
                }];
            }
        }
    });
}

After running it a few times, a crash similar to the one in the project appeared, as shown in the screenshot:

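The crash is in the same function at the same offset. Let's look at the assembly around the crash and, with the help of the objc source, analyze the first few instructions:
1. Check whether obj is nil; if it is, branch to ret and return.
2. Test whether the highest bit of the pointer (bit 63) is 1; if it is, branch to the same return (on arm64 a set top bit marks a tagged pointer).
3. Load the object's isa pointer into x8.
4. Mask the isa to obtain the pointer to the object's Class, and store it in x8.
5. Load the byte at offset 32 (0x20) from the Class object into w8 (ldrb zero-extends the byte into the 32-bit register).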

libobjc.A.dylib`objc_release:
    0x1aa1f3140 <+0>:   cbz    x0, 0x1aa1f318c           ; <+76>  // 1
    0x1aa1f3144 <+4>:   tbnz   x0, #0x3f, 0x1aa1f318c    ; <+76>  // 2
    0x1aa1f3148 <+8>:   ldr    x8, [x0]   //3
    0x1aa1f314c <+12>:  and    x8, x8, #0xffffffff8    // 4
->  0x1aa1f3150 <+16>:  ldrb   w8, [x8, #0x20]    //5
    0x1aa1f3154 <+20>:  tbz    w8, #0x2, 0x1aa1f31b8     ; <+120>
    0x1aa1f3158 <+24>:  orr    x8, xzr, #0x200000000000
    0x1aa1f315c <+28>:  ldxr   x9, [x0]
    0x1aa1f3160 <+32>:  tbz    w9, #0x0, 0x1aa1f31a0     ; <+96>
    0x1aa1f3164 <+36>:  subs   x10, x9, x8
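
The fault address in the crash report fits this reading of steps 3 to 5. Below is a small sketch of the arithmetic in C. The function name and the sample isa value are made up for illustration, and it assumes the user's build applies the same mask and offset as the listing above (the report also crashes at objc_release + 16).

```c
#include <stdint.h>

/* Mask applied by the `and` at <+12> in the listing above. */
#define ISA_CLASS_MASK 0xffffffff8ULL
/* Byte offset read by the `ldrb` at <+16>. */
#define CLASS_FLAGS_OFFSET 0x20ULL

/* Hypothetical helper: returns the address the `ldrb` at <+16> would
 * read for a given raw isa word, mirroring steps 4 and 5. */
uint64_t objc_release_load_address(uint64_t raw_isa)
{
    uint64_t cls = raw_isa & ISA_CLASS_MASK; /* step 4: extract Class pointer */
    return cls + CLASS_FLAGS_OFFSET;         /* step 5: address of the loaded byte */
}
```

For example, a garbage isa word such as 0x000001a9d32f8c61 (an invented value) masks down to 0x9d32f8c60, and adding 0x20 gives 0x9d32f8c80, exactly the KERN_INVALID_ADDRESS from the report. So the crash is consistent with objc_release reading the isa of an object whose memory had already been freed or reused.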
    

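At this point we can be fairly sure that self.seatInfos was released somewhere else. But why is objc_release called at this spot? Look at @synchronized (self.seatInfos). The intent was to lock this block of code, but self.seatInfos is clearly the wrong argument: it is a property that other threads can reassign, so different threads can end up locking on different objects and the lock protects nothing. And under ARC, does @synchronized change the reference count of the object it is given? Let's try it in code: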

NSMutableArray* array = [NSMutableArray array];
NSLog(@"before count = %lu",(unsigned long)CFGetRetainCount((__bridge CFTypeRef)array));
@synchronized (array) {
      NSLog(@"in syn count = %lu", (unsigned long)CFGetRetainCount((__bridge CFTypeRef)array));
}
NSLog(@"after count = %lu", (unsigned long)CFGetRetainCount((__bridge CFTypeRef)array));

The output is 1, 2, 1, so the implementation of @synchronized does affect the reference count of array. Let's look directly at the assembly generated for @synchronized:

stub for: objc_msgSend
    0x104831548 <+84>:  mov    x29, x29
    0x10483154c <+88>:  bl     0x10483259c               ; symbol stub for: objc_retainAutoreleasedReturnValue
    0x104831550 <+92>:  stur   x0, [x29, #-0x28]
    0x104831554 <+96>:  ldur   x0, [x29, #-0x28]
    0x104831558 <+100>: bl     0x104832488               ; symbol stub for: CFGetRetainCount
    0x10483155c <+104>: mov    x1, sp
    0x104831560 <+108>: str    x0, [x1]
    0x104831564 <+112>: adrp   x0, 3
    0x104831568 <+116>: add    x0, x0, #0x360            ; =0x360
    0x10483156c <+120>: bl     0x104832494               ; symbol stub for: NSLog
->  0x104831570 <+124>: ldur   x0, [x29, #-0x28]
    0x104831574 <+128>: bl     0x104832584               ; symbol stub for: objc_retain
    0x104831578 <+132>: mov    x1, x0
    0x10483157c <+136>: mov    x30, x0
    0x104831580 <+140>: str    x1, [sp, #0x50]
    0x104831584 <+144>: str    x30, [sp, #0x48]
    0x104831588 <+148>: bl     0x1048325b4               ; symbol stub for: objc_sync_enter
    0x10483158c <+152>: ldur   x1, [x29, #-0x28]
    0x104831590 <+156>: str    w0, [sp, #0x44]
    0x104831594 <+160>: mov    x0, x1
    0x104831598 <+164>: bl     0x104832488               ; symbol stub for: CFGetRetainCount
    0x10483159c <+168>: str    x0, [sp, #0x38]
    0x1048315a0 <+172>: b      0x1048315a4               ; <+176> at ViewController.m:174:9
    0x1048315a4 <+176>: mov    x8, sp
    0x1048315a8 <+180>: ldr    x9, [sp, #0x38]
    0x1048315ac <+184>: str    x9, [x8]
    0x1048315b0 <+188>: adrp   x0, 3
    0x1048315b4 <+192>: add    x0, x0, #0x380            ; =0x380
    0x1048315b8 <+196>: bl     0x104832494               ; symbol stub for: NSLog
    0x1048315bc <+200>: b      0x1048315c0               ; <+204> at ViewController.m
    0x1048315c0 <+204>: ldr    x0, [sp, #0x48]
    0x1048315c4 <+208>: bl     0x1048325c0               ; symbol stub for: objc_sync_exit
    0x1048315c8 <+212>: ldr    x30, [sp, #0x50]
    0x1048315cc <+216>: str    w0, [sp, #0x34]
    0x1048315d0 <+220>: mov    x0, x30
    0x1048315d4 <+224>: bl     0x104832578               ; symbol stub for: objc_release
    0x1048315d8 <+228>: ldur   x0, [x29, #-0x28]
    0x1048315dc <+232>: bl     0x104832488               ; symbol stub for: CFGetRetainCount
    0x1048315e0 <+236>: mov    x30, sp
    0x1048315e4 <+240>: str    x0, [x30]
    0x1048315e8 <+244>: adrp   x0, 3
    0x1048315ec <+248>: add    x0, x0, #0x3a0            ; =0x3a0
    0x1048315f0 <+252>: bl     0x104832494               ; symbol stub for: NSLog

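The @synchronized implementation calls objc_retain on the object passed in before calling objc_sync_enter to take the recursive lock, and calls objc_release on it after calling objc_sync_exit. Because the code is multithreaded and not correctly locked, the object had already been released on another thread, and the extra release here caused the crash. In multithreaded code under ARC it is easy to overlook the places that change a reference count and to leave them without proper locking. Bugs like this are intermittent, so they can slip through testing and are hard to reproduce; they then ship with the release and crash for some users, which makes for a bad experience. Here we simply change @synchronized (self.seatInfos) to @synchronized (self), make the same change in the other places, and the crash is fixed.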
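
Why the choice of lock object matters can be shown with a toy model. @synchronized associates a lock with the object it is given, so the lock that gets taken depends on which object the expression evaluates to at that moment. The sketch below is a single-threaded illustration in C with hypothetical names throughout; the slot table only stands in for the runtime's per-object lock lookup and is not the real objc_sync_enter implementation.

```c
#include <stddef.h>

#define MAX_LOCKS 8

/* Toy stand-in for a per-object lock table: one slot per distinct
 * object address. Not thread-safe; for illustration only. */
struct lock_slot {
    const void *object;
    int lock_id;
};

static struct lock_slot slots[MAX_LOCKS];
static int slot_count;

/* Returns the id of the lock associated with `object`, creating it on
 * first use. Returns -1 if the table is full. */
int lock_id_for(const void *object)
{
    for (int i = 0; i < slot_count; i++) {
        if (slots[i].object == object)
            return slots[i].lock_id;
    }
    if (slot_count == MAX_LOCKS)
        return -1;
    slots[slot_count].object = object;
    slots[slot_count].lock_id = slot_count;
    return slot_count++;
}

/* Hypothetical stand-in for the view model and its seatInfos property. */
struct view_model {
    int *seat_infos;
};

/* Pattern of the bug: the lock is picked by whatever seat_infos points
 * to right now, so it changes whenever the property is reassigned. */
int lock_id_buggy(const struct view_model *vm)
{
    return lock_id_for(vm->seat_infos);
}

/* Pattern of the fix: the lock is picked by the owner, whose address
 * stays the same for as long as the property can be touched. */
int lock_id_fixed(const struct view_model *vm)
{
    return lock_id_for(vm);
}
```

If one thread reads the lock while seat_infos points to the old array and another reads it after the reassignment, lock_id_buggy hands them two different locks, so both run the "protected" section at the same time. lock_id_fixed always returns the same lock, which is what @synchronized (self) achieves. A dedicated lock object that is created once and never reassigned would serve the same purpose.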

References:

https://en.wikipedia.org/wiki/Crash_(computing)
https://developer.apple.com/library/archive/documentation/Cocoa/Conceptual/Multithreading/ThreadSafety/ThreadSafety.html#//apple_ref/doc/uid/10000057i-CH8-132741

01-15 16:23