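Crash Analysis
I recently fixed a number of crashes in an iOS project and want to summarize what caused them and how to prevent them. Crashes generally come from one of the following:
Memory access errors (the most common, with many possible causes)
Execution of an illegal instruction (for example, one outside the process's privilege level)
Illegal I/O access
Invalid arguments to a system call
Invalid operands to an instruction (division by zero, for example)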
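To analyze crashes that happen to users, collecting crash logs is essential. Our project uses Twitter's Crashlytics, now called Fabric, which collects fairly detailed crash information: the stack of every thread plus some device information. One small drawback is that it does not capture the register values (or perhaps I just have not found where it shows them).
I picked the most frequent crash to analyze: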
# OS Version: 13.1.2 (17A860)
# Device: iPhone 8
# RAM Free: 1.9%
# Disk Free: 15.7%
#24. Crashed: NSOperationQueue 0x107964a70 (QOS: UNSPECIFIED)
0 libobjc.A.dylib 0x1b394f150 objc_release + 16
1 _appstore 0x10184b694 -[YNP_VRHomeCoreViewModel voiceRoomDidChangeSpeakingUser:] + 373 (YNP_VRHomeCoreViewModel.m:373)
2 Aipai_appstore 0x1015a6144 __63-[YNP_VoiceRoomManager makeDelegatesPerformSelector:obj:async:]_block_invoke + 1633 (YNP_VoiceRoomManager.m:1633)
3 Foundation 0x1b3fd161c __NSBLOCKOPERATION_IS_CALLING_OUT_TO_A_BLOCK__ + 16
4 Foundation 0x1b3edb3d8 -[NSBlockOperation main] + 100
5 Foundation 0x1b3fd38a4 __NSOPERATION_IS_INVOKING_MAIN__ + 20
6 Foundation 0x1b3edb070 -[NSOperation start] + 732
7 Foundation 0x1b3fd429c __NSOPERATIONQUEUE_IS_STARTING_AN_OPERATION__ + 20
8 Foundation 0x1b3fd3d68 __NSOQSchedule_f + 180
9 libdispatch.dylib 0x1b38bd9a8 _dispatch_block_async_invoke2 + 104
10 libdispatch.dylib 0x1b38da184 _dispatch_client_callout + 16
11 libdispatch.dylib 0x1b38b3eb8 _dispatch_continuation_pop$VARIANT$armv81 + 404
12 libdispatch.dylib 0x1b38b362c _dispatch_async_redirect_invoke + 592
13 libdispatch.dylib 0x1b38c0110 _dispatch_root_queue_drain + 344
14 libdispatch.dylib 0x1b38c08b0 _dispatch_worker_thread2 + 116
15 libsystem_pthread.dylib 0x1b3929f64 _pthread_wqthread + 212
16 libsystem_pthread.dylib 0x1b392cae0 start_wqthread + 8
The crash reason is EXC_BAD_ACCESS KERN_INVALID_ADDRESS 0x00000009d32f8c80, which is a memory access error. The crash is reported at line 373; the code around it is:
- (void)voiceRoomDidChangeSpeakingUser:(NSArray<NSString *> *)bids
{
    @synchronized (self.seatInfos) {
        if (!self.seatInfos || ![self.seatInfos isKindOfClass:[NSArray class]]) {
            return;
        }
        [self.seatInfos enumerateObjectsUsingBlock:^(YNP_VRSeatInfoModel *obj, NSUInteger idx, BOOL * _Nonnull stop) {
            if (!bids || !bids.count || !obj.user || ![bids containsObject:obj.user.bid]) {
                obj.user.native_isTalking = NO;
            } else {
                obj.user.native_isTalking = YES;
            }
        }];
    } // >>>>>>>>>>> line 373
    [self didChangeSeatInfos];
}
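At first glance it is hard to see why this would crash, or why execution ends up in objc_release at all. Under ARC in Objective-C, that is the function called whenever a reference count is decremented by 1.
Note that the crash happens off the main thread. self.seatInfos is an array, and the surrounding code shows that it is reassigned from different threads. It may have been released on another thread and then released once more here, which would cause the memory error. Let's take that as a working hypothesis and verify it later. The best way to deal with a crash is to reproduce it in Xcode and debug it, but this project's business logic is complex and multithreading problems are hard to reproduce, so instead we can write a small demo that mimics this code. The demo: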
- (void)testFun
{
    dispatch_queue_t queue1 = dispatch_queue_create("queue1", 0);
    dispatch_queue_t queue2 = dispatch_queue_create("queue2", 0);
    __block NSMutableArray* array = [NSMutableArray array];
    // Writer: keeps replacing the array, which releases the previous one each time.
    dispatch_async(queue1, ^{
        while (true) {
            array = [NSMutableArray array];
        }
    });
    // Reader: locks on whatever array currently points to, as the project code does.
    dispatch_async(queue2, ^{
        while (true) {
            @synchronized (array) {
                [array enumerateObjectsUsingBlock:^(id _Nonnull obj, NSUInteger idx, BOOL * _Nonnull stop) {
                    NSLog(@"obj=%@",obj);
                }];
            }
        }
    });
}
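After a few runs, it crashed in much the same way as the project did.
The crashing function and offset are the same as well. Here is the disassembly around the crash; reading it alongside the objc runtime source, the first few instructions do the following:
1. Check whether obj is nil; if so, branch to ret.
2. Test whether the top bit (bit 63) of the pointer is 1, which marks a tagged pointer; if so, branch to the same return path.
3. Load the object's isa field into x8.
4. Mask the isa value to get the pointer to the object's Class, leaving it in x8.
5. Load the byte at offset 0x20 (32 bytes) from the class object into w8. This is the instruction that faults.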
libobjc.A.dylib`objc_release:
0x1aa1f3140 <+0>: cbz x0, 0x1aa1f318c ; <+76> // 1
0x1aa1f3144 <+4>: tbnz x0, #0x3f, 0x1aa1f318c ; <+76> // 2
0x1aa1f3148 <+8>: ldr x8, [x0] //3
0x1aa1f314c <+12>: and x8, x8, #0xffffffff8 // 4
-> 0x1aa1f3150 <+16>: ldrb w8, [x8, #0x20] //5
0x1aa1f3154 <+20>: tbz w8, #0x2, 0x1aa1f31b8 ; <+120>
0x1aa1f3158 <+24>: orr x8, xzr, #0x200000000000
0x1aa1f315c <+28>: ldxr x9, [x0]
0x1aa1f3160 <+32>: tbz w9, #0x0, 0x1aa1f31a0 ; <+96>
0x1aa1f3164 <+36>: subs x10, x9, x8
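As a worked check of steps 4 and 5, the faulting address in the production report can be traced back to the isa value that was read. This is only a sketch: the mask and offset are read off the demo disassembly above, and it assumes the production libobjc build used the same values (the crash offset, objc_release + 16, is the same). The function names are made up for illustration.

```objc
#import <Foundation/Foundation.h>

// Values taken from the disassembly above. Assumption: the production
// libobjc build uses the same mask and offset; they are not documented constants.
static const uint64_t kAssumedIsaMask = 0xffffffff8ULL;   // and  x8, x8, #0xffffffff8
static const uint64_t kAssumedFlagsOffset = 0x20;          // ldrb w8, [x8, #0x20]

// Address that the instruction at <+16> loads from, given the raw isa word.
static uint64_t FaultAddressForIsa(uint64_t rawIsa) {
    return (rawIsa & kAssumedIsaMask) + kAssumedFlagsOffset;
}

// Inverse direction: the masked class pointer implied by a reported fault address.
static uint64_t MaskedClassPointerForFaultAddress(uint64_t faultAddress) {
    return faultAddress - kAssumedFlagsOffset;
}
```

Under those assumptions, the reported address 0x00000009d32f8c80 implies a masked class pointer of 0x00000009d32f8c60. Since the kernel reported that address as invalid, the word sitting where the isa should be did not point at a real class, which is what one would expect if the object's memory had already been freed or reused.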
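At this point we can be fairly sure that self.seatInfos was released somewhere else. But why is objc_release called here at all? Look at @synchronized (self.seatInfos): the intent was to lock this block, but self.seatInfos is clearly the wrong argument. It is a property that other threads reassign, so two threads can end up synchronizing on different objects and the lock protects nothing. Under ARC, does @synchronized also change the reference count of the object passed to it? A quick experiment: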
NSMutableArray* array = [NSMutableArray array];
NSLog(@"before count = %lu",(unsigned long)CFGetRetainCount((__bridge CFTypeRef)array));
@synchronized (array) {
    NSLog(@"in syn count = %lu", (unsigned long)CFGetRetainCount((__bridge CFTypeRef)array));
}
NSLog(@"after count = %lu", (unsigned long)CFGetRetainCount((__bridge CFTypeRef)array));
The output is 1, 2, 1, so the implementation of @synchronized does affect the retain count of array. Here is the disassembly of the @synchronized block:
stub for: objc_msgSend
0x104831548 <+84>: mov x29, x29
0x10483154c <+88>: bl 0x10483259c ; symbol stub for: objc_retainAutoreleasedReturnValue
0x104831550 <+92>: stur x0, [x29, #-0x28]
0x104831554 <+96>: ldur x0, [x29, #-0x28]
0x104831558 <+100>: bl 0x104832488 ; symbol stub for: CFGetRetainCount
0x10483155c <+104>: mov x1, sp
0x104831560 <+108>: str x0, [x1]
0x104831564 <+112>: adrp x0, 3
0x104831568 <+116>: add x0, x0, #0x360 ; =0x360
0x10483156c <+120>: bl 0x104832494 ; symbol stub for: NSLog
-> 0x104831570 <+124>: ldur x0, [x29, #-0x28]
0x104831574 <+128>: bl 0x104832584 ; symbol stub for: objc_retain
0x104831578 <+132>: mov x1, x0
0x10483157c <+136>: mov x30, x0
0x104831580 <+140>: str x1, [sp, #0x50]
0x104831584 <+144>: str x30, [sp, #0x48]
0x104831588 <+148>: bl 0x1048325b4 ; symbol stub for: objc_sync_enter
0x10483158c <+152>: ldur x1, [x29, #-0x28]
0x104831590 <+156>: str w0, [sp, #0x44]
0x104831594 <+160>: mov x0, x1
0x104831598 <+164>: bl 0x104832488 ; symbol stub for: CFGetRetainCount
0x10483159c <+168>: str x0, [sp, #0x38]
0x1048315a0 <+172>: b 0x1048315a4 ; <+176> at ViewController.m:174:9
0x1048315a4 <+176>: mov x8, sp
0x1048315a8 <+180>: ldr x9, [sp, #0x38]
0x1048315ac <+184>: str x9, [x8]
0x1048315b0 <+188>: adrp x0, 3
0x1048315b4 <+192>: add x0, x0, #0x380 ; =0x380
0x1048315b8 <+196>: bl 0x104832494 ; symbol stub for: NSLog
0x1048315bc <+200>: b 0x1048315c0 ; <+204> at ViewController.m
0x1048315c0 <+204>: ldr x0, [sp, #0x48]
0x1048315c4 <+208>: bl 0x1048325c0 ; symbol stub for: objc_sync_exit
0x1048315c8 <+212>: ldr x30, [sp, #0x50]
0x1048315cc <+216>: str w0, [sp, #0x34]
0x1048315d0 <+220>: mov x0, x30
0x1048315d4 <+224>: bl 0x104832578 ; symbol stub for: objc_release
0x1048315d8 <+228>: ldur x0, [x29, #-0x28]
0x1048315dc <+232>: bl 0x104832488 ; symbol stub for: CFGetRetainCount
0x1048315e0 <+236>: mov x30, sp
0x1048315e4 <+240>: str x0, [x30]
0x1048315e8 <+244>: adrp x0, 3
0x1048315ec <+248>: add x0, x0, #0x3a0 ; =0x3a0
0x1048315f0 <+252>: bl 0x104832494 ; symbol stub for: NSLog
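Read back as source, the retain / objc_sync_enter / objc_sync_exit / release sequence in the listing above corresponds roughly to the sketch below. RunSynchronized is a hypothetical helper written for illustration, not what the compiler literally emits; objc_sync_enter and objc_sync_exit are the runtime functions declared in objc/objc-sync.h.

```objc
#import <Foundation/Foundation.h>
#import <objc/objc-sync.h>

// Hypothetical helper: runs `body` roughly the way @synchronized (obj) { ... }
// behaves under ARC, according to the disassembly above.
static void RunSynchronized(id obj, void (^body)(void)) {
    id token = obj;              // strong local: ARC inserts a retain (objc_retain)
    objc_sync_enter(token);      // take the recursive lock associated with token
    @try {
        body();
    } @finally {
        objc_sync_exit(token);   // unlock using the same pointer that was locked
    }
    // token goes out of scope here: ARC inserts the matching objc_release
}
```

The point to notice is that the retain and the release both act on the pointer value that was read once at entry. If that pointer came from a property that another thread has since reassigned, both calls can touch an object that no longer exists.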
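Before calling objc_sync_enter to take the recursive lock, the @synchronized implementation calls objc_retain on the object passed in, and after objc_sync_exit it calls objc_release on it. Because the code is multithreaded and not correctly locked, the object had already been released on another thread, and the extra release here caused the crash.
In multithreaded ARC code it is easy to overlook the places that change a reference count and leave them without proper locking. Such bugs are intermittent, can slip through testing, and are hard to reproduce, so they ship and some users hit crashes. Here we simply changed @synchronized (self.seatInfos) to @synchronized (self), made the corresponding change in the other places that touch the array, and the crash was resolved.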
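The same idea can be sketched with a dedicated lock object instead of self. This is a variant of the fix described above, not the project's actual code: SeatStore, replaceSeats: and seatsSnapshot are hypothetical names, and the seat model is reduced to strings. What matters is that the lock token is created once and never reassigned, and that every read and write of the array goes through it.

```objc
#import <Foundation/Foundation.h>

// Hypothetical stand-in for the view model; only the locking pattern matters.
@interface SeatStore : NSObject
- (void)replaceSeats:(NSArray<NSString *> *)seats;
- (NSArray<NSString *> *)seatsSnapshot;
@end

@implementation SeatStore {
    NSObject *_lock;                 // created once in init, never reassigned
    NSArray<NSString *> *_seats;     // only read or written while holding _lock
}

- (instancetype)init {
    if ((self = [super init])) {
        _lock = [NSObject new];
        _seats = @[];
    }
    return self;
}

- (void)replaceSeats:(NSArray<NSString *> *)seats {
    @synchronized (_lock) {
        _seats = [seats copy];       // the old array is released under the lock
    }
}

- (NSArray<NSString *> *)seatsSnapshot {
    @synchronized (_lock) {
        return _seats;               // retained for the caller before the lock is dropped
    }
}
@end
```

A reader would call seatsSnapshot and enumerate the returned array outside the lock. Because the stored array is an immutable copy and the caller holds its own strong reference, a concurrent replaceSeats: cannot free it mid-enumeration.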
References:
https://en.wikipedia.org/wiki/Crash_(computing)
https://developer.apple.com/library/archive/documentation/Cocoa/Conceptual/Multithreading/ThreadSafety/ThreadSafety.html#//apple_ref/doc/uid/10000057i-CH8-132741