跟 coredump 文件一样, ramdump 文件也是内存转储文件, 但是 ramdump 文件更大, 因为它几乎是对整个物理内存的镜像, 除了一些 security 类型的 memory 抓不出来之外, 几乎所有的 dram 都被抓下来, 有些问题的复现概率低, 而且有些问题是由于踩内存导致的, 这种问题靠 log 往往是无法分析出来的, 所以如果可以在问题发生时候把内存镜像保存下来, 对于分析问题是很有帮助的.
MTK 平台抓 ramdump 的方法
因为每个平台抓取 ramdump 的方式都不一样, 所以这里以 MTK 平台作为示例, 在 MTK 平台上面, user/user debug build 抓取 ramdump 的方法如下所示,
另外由于 MTK 平台和 Android,Kernel 的升级, 下面的这个方式也不一定会有用, 如果抓取不了可以咨询芯片厂家.
第 1 步: 修改 LK 的 make file "bootable/bootloader/lk/app/mt_boot/rules.mk"
- build_mt_ramdump := yes ## 原先是 no, 这里修改为 yes
- ifneq ($(TARGET_BUILD_VARIANT),user)
- ifeq ($(ARCH_HAVE_MT_RAMDUMP),yes)
- build_mt_ramdump := yes
- endif
- endif
第 2 步: 修改 user-build kernel config(kernel/arch/arm/configs/xxx_defconfig 或
kernel/arch/arm64/configs/xxx_defconfig) 把 (没有对应的 config 则不需要修改)
- # CONFIG_MTK_AEE_MRDUMP is not set
- # CONFIG_HAVE_DDR_RESERVE_MODE is not set
修改为 enable
- CONFIG_MTK_AEE_MRDUMP=y
- CONFIG_HAVE_DDR_RESERVE_MODE=y
第 3 步: 编译下载后复现时请打开 mtklogger, 如果没有打开也不会抓 ramdump.
crash 工具介绍
同 GDB 工具一样, crash 也是一个很有用的分析工具, 它的官网 crash http://people.redhat.com/anderson/ , 当前最新的版本为 7.2.3, 最新的版本支持 ARM64 平台, 将代码下载下来之后解压, 编译支持 arm64 的版本
taotaomami@taotaomami$ ls
alpha.c dev.c gdb_interface.c lkcd_dump_v7.h lkcd_x86_trace.c netdump.h s390dbf.c unwind.c vmware_vmss.c xen_hyper_dump_tables.c
arm64.c diskdump.c global_data.c lkcd_dump_v8.h lkcd_x86_trace.h ppc64.c s390_dump.c unwind_decoder.c vmware_vmss.h xen_hyper_global_data.c
arm.c diskdump.h help.c lkcd_fix_mem.c main.c ppc.c s390x.c unwind.h x86_64.c
binqQtgK4pQ66.bin extensions ia64.c lkcd_fix_mem.h makedumpfile.c qemu.c sadump.c unwind_i.h x86.c
build_data.c extensions.c ibm_common.h lkcd_v1.c makedumpfile.h qemu-load.c sadump.h unwind_x86_32_64.c xen_dom0.c
cmdline.c filesys.c ipcs.c lkcd_v2_v3.c Makefile qemu-load.h sparc64.c unwind_x86_64.h xen_dom0.h
configure gdb-7.6 kernel.c lkcd_v5.c memory.c ramdump.c symbols.c unwind_x86.h xendump.c
configure.c gdb-7.6.patch kvmdump.c lkcd_v7.c memory_driver README task.c vas_crash.h xendump.h
COPYING3 gdb-7.6-ppc64le-support.patch kvmdump.h lkcd_v8.c mips.c remote.c test.c va_server.c xen_hyper.c
crash.8 gdb-7.6.tar.gz lkcd_common.c lkcd_vmdump_v1.h net.c rse.h tools.c va_server.h xen_hyper_command.c
defs.h gdb.files lkcd_dump_v5.h lkcd_vmdump_v2_v3.h netdump.c s390.c unwind_arm.c va_server_v1.c xen_hyper_defs.h
- taotaomami@taotaomami$ make target=arm64
- TARGET: ARM64
- CRASH: 7.1.7++
- GDB: 7.6
gcc -g -O2 -I. -I. -I./common -I./config -DLOCALEDIR="\"/usr/local/share/locale\"" -DCRASH_MERGE -DHAVE_CONFIG_H -I./../include/opcode -I./../opcodes/.. -I./../readline/.. -I../bfd -I./../bfd -I./../include -I../libdecnumber -I./../libdecnumber -I./gnulib/import -Ibuild-gnulib/import -DTUI=1 -Wall -Wdeclaration-after-statement -Wpointer-arith -Wformat-nonliteral -Wno-pointer-sign -Wno-unused -Wunused-value -Wunused-function -Wno-switch -Wno-char-subscripts -Wmissing-prototypes -Wdeclaration-after-statement -Wempty-body -c -o symtab.o -MT symtab.o -MMD -MP -MF .deps/symtab.Tpo symtab.c
- Making init.c
- gcc -g -O2 -I. -I. -I./common -I./config -DLOCALEDIR="\"/usr/local/share/locale\"" -DCRASH_MERGE -DHAVE_CONFIG_H -I./../include/opcode -I./../opcodes/.. -I./../readline/.. -I../bfd -I./../bfd -I./../include -I../libdecnumber -I./../libdecnumber -I./gnulib/import -Ibuild-gnulib/import -DTUI=1 -Wall -Wdeclaration-after-statement -Wpointer-arith -Wformat-nonliteral -Wno-pointer-sign -Wno-unused -Wunused-value -Wunused-function -Wno-switch -Wno-char-subscripts -Wmissing-prototypes -Wdeclaration-after-statement -Wempty-body -c -o init.o -MT init.o -MMD -MP -MF .deps/init.Tpo init.c
cc -c -g -DARM64 -DGDB_7_6 build_data.c
cc -c -g -DARM64 -DGDB_7_6 main.c
cc -c -g -DARM64 -DGDB_7_6 tools.c
cc -c -g -DARM64 -DGDB_7_6 global_data.c
cc -c -g -DARM64 -DGDB_7_6 memory.c
cc -c -g -DARM64 -DGDB_7_6 filesys.c
cc -c -g -DARM64 -DGDB_7_6 help.c
cc -c -g -DARM64 -DGDB_7_6 task.c
cc -c -g -DARM64 -DGDB_7_6 kernel.c
cc -c -g -DARM64 -DGDB_7_6 test.c
cc -c -g -DARM64 -DGDB_7_6 gdb_interface.c
cc -c -g -DARM64 -DGDB_7_6 net.c
cc -c -g -DARM64 -DGDB_7_6 dev.c
cc -c -g -DARM64 -DGDB_7_6 alpha.c
cc -c -g -DARM64 -DGDB_7_6 x86.c -DMCLX
编译完成之后会在当前目录下生成 crash 可执行文件, 执行这个文件就可以分析 ramdump 了, 分析 ramdump 的时候除了 ramdump 文件之外, 还需要一个 vmlinux 文件, 这个文件就是内核的 symbole 文件.
taotaomami@taotaomami$ crash vmlinux SYS_COREDUMP
crash 7.1.7++
Copyright (C) 2002-2016 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.
- GNU gdb (GDB) 7.6
- Copyright (C) 2013 Free Software Foundation, Inc.
- License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "--host=x86_64-unknown-linux-gnu --target=aarch64-elf-linux"...
跟 GDB 工具一样, crash 也有自己的命令帮助, 这里就不去赘述了, 感兴趣的童鞋可以一个一个的去看, 这里还是跟前面一样通过对一个问题的分析来介绍 crash,ramdump 是如何帮助我们分析问题的.
- crash> help
- * files mach repeat timer
- alias foreach mod runq tree
- ascii fuser mount search union
bt gdb net set vm
btop help p sig vtop
dev ipcs ps struct waitq
dis irq pte swap whatis
eval kmem ptob sym wr
exit list ptov sys q
- extend log rd task
- crash version: 7.1.7++ gdb version: 7.6
- For help on any command above, enter "help <command>".
- For help on input options, enter "help input".
- For help on output options, enter "help output".
问题背景
在某个机型上面, 跑 monkey, 跑一段时间之后差不多所有机器都会卡住, 卡住的时候没法连接 adb 和串口, 更坑人的是重启之后发现 SD 卡下面保存 log 的目录只保留了很少的一部分, 后面的 log 都没有了, 我们只能通过 / data/anr 下面的 trace 文件看到一点信息
- "Binder_1" prio=5 tid=8 Blocked
- | group="main" sCount=1 dsCount=0 obj=0x12c5c0a0 self=0x7fabe09400
| sysTid=1042 nice=0 cgrp=default sched=0/0 handle=0x7fabdff440
- | state=S schedstat=( 155829476357 56801688642 366831 ) utm=11187 stm=4395 core=1 HZ=100
- | stack=0x7fabd04000-0x7fabd06000 stackSize=1009KB
- | held mutexes=
- at com.android.server.MountService.getVolumeList(MountService.java:3139)
- - waiting to lock <0x07a7279e> (a java.lang.Object) held by thread 86
- at android.os.storage.IMountService$Stub.onTransact(IMountService.java:1751)
- at android.os.Binder.execTransact(Binder.java:462)
- "Binder_2" prio=5 tid=9 Blocked
| group="main" sCount=1 dsCount=0 obj=0x12c5f0a0 self=0x7fa2fadc00
| sysTid=1043 nice=0 cgrp=default sched=0/0 handle=0x7fabd01440
- | state=S schedstat=( 152468392027 56015549452 365405 ) utm=11086 stm=4160 core=0 HZ=100
- | stack=0x7fabc06000-0x7fabc08000 stackSize=1009KB
- | held mutexes=
- at com.android.server.MountService.getVolumeList(MountService.java:3139)
- - waiting to lock <0x07a7279e> (a java.lang.Object) held by thread 86
- at android.os.storage.IMountService$Stub.onTransact(IMountService.java:1751)
- at android.os.Binder.execTransact(Binder.java:462)
- "Binder_6" prio=5 tid=86 Native
- | group="main" sCount=1 dsCount=0 obj=0x1404b0a0 self=0x7f92fac600
| sysTid=1691 nice=0 cgrp=default sched=0/0 handle=0x7f929bf440
- | state=S schedstat=( 149288745355 56180291820 364873 ) utm=10576 stm=4352 core=1 HZ=100
- | stack=0x7f928c4000-0x7f928c6000 stackSize=1009KB
- | held mutexes=
- at libcore.io.Posix.statvfs(Native method)
- at libcore.io.BlockGuardOs.statvfs(BlockGuardOs.java:298)
- at java.io.File.getTotalSpace(File.java:1126)
- at android.os.storage.StorageManager.getStorageLowBytes(StorageManager.java:978)
- at android.os.storage.VolumeInfo.buildStorageVolume(VolumeInfo.java:371)
- at com.android.server.MountService.getVolumeList(MountService.java:3146)
- - locked <0x07a7279e> (a java.lang.Object)
- at android.os.storage.IMountService$Stub.onTransact(IMountService.java:1751)
- at android.os.Binder.execTransact(Binder.java:462)
从这里的信息来看, system_server 所有的 binder 线程都卡住了, 并且大多数都是卡在获取 SD 容量大小的接口上面, 综合前面 SD 卡没法保留 log 以及这里的 trace 文件信息, 怀疑 sd 卡工作出了问题, 当时 sd 卡用的是 fuse 文件系统, fuse 文件系统在 user 空间有一个守护进程名字就叫 sdcard, 但是由于连不了 adb 和串口, 我们无法得知当时 sdcard 进程的状态和其他信息, 由于这个问题复现概率还算高, 并且卡住是由于发生了 system_server SWT, 所以决定在 system_server 进程发生 SWT 的时候, 主动让 kernel crash, 抓取 ramdump 文件来分析.
echo c> /proc/sysrq-trigger 可以让 kernel 发生异常重启
因为 fuse 文件系统分为 user 空间和 kernel 空间两层, 而如果 kernel 层出问题的话, 那 kernel 层其他模块很大概率也会出问题, 但是从几次问题现场来看, kernel 应该还是工作正常的, 所以将目光放到了 user 空间层的 sdcard 守护进程上面:
过滤所有 D 状态的进程, UN = UNINTERRUPTIBLE
crash> ps | grep UN
116 2 1 ffffffc003274000 UN 0.0 0 0 [display_esd_che]
121 2 0 ffffffc0b6202000 UN 0.0 0 0 [display_idle_de]
176 2 0 ffffffc0b5334000 UN 0.0 0 0 [hang_detect]
206 2 0 ffffffc0b4b52000 UN 0.0 0 0 [C2K_TX_ASC]
207 2 0 ffffffc0b45a2000 UN 0.0 0 0 [C2K_RX_ASC]
229 2 0 ffffffc0b46a8000 UN 0.0 0 0 [bat_routine_thr]
230 2 0 ffffffc0b45d0000 UN 0.0 0 0 [bat_update_thre]
3046 315 5 ffffffc095fdd000 UN 0.0 20136 2056 sdcard
3048 315 4 ffffffc0adade000 UN 0.0 20136 2056 sdcard
21528 2 0 ffffffc01e2df000 UN 0.0 0 0 [kworker/u16:6]
其中 3046 和 3048 这两个 sdcard 线程的状态都为 D, 对比一下正常的机器, 这两个线程的状态是不正常的, 获取这两个线程的堆栈信息:
crash> bt 3046
PID: 3046 TASK: ffffffc095fdd000 CPU: 5 COMMAND: "sdcard"
- #0 [ffffffc095e43880] __switch_to at ffffffc000087204
- #1 [ffffffc095e438b0] __schedule at ffffffc000ae4164
- #2 [ffffffc095e43a60] schedule at ffffffc000ae4590
- #3 [ffffffc095e43a70] schedule_preempt_disabled at ffffffc000ae45f0
- #4 [ffffffc095e43a80] __mutex_lock_slowpath at ffffffc000ae5da4
- #5 [ffffffc095e43ae0] mutex_lock at ffffffc000ae5eb8
- #6 [ffffffc095e43b00] fuse_reverse_inval_entry at ffffffc0002ab150
- #7 [ffffffc095e43b50] fuse_notify_delete at ffffffc0002a6078
- #8 [ffffffc095e43bc0] fuse_dev_do_write at ffffffc0002a7a24
- #9 [ffffffc095e43c70] fuse_dev_write at ffffffc0002a81a4
- #10 [ffffffc095e43cf0] do_sync_readv_writev at ffffffc00019fb28
- #11 [ffffffc095e43d70] do_readv_writev at ffffffc0001a1444
- #12 [ffffffc095e43e70] vfs_writev at ffffffc0001a1524
- #13 [ffffffc095e43e80] sys_writev at ffffffc0001a1620
- #14 [ffffffc095e43ed0] el0_svc_naked at ffffffc00008542c
PC: 0000007f9deef9c0 LR: 0000007f9e105a50 SP: 0000007f9dbbd260
X29: 0000007f9dbbd260 X28: 0000007f9dbbf360 X27: 000000000000005d
X26: 00000000009eaa0a X25: 0000007f9dbbf3d0 X24: 0000007f9e108360
X23: 0000007fe5ccd56c X22: 0000007fe5ccb448 X21: 0000007f9d546e00
X20: 0000007f9d542c00 X19: 0000000000000006 X18: 0000007f9dd43c00
X17: 0000007f9deef9b8 X16: 0000007f9e119d70 X15: 000000000000005a
X14: 0000007fe5ccb440 X13: 0000007fe5cca440 X12: 0000000000000000
X11: 0101010101010101 X10: 0000007fe5ccd56c X9: 0000000000000018
X8: 0000000000000042 X7: 0000000000000010 X6: 000000000000003d
X5: 0000000000000015 X4: 0000000000000014 X3: 0000007f9dbbd2a8
X2: 0000000000000003 X1: 0000007f9dbbd2d0 X0: 000000000000000a
ORIG_X0: 000000000000000a SYSCALLNO: 42 PSTATE: 00000000
crash> bt 3048
PID: 3048 TASK: ffffffc0adade000 CPU: 4 COMMAND: "sdcard"
- #0 [ffffffc0ac517880] __switch_to at ffffffc000087204
- #1 [ffffffc0ac5178b0] __schedule at ffffffc000ae4164
- #2 [ffffffc0ac517a60] schedule at ffffffc000ae4590
- #3 [ffffffc0ac517a70] schedule_preempt_disabled at ffffffc000ae45f0
- #4 [ffffffc0ac517a80] __mutex_lock_slowpath at ffffffc000ae5da4
- #5 [ffffffc0ac517ae0] mutex_lock at ffffffc000ae5eb8
- #6 [ffffffc0ac517b00] fuse_reverse_inval_entry at ffffffc0002ab150
- #7 [ffffffc0ac517b50] fuse_notify_delete at ffffffc0002a6078
- #8 [ffffffc0ac517bc0] fuse_dev_do_write at ffffffc0002a7a24
- #9 [ffffffc0ac517c70] fuse_dev_write at ffffffc0002a81a4
- #10 [ffffffc0ac517cf0] do_sync_readv_writev at ffffffc00019fb28
- #11 [ffffffc0ac517d70] do_readv_writev at ffffffc0001a1444
- #12 [ffffffc0ac517e70] vfs_writev at ffffffc0001a1524
- #13 [ffffffc0ac517e80] sys_writev at ffffffc0001a1620
- #14 [ffffffc0ac517ed0] el0_svc_naked at ffffffc00008542c
PC: 0000007f9deef9c0 LR: 0000007f9e105a50 SP: 0000007f9d9c1260
X29: 0000007f9d9c1260 X28: 0000007f9d9c3360 X27: 0000007f9d6ce3e0
X26: 0000000000120edc X25: 0000007f9d9c33d0 X24: 0000007f9e108360
X23: 0000007fe5d4d62c X22: 0000007fe5cc9418 X21: 0000007f9d546e00
X20: 0000007f9d542b80 X19: 0000000000000006 X18: 00000000000001fd
X17: 0000007f9deef9b8 X16: 0000007f9e119d70 X15: 0000000000000000
X14: 00000000000081b4 X13: 0000000000000000 X12: 0000000000000000
X11: 0101010101010101 X10: 0000007fe5d4d62c X9: 0000000000000018
X8: 0000000000000042 X7: 0000000000000010 X6: 000000000000003d
X5: 0000000000000015 X4: 0000000000000014 X3: 0000007f9d9c12a8
X2: 0000000000000003 X1: 0000007f9d9c12d0 X0: 0000000000000004
ORIG_X0: 0000000000000004 SYSCALLNO: 42 PSTATE: 00000000
从堆栈来看, 这两个线程都是在删除文件, 并且最终都在内核态等锁, 其中 mutex_lock 和 mutex 的定义如下:
- crash> whatis mutex_lock
- void mutex_lock(struct mutex *);
crash> whatis struct mutex
- struct mutex {
- atomic_t count;
- spinlock_t wait_lock;
- struct list_head wait_list;
- struct task_struct *owner;
- struct optimistic_spin_queue osq;
- }
- SIZE: 40
mutex 里面记录了这个锁被谁持有, 所以如果可以找到 mutex 的内容, 那么就可以找到这个锁的持有者了, 接下来介绍一种简单的寻找锁持有者的方法, 反汇编 mutex_lock 的方法
- crash> dis mutex_lock
- 0xffffffc000ae5e74 <mutex_lock>: stp x29, x30, [sp,#-32]!
- 0xffffffc000ae5e78 <mutex_lock+4>: mov x29, sp
- 0xffffffc000ae5e7c <mutex_lock+8>: str x19, [sp,#16]
- 0xffffffc000ae5e80 <mutex_lock+12>: mov x19, x0 // 将参数 mutex 传给了 X19,
- 0xffffffc000ae5e84 <mutex_lock+16>: ldxr w1, [x0]
- 0xffffffc000ae5e88 <mutex_lock+20>: sub w1, w1, #0x1
- 0xffffffc000ae5e8c <mutex_lock+24>: stlxr w2, w1, [x0]
- 0xffffffc000ae5e90 <mutex_lock+28>: cbnz w2, 0xffffffc000ae5e84 <mutex_lock+16>
- 0xffffffc000ae5e94 <mutex_lock+32>: dmb ish
- 0xffffffc000ae5e98 <mutex_lock+36>: tbnz w1, #31, 0xffffffc000ae5eb8 <mutex_lock+68>
- 0xffffffc000ae5e9c <mutex_lock+40>: mov x0, sp
- 0xffffffc000ae5ea0 <mutex_lock+44>: and x0, x0, #0xffffffffffffc000
- 0xffffffc000ae5ea4 <mutex_lock+48>: ldr x0, [x0,#16]
- 0xffffffc000ae5ea8 <mutex_lock+52>: str x0, [x19,#24]
- 0xffffffc000ae5eac <mutex_lock+56>: ldr x19, [sp,#16]
- 0xffffffc000ae5eb0 <mutex_lock+60>: ldp x29, x30, [sp],#32
- 0xffffffc000ae5eb4 <mutex_lock+64>: ret
- 0xffffffc000ae5eb8 <mutex_lock+68>: bl 0xffffffc000ae5d10 <__mutex_lock_slowpath> //mutex_lock 从这里调用的__mutex_lock_slowpath
- 0xffffffc000ae5ebc <mutex_lock+72>: mov x0, sp
- 0xffffffc000ae5ec0 <mutex_lock+76>: and x0, x0, #0xffffffffffffc000
- 0xffffffc000ae5ec4 <mutex_lock+80>: ldr x0, [x0,#16]
- 0xffffffc000ae5ec8 <mutex_lock+84>: str x0, [x19,#24]
- 0xffffffc000ae5ecc <mutex_lock+88>: ldr x19, [sp,#16]
- 0xffffffc000ae5ed0 <mutex_lock+92>: ldp x29, x30, [sp],#32
- 0xffffffc000ae5ed4 <mutex_lock+96>: ret
从 0xffffffc000ae5e80 到 0xffffffc000ae5eb8 这一段的指令里面 X19 寄存器都没有再被修改, 所以找出 X19 寄存器的内容就是 struct mutex 的地址, 再看__mutex_lock_slowpath 的反汇编代码:
- crash> dis __mutex_lock_slowpath
- 0xffffffc000ae5d10 <__mutex_lock_slowpath>: stp x29, x30, [sp,#-96]!
- 0xffffffc000ae5d14 <__mutex_lock_slowpath+4>: mov x1, sp
- 0xffffffc000ae5d18 <__mutex_lock_slowpath+8>: mov x29, sp
- 0xffffffc000ae5d1c <__mutex_lock_slowpath+12>: and x1, x1, #0xffffffffffffc000
- 0xffffffc000ae5d20 <__mutex_lock_slowpath+16>: stp x19, x20, [sp,#16] // 把 x19 寄存器存放到了 [sp,#16]
- 0xffffffc000ae5d24 <__mutex_lock_slowpath+20>: stp x21, x22, [sp,#32]
- 0xffffffc000ae5d28 <__mutex_lock_slowpath+24>: stp x23, x24, [sp,#48]
- 0xffffffc000ae5d2c <__mutex_lock_slowpath+28>: mov x19, x0
- 0xffffffc000ae5d30 <__mutex_lock_slowpath+32>: mov w0, #0x1 // #1
- 0xffffffc000ae5d34 <__mutex_lock_slowpath+36>: ldr x22, [x1,#16]
- 0xffffffc000ae5d38 <__mutex_lock_slowpath+40>: bl 0xffffffc0000c7bac <preempt_count_add>
- 0xffffffc000ae5d3c <__mutex_lock_slowpath+44>: mov x1, #0x0 // #0
- 0xffffffc000ae5d40 <__mutex_lock_slowpath+48>: mov x0, x19
- 0xffffffc000ae5d44 <__mutex_lock_slowpath+52>: mov w2, w1
- 0xffffffc000ae5d48 <__mutex_lock_slowpath+56>: bl 0xffffffc0000e6554 <mutex_optimistic_spin>
- 0xffffffc000ae5d4c <__mutex_lock_slowpath+60>: uxtb w0, w0
- 0xffffffc000ae5d50 <__mutex_lock_slowpath+64>: cbnz w0, 0xffffffc000ae5e08 <__mutex_lock_slowpath+248>
- ......
- 0xffffffc000ae5e60 <__mutex_lock_slowpath+336>: ldp x19, x20, [sp,#16]
- 0xffffffc000ae5e64 <__mutex_lock_slowpath+340>: ldp x21, x22, [sp,#32]
- 0xffffffc000ae5e68 <__mutex_lock_slowpath+344>: ldp x23, x24, [sp,#48]
- 0xffffffc000ae5e6c <__mutex_lock_slowpath+348>: ldp x29, x30, [sp],#96
- 0xffffffc000ae5e70 <__mutex_lock_slowpath+352>: ret
从上面的反汇编代码来看, struct mutex 的地址被放到了与__mutex_lock_slowpath 栈指针偏移 16 字节的地方, 而__mutex_lock_slowpath 的栈指针为 ffffffc095e43a80.
4 [ffffffc095e43a80] __mutex_lock_slowpath at ffffffc000ae5da4
5 [ffffffc095e43ae0] mutex_lock at ffffffc000ae5eb8
导出 ffffffc095e43a80 到 ffffffc095e43ae0 这段内存的内容
crash> rd ffffffc095e43a80 -e ffffffc095e43ae0
ffffffc095e43a80: ffffffc095e43ae0 ffffffc000ae5ebc .:.......^......
ffffffc095e43a90: ffffffc00f387b98 00000000ffffffec .{8.............
ffffffc095e43aa0: ffffffc00f387b98 ffffffc095e43b98 .{8......;......
ffffffc095e43ab0: 0000007f9d542c00 000000000000003d .,T.....=.......
ffffffc095e43ac0: ffffffc095e43ae0 ffffffc00f387ba0 .:.......{8.....
ffffffc095e43ad0: ffffffc00f387ba0 ffffffc095fdd000 .{8.............
可以得到 struct mutex 的地址为 ffffffc00f387b98, 利用这种方式可以得到你想要的变量的内容, 打印这个锁可以获取这个锁的持有者:
crash> struct mutex ffffffc00f387b98
- struct mutex {
- count = {
- counter = -1
- },
- wait_lock = {
- {
- rlock = {
- raw_lock = {
- owner = 1,
- next = 1
- }
- }
- }
- },
- wait_list = {
- next = 0xffffffc095e43ac8,
- prev = 0xffffffc095e43ac8
- },
- owner = 0xffffffc00bf4e000,
- osq = {
- tail = {
- counter = 0
- }
- }
- }
- crash> struct task_struct.pid 0xffffffc00bf4e000
- pid = 5446
得到 3046 这个 sdcard 线程正在等的锁的持有者是 5446 这个进程, 接下来获取 5446 的堆栈:
- crash> bt 5446
- PID: 5446 TASK: ffffffc00bf4e000 CPU: 4 COMMAND: "io"
- #0 [ffffffc0079b3a70] __switch_to at ffffffc000087204
- #1 [ffffffc0079b3aa0] __schedule at ffffffc000ae4164
- #2 [ffffffc0079b3c50] schedule at ffffffc000ae4590
- #3 [ffffffc0079b3c60] wait_answer_interruptible at ffffffc0002a4f34
- #4 [ffffffc0079b3cc0] __fuse_request_send at ffffffc0002a5424
- #5 [ffffffc0079b3d30] fuse_request_send at ffffffc0002a83d4
- #6 [ffffffc0079b3d40] fuse_unlink at ffffffc0002a8fb0
- #7 [ffffffc0079b3d80] vfs_unlink at ffffffc0001af044
- #8 [ffffffc0079b3dc0] do_unlinkat at ffffffc0001af2f4
- #9 [ffffffc0079b3ec0] sys_unlinkat at ffffffc0001af350
- #10 [ffffffc0079b3ed0] el0_svc_naked at ffffffc00008542c
PC: f751c298 LR: f7513639 SP: e9570150 PSTATE: 60070010
X12: 33116380 X11: 32e89670 X10: 348c61f0 X9: eeed1200
X8: 348c61f0 X7: 00000148 X6: eeed63c0 X5: 713b8ea0
X4: df154348 X3: 000081b0 X2: 00000000 X1: df154348
X0: ffffff9c
从这个堆栈来看, 这个线程也是在删除文件....... , 但是这里是发送文件删除请求给 fuse 文件系统, fuse 文集系统 kernel 层代码接收到请求的时候需要重新把请求路由给上层的 sdcard 进程, 所以上面的 3046 和 3048 其中有一个线程就是在处理 5446 的删除文件请求, 用同样的方法得到 3048 进程的锁是被 12051 获取了, 而从它的堆栈来看, 它也是在删除文件
crash> struct task_struct.pid 0xffffffc0082ac000
- pid = 12051
- crash> bt 12051
PID: 12051 TASK: ffffffc0082ac000 CPU: 5 COMMAND: "RxComputationSc"
- #0 [ffffffc00ab7ba70] __switch_to at ffffffc000087204
- #1 [ffffffc00ab7baa0] __schedule at ffffffc000ae4164
- #2 [ffffffc00ab7bc50] schedule at ffffffc000ae4590
- #3 [ffffffc00ab7bc60] wait_answer_interruptible at ffffffc0002a4f34
- #4 [ffffffc00ab7bcc0] __fuse_request_send at ffffffc0002a5374
- #5 [ffffffc00ab7bd30] fuse_request_send at ffffffc0002a83d4
- #6 [ffffffc00ab7bd40] fuse_unlink at ffffffc0002a8fb0
- #7 [ffffffc00ab7bd80] vfs_unlink at ffffffc0001af044
- #8 [ffffffc00ab7bdc0] do_unlinkat at ffffffc0001af2f4
- #9 [ffffffc00ab7bec0] sys_unlinkat at ffffffc0001af350
- #10 [ffffffc00ab7bed0] el0_svc_naked at ffffffc00008542c
PC: 0000007f89fc7930 LR: 0000007f89fb72d8 SP: 0000007f6d1be4e0
X29: 0000007f6d1be4e0 X28: 000000001341e240 X27: 0000000013424180
X26: 000000001341c4a0 X25: 000000001341e240 X24: 0000000013210ea0
X23: 000000001341ba90 X22: 0000000013424180 X21: 0000007f6d1be5c8
X20: 0000007f80254c00 X19: 0000007f69a44980 X18: 0000007f86444b78
X17: 0000007f89fc7928 X16: 0000007f8a035510 X15: 0000000000000000
X14: 0000000000000001 X13: 0000000000000017 X12: 0000000000000000
X11: 0000000000000036 X10: 0000007f8643c9d0 X9: 0000007f8643a000
X8: 0000000000000023 X7: 0000000000430000 X6: 0000000000000000
X5: 000000000000002f X4: 0000000000430000 X3: 0000000000000100
X2: 0000000000000000 X1: 0000007f69a44980 X0: 00000000ffffff9c
ORIG_X0: 00000000ffffff9c SYSCALLNO: 23 PSTATE: 20000000
这个问题分析到这, 大致可以知道是有两个线程同时向 fuse 文件系统删除文件, 然后导致系统卡住, 隐约可以感觉到这是一个死锁问题, 只不过这个不是普通之间的线程死锁, 而是进程之间, user 空间和 kernel 空间的死锁问题, 既然都是在删除文件, 那么到底是在删除什么文件呢? 类似前面获取 mutex 地址的方式一样, 我们从 5446 和 12051 这两个线程的上下文里面获取它们正在删除的文件
- crash> whatis fuse_unlink
- int fuse_unlink(struct inode *, struct dentry *); //dentry 里面就包含了路径信息
5446 这个线程正在删除的文件信息
- crash> bt 5446
- PID: 5446 TASK: ffffffc00bf4e000 CPU: 4 COMMAND: "io"
- #0 [ffffffc0079b3a70] __switch_to at ffffffc000087204
- #1 [ffffffc0079b3aa0] __schedule at ffffffc000ae4164
- #2 [ffffffc0079b3c50] schedule at ffffffc000ae4590
- #3 [ffffffc0079b3c60] wait_answer_interruptible at ffffffc0002a4f34
- #4 [ffffffc0079b3cc0] __fuse_request_send at ffffffc0002a5424
- #5 [ffffffc0079b3d30] fuse_request_send at ffffffc0002a83d4
- #6 [ffffffc0079b3d40] fuse_unlink at ffffffc0002a8fb0
- #7 [ffffffc0079b3d80] vfs_unlink at ffffffc0001af044
- #8 [ffffffc0079b3dc0] do_unlinkat at ffffffc0001af2f4
- #9 [ffffffc0079b3ec0] sys_unlinkat at ffffffc0001af350
- #10 [ffffffc0079b3ed0] el0_svc_naked at ffffffc00008542c
PC: f751c298 LR: f7513639 SP: e9570150 PSTATE: 60070010
X12: 33116380 X11: 32e89670 X10: 348c61f0 X9: eeed1200
X8: 348c61f0 X7: 00000148 X6: eeed63c0 X5: 713b8ea0
X4: df154348 X3: 000081b0 X2: 00000000 X1: df154348
X0: ffffff9c
crash> rd ffffffc0079b3cc0 -e ffffffc0079b3d30
ffffffc0079b3cc0: ffffffc0079b3d30 ffffffc0002a83d8 0=........*.....
ffffffc0079b3cd0: ffffffc0235981a0 ffffffc06a521e40 ..Y#....@.Rj....
ffffffc0079b3ce0: ffffffc098daa800 ffffffc06a521e40 ........@.Rj....
ffffffc0079b3cf0: ffffffc00f387b00 ffffffc0079b3e20 .{8.....>......
ffffffc0079b3d00: ffffffc0079b3d10 ffffffc0001aa028 .=......(.......
ffffffc0079b3d10: ffffffc0079b3d60 ffffffc0002c9bc8 `=........,.....
ffffffc0079b3d20: ffffffc06a521e40 ffffffc06a521e40 @.Rj....@.Rj....
crash> struct dentry ffffffc06a521e40
- struct dentry {
- d_flags = 4718788,
- d_seq = {
- sequence = 2
- },
- d_hash = {
- next = 0x0,
- pprev = 0xffffffc0b7c77f28
- },
- d_parent = 0xffffffc0052f96c0,
- d_name = {
- {
- {
- hash = 114945031,
- len = 20
- },
- hash_len = 86014290951
- },
- name = 0xffffffc06a521e78 "10048316241362964103"
- },
- d_inode = 0xffffffc033afcb00,
- d_iname = "10048316241362964103\000\000s\000l\000bytes",
- d_lockref = {
- {
- lock_count = 4295294981,
- {
- lock = {
- {
- rlock = {
- raw_lock = {
- owner = 5,
- next = 5
- }
- }
- }
- },
- count = 1
- }
- }
- },
- d_op = 0xffffffc000b3fec0 <fuse_dentry_operations>,
- d_sb = 0xffffffc098daa000,
- d_time = 4299504276,
- d_fsdata = 0x0,
- d_lru = {
- next = 0xffffffc095eff088,
- prev = 0xffffffc024a50140
- },
- d_child = {
- next = 0xffffffc03bec35d0,
- prev = 0xffffffc0052f9760
- },
- d_subdirs = {
- next = 0xffffffc06a521ee0,
- prev = 0xffffffc06a521ee0
- },
- d_u = {
- d_alias = {
- next = 0x0,
- pprev = 0xffffffc033afcc08
- },
- d_rcu = {
- next = 0x0,
- func = 0xffffffc033afcc08
- }
- }
- }
12051 正在删除的文件信息:
crash> struct dentry ffffffc007383180
- struct dentry {
- d_flags = 4718788,
- d_seq = {
- sequence = 2
- },
- d_hash = {
- next = 0x0,
- pprev = 0xffffffc0b7d70318
- },
- d_parent = 0xffffffc03ad70780,
- d_name = {
- {
- {
- hash = 3934655135,
- len = 20
- },
- hash_len = 89834001055
- },
- name = 0xffffffc0073831b8 "15919563341606802260"
- },
- d_inode = 0xffffffc00398b2c0,
- d_iname = "15919563341606802260\000m\000re.bytes",
- d_lockref = {
- {
- lock_count = 4295491592,
- {
- lock = {
- {
- rlock = {
- raw_lock = {
- owner = 8,
- next = 8
- }
- }
- }
- },
- count = 1
- }
- }
- },
- d_op = 0xffffffc000b3fec0 <fuse_dentry_operations>,
- d_sb = 0xffffffc003290800,
- d_time = 4299504227,
- d_fsdata = 0x0,
- d_lru = {
- next = 0xffffffc00acf1d40,
- prev = 0xffffffc059711980
- },
- d_child = {
- next = 0xffffffc059711990,
- prev = 0xffffffc00acf1d50
- },
- d_subdirs = {
- next = 0xffffffc007383220,
- prev = 0xffffffc007383220
- },
- d_u = {
- d_alias = {
- next = 0x0,
- pprev = 0xffffffc00398b3c8
- },
- d_rcu = {
- next = 0x0,
- func = 0xffffffc00398b3c8
- }
- }
- }
上面这么看还是不够直观, 在 crash 工具下面还发现一个很好用的命令 files, 用这个命令可以很直观的把文件路径打出来,
crash> files -d ffffffc06a521e40
DENTRY INODE SUPERBLK TYPE PATH
ffffffc06a521e40 ffffffc033afcb00 ffffffc098daa000 REG 10048316241362964103
- // 保存的这个 ramdump 里面不知道为什么这里打印不出来完整的路径名,
- // 但是在当时分析的其他 ramdump 里面是能打印出来, 也是在删除 com.xxx 目录下的文件,
- // 只是路径前缀不一样, 这个与 Android M 的运行时权限机制有关, 像 surfaceflinger 这种系统服务
- // 进程和普通的应用进程都去访问 SD 卡下的同一个文件, 但是它们看到的路径却是不一样的, 比如
- //surfaceflinger 看到的是 / storage/runtime/default 开头, 而普通应用看到的是 / mnt/runtime/write
- crash> files -d ffffffc007383180
DENTRY INODE SUPERBLK TYPE PATH
ffffffc007383180 ffffffc00398b2c0 ffffffc003290800 REG /storage/runtime/default/emulated/0/com.xxx/15919563341606802260
从上面可以得知 3048 进程的锁是被 12051 获取了, 而 12051 正在删除 / storage/runtime/default/emulated/0/com.xxx/15919563341606802260 这个文件, 那我们是否可以从 3048 这个进程里面获取到它正在删除哪个文件呢? 如果从它的上下文里面得到它正在删除 10048316241362964103 这个文件, 那么就可以证明这是一个死锁的问题了, 同样还是利用上面那个方法, 从 3048 的堆栈上下文里面去分析它正在删除哪个文件.
crash> bt 3048
PID: 3048 TASK: ffffffc0adade000 CPU: 4 COMMAND: "sdcard"
- #0 [ffffffc0ac517880] __switch_to at ffffffc000087204
- #1 [ffffffc0ac5178b0] __schedule at ffffffc000ae4164
- #2 [ffffffc0ac517a60] schedule at ffffffc000ae4590
- #3 [ffffffc0ac517a70] schedule_preempt_disabled at ffffffc000ae45f0
- #4 [ffffffc0ac517a80] __mutex_lock_slowpath at ffffffc000ae5da4
- #5 [ffffffc0ac517ae0] mutex_lock at ffffffc000ae5eb8
- #6 [ffffffc0ac517b00] fuse_reverse_inval_entry at ffffffc0002ab150
- #7 [ffffffc0ac517b50] fuse_notify_delete at ffffffc0002a6078
- #8 [ffffffc0ac517bc0] fuse_dev_do_write at ffffffc0002a7a24
- #9 [ffffffc0ac517c70] fuse_dev_write at ffffffc0002a81a4
- #10 [ffffffc0ac517cf0] do_sync_readv_writev at ffffffc00019fb28
- #11 [ffffffc0ac517d70] do_readv_writev at ffffffc0001a1444
- #12 [ffffffc0ac517e70] vfs_writev at ffffffc0001a1524
- #13 [ffffffc0ac517e80] sys_writev at ffffffc0001a1620
- #14 [ffffffc0ac517ed0] el0_svc_naked at ffffffc00008542c
- crash> whatis fuse_notify_delete
- int fuse_notify_delete(struct fuse_conn *, unsigned int, struct fuse_copy_state *);
- crash> whatis fuse_reverse_inval_entry
- int fuse_reverse_inval_entry(struct super_block *, u64, u64, struct qstr *);
- crash> dis fuse_reverse_inval_entry
- 0xffffffc0002ab108 <fuse_reverse_inval_entry>: stp x29, x30, [sp,#-80]!
- 0xffffffc0002ab10c <fuse_reverse_inval_entry+4>: mov x29, sp
- 0xffffffc0002ab110 <fuse_reverse_inval_entry+8>: add x4, x29, #0x50
- 0xffffffc0002ab114 <fuse_reverse_inval_entry+12>: stp x19, x20, [sp,#16]
- 0xffffffc0002ab118 <fuse_reverse_inval_entry+16>: stp x21, x22, [sp,#32]
- 0xffffffc0002ab11c <fuse_reverse_inval_entry+20>: stp x23, x24, [sp,#48]
- 0xffffffc0002ab120 <fuse_reverse_inval_entry+24>: str x1, [x4,#-8]!
- 0xffffffc0002ab124 <fuse_reverse_inval_entry+28>: mov x23, x2
- 0xffffffc0002ab128 <fuse_reverse_inval_entry+32>: adrp x2, 0xffffffc0002b0000 <fuse_writepages+60>
- 0xffffffc0002ab12c <fuse_reverse_inval_entry+36>: mov x22, x3 //struct qstr 参数放到了 X22 寄存器
- 0xffffffc0002ab130 <fuse_reverse_inval_entry+40>: add x2, x2, #0xfa8
- 0xffffffc0002ab134 <fuse_reverse_inval_entry+44>: mov x3, x4
- 0xffffffc0002ab138 <fuse_reverse_inval_entry+48>: bl 0xffffffc0001ba738 <ilookup5>
- 0xffffffc0002ab13c <fuse_reverse_inval_entry+52>: mov x19, x0
- 0xffffffc0002ab140 <fuse_reverse_inval_entry+56>: cbz x0, 0xffffffc0002ab1e4 <fuse_reverse_inval_entry+220>
- 0xffffffc0002ab144 <fuse_reverse_inval_entry+60>: add x21, x0, #0x98
- 0xffffffc0002ab148 <fuse_reverse_inval_entry+64>: mov w20, #0xffffffec // #-20
- 0xffffffc0002ab14c <fuse_reverse_inval_entry+68>: mov x0, x21
- 0xffffffc0002ab150 <fuse_reverse_inval_entry+72>: bl 0xffffffc000ae5e74 <mutex_lock> // 程序从这里跳到 mutex_lock 继续执行.
- .......
- <fuse_reverse_inval_entry+112>
继续分析 mutex_lock 和__mutex_lock_slowpath 的汇编代码, 发现它们都没有修改 X22 的值, 并且在__mutex_lock_slowpath 的汇编里面把 X22 保存在函数栈里面
- crash> dis __mutex_lock_slowpath
- 0xffffffc000ae5d10 <__mutex_lock_slowpath>: stp x29, x30, [sp,#-96]!
- 0xffffffc000ae5d14 <__mutex_lock_slowpath+4>: mov x1, sp
- 0xffffffc000ae5d18 <__mutex_lock_slowpath+8>: mov x29, sp
- 0xffffffc000ae5d1c <__mutex_lock_slowpath+12>: and x1, x1, #0xffffffffffffc000
- 0xffffffc000ae5d20 <__mutex_lock_slowpath+16>: stp x19, x20, [sp,#16]
- 0xffffffc000ae5d24 <__mutex_lock_slowpath+20>: stp x21, x22, [sp,#32] //X22 保存在与 SP 偏移 32 字节的地方
- 0xffffffc000ae5d28 <__mutex_lock_slowpath+24>: stp x23, x24, [sp,#48]
- ......
crash> rd ffffffc0ac517a80 -e ffffffc0ac517ae0
ffffffc0ac517a80: ffffffc0ac517ae0 ffffffc000ae5ebc .zQ......^......
ffffffc0ac517a90: ffffffc04f747098 00000000ffffffec .ptO............
ffffffc0ac517aa0: ffffffc04f747098 ffffffc0ac517b98 .ptO.....{Q.....
ffffffc0ac517ab0: 0000007f9d542b80 000000000000003d .+T.....=.......
ffffffc0ac517ac0: ffffffc0ac517ae0 ffffffc04f7470a0 .zQ......ptO....
ffffffc0ac517ad0: ffffffc04f7470a0 ffffffc0adade000 .ptO............
crash> struct qstr ffffffc0ac517b98
- struct qstr {
- {
- {
- hash = 114945031,
- len = 20
- },
- hash_len = 86014290951
- },
- name = 0xffffffc010143000 "10048316241362964103" // 果然 3048 正在删除 10048316241362964103 这个文件
- }
分析到这里, 可以知道这个问题确实是进程间死锁问题, 然后根据目前获得的消息, 由两个不同权限的进程同时去删除 SD 卡同一个目录下的文件 (例如 surfaceflinger 和普通 app 进程), 果然复现了这个问题, 找到了必现的路径, 反馈给 BSP 有关同事之后, 他们认为是 fuse 文件系统本身的一个缺陷, 但是由于项目紧急, 所以当时只是做了 workground 方案.
ramdump 提取 coredump 文件的方法
既然 ramdump 文件是整个内存的镜像, 那么理论上来讲, 从它里面是可以提取单个进程的 coredump 的, 网上搜罗了一堆资料, 最后在 crash extension modules http://people.redhat.com/anderson/extensions.html 这个网页里面找到了方法, crash 是提供可扩展的, 可以为它写一些类似插件的功能, gcore 就是其中一款, 可以利用这个插件从 ramdump 里面提取出单个进程的 coredump 文件.
Create a core dump file of a user-space task that was running in a kernel dumpfile. The module supports the X86, X86_64, ARM, ARM64, MIPS and PPC64 architectures.
If /usr/include/crash/defs.h exists, the module may be built like so:
- $ tar xzf http://people.redhat.com/anderson/extensions/crash-gcore-command-1.4.0.tar.gz
- $ cd crash-gcore-command-1.4.0
- $ make -f gcore.mk
- Alternatively, http://people.redhat.com/anderson/crash_sources/defs.h may be copied to the current directory:
- $ tar xzf http://people.redhat.com/anderson/extensions/crash-gcore-command-1.4.0.tar.gz
- $ cd crash-gcore-command-1.4.0
- $ cp <path-to>/ http://people.redhat.com/anderson/crash_sources/defs.h .
- $ make -f gcore.mk
- To build the module from the top-level crash-<version> directory, enter:
- $ tar xzf http://people.redhat.com/anderson/extensions/crash-gcore-command-1.4.0.tar.gz
- $ mv crash-gcore-command-1.4.0/* extensions
- $ make extensions
- This module's source files are self-contained in http://people.redhat.com/anderson/extensions/crash-gcore-command-1.4.0.tar.gz .
- Changelog http://people.redhat.com/anderson/extensions/gcore.changelog.html
Author: Daisuke Hatayama mailto:d.hatayama@jp.fujitsu.com
来源: http://www.jianshu.com/p/dabaded56e03