"Run Linux Kernel (2nd Edition), Volume 2: Debugging and Case Analysis": qemu + kdump debugging analysis
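By switching toolchains, I fixed the problem where a kernel I had compiled could not be debugged with crash inside the qemu Debian environment.
The previous toolchain was: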
▸ aarch64-linux-gnu-gcc -v
Using built-in specs.
COLLECT_GCC=/bin/aarch64-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/aarch64-linux-gnu/11/lto-wrapper
Target: aarch64-linux-gnu
Configured with: ../gcc-11.2.1-20210728/configure --bindir=/usr/bin --build=x86_64-redhat-linux-gnu --datadir=/usr/share --disable-decimal-float --disable-dependency-tracking --disable-gold --disable-libgcj --disable-libgomp --disable-libmpx --disable-libquadmath --disable-libssp --disable-libunwind-exceptions --disable-shared --disable-silent-rules --disable-sjlj-exceptions --disable-threads --with-ld=/usr/bin/aarch64-linux-gnu-ld --enable-__cxa_atexit --enable-checking=release --enable-gnu-unique-object --enable-initfini-array --enable-languages=c,c++ --enable-linker-build-id --enable-lto --enable-nls --enable-obsolete --enable-plugin --enable-targets=all --exec-prefix=/usr --host=x86_64-redhat-linux-gnu --includedir=/usr/include --infodir=/usr/share/info --libexecdir=/usr/libexec --localstatedir=/var --mandir=/usr/share/man --prefix=/usr --program-prefix=aarch64-linux-gnu- --sbindir=/usr/sbin --sharedstatedir=/var/lib --sysconfdir=/etc --target=aarch64-linux-gnu --with-bugurl=http://bugzilla.redhat.com/bugzilla/ --with-gcc-major-version-only --with-isl --with-newlib --with-plugin-ld=/usr/bin/aarch64-linux-gnu-ld --with-sysroot=/usr/aarch64-linux-gnu/sys-root --with-system-libunwind --with-system-zlib --without-headers --enable-gnu-indirect-function
--with-linker-hash-style=gnu
Thread model: single
Supported LTO compression algorithms: zlib zstd
gcc version 11.2.1 20210728 (Red Hat Cross 11.2.1-1) (GCC)
The replacement toolchain is:
▸ ./aarch64-linux-gcc -v
Using built-in specs.
COLLECT_GCC=/home/red/.local/bin/m3568-sdk-v1.0.0-ga/gcc-buildroot-9.3.0-2020.03-x86_64_aarch64-rockchip-linux-gnu/bin/aarch64-linux-gcc.br_real
COLLECT_LTO_WRAPPER=/home/red/.local/bin/m3568-sdk-v1.0.0-ga/gcc-buildroot-9.3.0-2020.03-x86_64_aarch64-rockchip-linux-gnu/bin/../libexec/gcc/aarch64-rockchip-linux-gnu/9.3.0/lto-wrapper
Target: aarch64-rockchip-linux-gnu
Configured with: ./configure --prefix=/home/yhx/RK356X/SDK/buildroot/output/rockchip_toolchain/host --sysconfdir=/home/yhx/RK356X/SDK/buildroot/output/rockchip_toolchain/host/etc --enable-static --target=aarch64-rockchip-linux-gnu --with-sysroot=/home/yhx/RK356X/SDK/buildroot/output/rockchip_toolchain/host/aarch64-rockchip-linux-gnu/sysroot --enable-__cxa_atexit --with-gnu-ld --disable-libssp --disable-multilib --disable-decimal-float --with-gmp=/home/yhx/RK356X/SDK/buildroot/output/rockchip_toolchain/host --with-mpc=/home/yhx/RK356X/SDK/buildroot/output/rockchip_toolchain/host --with-mpfr=/home/yhx/RK356X/SDK/buildroot/output/rockchip_toolchain/host --with-pkgversion='Buildroot 2018.02-rc3-g548dfbfc13-dirty' --with-bugurl=http://bugs.buildroot.net/ --disable-libquadmath --enable-tls --enable-plugins --enable-lto --enable-threads --with-isl=/home/yhx/RK356X/SDK/buildroot/output/rockchip_toolchain/host --with-abi=lp64 --with-cpu=cortex-a55 --enable-languages=c,c++ --with-build-time-tools=/home/yhx/RK356X/SDK/buildroot/output/rockchip_toolchain/host/aarch64-rockchip-linux-gnu/bin --enable-shared --enable-libgomp
Thread model: posix
gcc version 9.3.0 (Buildroot 2018.02-rc3-g548dfbfc13-dirty)
I rebuilt the kernel and the rootfs and booted again. Loading oops.ko made the system reboot, and I then debugged the dump with the crash tool:
benshushu:crash# crash 202405050433/dump.202405050433 /mnt/vmlinux
crash 7.2.5
Copyright (C) 2002-2019 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.
GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "aarch64-unknown-linux-gnu"...
KERNEL: /mnt/vmlinux
DUMPFILE: 202405050433/dump.202405050433 [PARTIAL DUMP]
CPUS: 4
DATE: Sun May 5 04:32:52 2024
UPTIME: 2135039823346 days, 00:17:08
LOAD AVERAGE: 2.13, 1.04, 0.41
TASKS: 91
NODENAME: benshushu
RELEASE: 5.0.0+
VERSION: #2 SMP Sun May 5 12:19:45 CST 2024
MACHINE: aarch64 (unknown Mhz)
MEMORY: 1 GB
PANIC: "Unable to handle kernel NULL pointer dereference at virtual address 0000000000000050"
PID: 1931
COMMAND: "insmod"
TASK: ffff800023936200 [THREAD_INFO: ffff800023936200]
CPU: 1
STATE: TASK_RUNNING (PANIC)
crash>
That looked fairly smooth. I then loaded oops.ko to continue debugging, and a new problem showed up:
crash> mod -s oops /mnt/rlk_lab/rlk_senior/Volume_2/chapter_8/lab01/oops.ko
MODULE NAME SIZE OBJECT FILE
ffff0000098e2040 oops 16384 /mnt/rlk_lab/rlk_senior/Volume_2/chapter_8/lab01/oops.ko
crash> bt
PID: 1931 TASK: ffff800023936200 CPU: 1 COMMAND: "insmod"
#0 [ffff800023bcecb0] machine_kexec at ffff0000100dbf18
#1 [ffff800023bcedd0] __crash_kexec at ffff00001037a120
#2 [ffff800023bcef60] crash_kexec at ffff00001037a370
#3 [ffff800023bcf030] die at ffff0000100adb34
#4 [ffff800023bcf170] die_kernel_fault at ffff0000100e6494
#5 [ffff800023bcf1a0] __do_kernel_fault at ffff0000100e66b8
#6 [ffff800023bcf220] do_page_fault at ffff0000118d7eb0
#7 [ffff800023bcf490] do_translation_fault at ffff0000118d7fc0
#8 [ffff800023bcf4f0] do_mem_abort at ffff000010081274
#9 [ffff800023bcf660] el1_ia at ffff0000100858cc
#10 [ffff800023bcf670] create_oops at ffff0000098e0018 [oops]
#11 [ffff800023bcf6a0] _MODULE_INIT_START_oops at ffff0000098e509c [oops]
#12 [ffff800023bcf720] do_one_initcall at ffff000010087f34
#13 [ffff800023bcf970] do_init_module at ffff00001036ed40
#14 [ffff800023bcf9d0] load_module at ffff00001036fd34
#15 [ffff800023bcfba0] __se_sys_finit_module at ffff000010370368
#16 [ffff800023bcfc80] __arm64_sys_finit_module at ffff000010370218
#17 [ffff800023bcfca0] __invoke_syscall at ffff0000100c1400
#18 [ffff800023bcfcc0] invoke_syscall at ffff0000100c14ac
#19 [ffff800023bcfd30] el0_svc_common at ffff0000100c15e0
#20 [ffff800023bcfdd0] el0_svc_handler at ffff0000100c1adc
#21 [ffff800023bcfff0] el0_svc at ffff000010086784
PC: 0000ffff94c48ec4 LR: 0000aaaad1edba18 SP: 0000ffffc520a3e0
X29: 0000ffffc520a3e0 X28: 0000000000000000 X27: 0000000000000000
X26: 0000000000000002 X25: 0000000000000000 X24: 0000ffffc520a4b8
X23: 0000aaaae9b0a8a0 X22: 0000000000000000 X21: 0000000000000000
X20: 0000aaaad1ee5640 X19: 0000aaaae9b0a8f0 X18: 0000000000000000
X17: 0000ffff94c48ea0 X16: 0000aaaad1efcdb0 X15: 0000000000000040
X14: 0000000000000000 X13: 0000000000000000 X12: 0000000000000000
X11: 0000000000000000 X10: 0000ffff94ce3ae0 X9: 0000000000000002
X8: 0000000000000111 X7: 0000000000000001 X6: 0000000000000001
X5: 0000000000000218 X4: 0000000000000000 X3: 0000000000000003
X2: 0000000000000000 X1: 0000aaaad1ee5640 X0: 0000000000000003
ORIG_X0: 0000000000000003 SYSCALLNO: 111 PSTATE: 40001000
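This output differs slightly from what the book describes: the PC printed here is not the location where the oops driver crashed. I tried looking at this address with dis -l and got the following result: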
crash> dis -l 0000ffff94c48ec4
dis: WARNING: ffff94c48ec4: no associated kernel symbol found
0xffff94c48ec4: Cannot access memory at address 0xffff94c48ec4
What could be causing this? It seems strange.
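One possible reading, offered as a sketch rather than a confirmed diagnosis: the register block printed after the el0_svc frame may be the user-mode context saved when insmod entered the kernel through the system call, not the kernel context at the moment of the fault. The small Python check below classifies a few addresses copied from the output above by their upper bits. It assumes 48-bit virtual addresses (VA_BITS = 48), which I inferred from the addresses in the dump and did not verify against the kernel config; the classify helper and the sample labels are illustrative names of my own.

```python
# Assumption: 48-bit virtual addresses (VA_BITS = 48), inferred from the
# addresses in the crash output; not checked against the kernel config.
VA_BITS = 48
KERNEL_TOP = (1 << (64 - VA_BITS)) - 1  # 0xffff when VA_BITS == 48


def classify(addr: int) -> str:
    """Label a 64-bit arm64 virtual address by its upper bits."""
    top = addr >> VA_BITS
    if top == 0:
        return "user"
    if top == KERNEL_TOP:
        return "kernel"
    return "non-canonical"


# Addresses copied from the bt output in this post.
samples = {
    "PC after el0_svc": 0x0000FFFF94C48EC4,
    "LR after el0_svc": 0x0000AAAAD1EDBA18,
    "create_oops frame": 0xFFFF0000098E0018,
    "machine_kexec frame": 0xFFFF0000100DBF18,
}

for name, addr in samples.items():
    print(f"{name:20s} {addr:#018x} -> {classify(addr)}")

# X8 and SYSCALLNO are shown in hex; convert 0x111 to decimal.
print("syscall number:", int("111", 16))
```

Under that assumption the PC and LR fall in the user half of the address space, which would explain why dis finds no kernel symbol there and cannot read the memory from a partial dump. The syscall number 0x111 is 273 in decimal, which as far as I recall is finit_module in the generic arm64 syscall table, and that matches frame #16. If this reading holds, the place to inspect would be frame #10, for example by running dis -l on create_oops or on ffff0000098e0018.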