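Running Linux Kernel, 2nd Edition, Volume 2: Debugging and Case Analysis (《奔跑吧Linux内核(第2版)卷2:调试与案例分析》) - Practising kdump debugging on the BBB
I wanted to work through the kdump debugging exercise from Chapter 5 on a BeagleBone Black, but setting up the environment threw up one problem after another. So far I have not even managed to get kdump deployed, which is maddening.
The BBB is running Debian bullseye.
Unfortunately, the kdump-tools package installed straight from the repository with apt keeps failing because it cannot find a free memory area to load the capture kernel into. The output is: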
```
Started User Login Management.
[ 30.616656] kdump-tools: Starting kdump-tools:
[ 30.644866] kdump-tools: Creating symlink /var/lib/kdump/vmlinuz.
[ 30.663150] kdump-tools: Creating symlink /var/lib/kdump/initrd.img.
Finished Permit User Sessions.
Started Getty on tty1.
Finished BeagleBoard Generate Symlinks.
Starting OpenBSD Secure Shell server...
[ 32.020253] kdump-tools: Could not find a free area of memory of 0xad31c0 bytes...
[ 32.038656] kdump-tools: Cannot load /var/lib/kdump/vmlinuz
[ 32.095996] kdump-tools: failed to load kdump kernel ...
[ 32.114898] kdump-tools:failed!
```
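However, I had already appended ``crashkernel=384M-:128M`` to bootcmd, and dmesg confirms that the kernel does reserve the memory (the capture below is from a boot with ``crashkernel=384M-:64M``):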
```
root@beaglebone :~# dmesg | grep "crash" -B 3 -A 3
[ 0.000000] Normal zone: 0 pages reserved
[ 0.000000] Normal zone: 130560 pages, LIFO batch:31
[ 0.000000] CPU: All CPU(s) started in SVC mode.
[ 0.000000] Reserving 64MB of memory at 2432MB for crashkernel (System RAM: 510MB)
[ 0.000000] AM335X ES2.1 (sgx neon)
[ 0.000000] percpu: Embedded 21 pages/cpu s54604 r8192 d23220 u86016
[ 0.000000] pcpu-alloc: s54604 r8192 d23220 u86016 alloc=21*4096
[ 0.000000] pcpu-alloc: 0
[ 0.000000] Built 1 zonelists, mobility grouping on.Total pages: 129412
[ 0.000000] Kernel command line: console=ttyS0,115200n8 bone_capemgr.uboot_capemgr_enabled=1 root=/dev/mmcblk1p1 ro rootfstype=ext4 rootwait coherent_pool=1M net.ifnames=0 lpj=1990656 rng_core.default_quality=100 crashkernel=384M-:64M
[ 0.000000] Dentry cache hash table entries: 65536 (order: 6, 262144 bytes, linear)
[ 0.000000] Inode-cache hash table entries: 32768 (order: 5, 131072 bytes, linear)
[ 0.000000] mem auto-init: stack:off, heap alloc:on, heap free:off
--
[ 11.582844] with environment:
[ 11.582851] HOME=/
[ 11.582857] TERM=linux
[ 11.582863] crashkernel=384M-:64M
[ 18.204208] EXT4-fs (mmcblk1p1): mounted filesystem with ordered data mode. Opts: (null)
[ 18.383456] Not activating Mandatory Access Control as /sbin/tomoyo-init does not exist.
[ 18.802579] systemd: systemd 247.3-7+deb11u4 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +ZSTD +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=unified)
```
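To make the `crashkernel=` range syntax concrete, here is a minimal Python sketch of how a value such as `384M-:64M` resolves to a reservation size. It only mirrors the documented `start-[end]:size[,...][@offset]` form. The helper names `parse_size` and `crashkernel_reservation` are made up for this illustration and are not part of any kernel or kexec-tools API.

```python
import re

_UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}

def parse_size(text):
    """Convert a size such as '384M' or '64M' to bytes."""
    match = re.fullmatch(r"(\d+)([KMG]?)", text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]

def crashkernel_reservation(value, system_ram):
    """Return the bytes reserved for crashkernel=<value> given system_ram bytes.

    Handles 'size[@offset]' and 'start-[end]:size[,start-[end]:size...][@offset]'.
    Returns 0 when no range matches (nothing would be reserved).
    """
    value = value.split("@", 1)[0]              # drop an optional @offset
    if ":" not in value:
        return parse_size(value)                # plain 'size' form
    for entry in value.split(","):
        span, size = entry.split(":", 1)
        start, _, end = span.partition("-")
        lower = parse_size(start)
        upper = parse_size(end) if end else None    # '384M-' is open-ended
        if system_ram >= lower and (upper is None or system_ram < upper):
            return parse_size(size)
    return 0

MIB = 1 << 20
system_ram = 510 * MIB      # "System RAM: 510MB" from the dmesg line above
reserved = crashkernel_reservation("384M-:64M", system_ram)
print(reserved // MIB, "MiB reserved")                       # 64 MiB reserved
print(round(0xad31c0 / MIB, 1), "MiB requested by kexec")    # 10.8 MiB requested by kexec
```

On these numbers the reserved region is several times larger than the 0xad31c0 bytes that kexec reported it could not place, so the reservation size by itself does not look like the limiting factor.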
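With nothing to lose, I rebuilt a newer kexec-tools from source and installed it, and the problem went away. Just when it looked like a dead end, a way through appeared.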
``` bash
root@BeagleBone:~# kdump-config show
DUMP_MODE: kdump
USE_KDUMP: 1
KDUMP_COREDIR: /var/crash
crashkernel addr: 0x90000000
/var/lib/kdump/vmlinuz: symbolic link to /boot/vmlinuz-5.10.168+
kdump initrd:
/var/lib/kdump/initrd.img: symbolic link to /var/lib/kdump/initrd.img-5.10.168+
current state: ready to kdump
kexec command:
/sbin/kexec -p --command-line="console=ttyS0,115200n8 bone_capemgr.uboot_capemgr_enabled=1 root=/dev/mmcblk1p1 ro rootfstype=ext4 rootwait coherent_pool=1M net.ifnames=0 lpj=1990656 rng_core.default_quality=100 reset_devices systemd.unit=kdump-tools-dump.service nr_cpus=1 irqpoll nousb ata_piix.prefer_ms_hyperv=0" --initrd=/var/lib/kdump/initrd.img /var/lib/kdump/vmlinuz
root@BeagleBone:~# neofetch
_,met$$$$$gg. root@BeagleBone
,g$$$$$$$$$$$$$$$P. ---------------
,g$$P" """Y$$.". OS: Debian GNU/Linux 11 (bullseye) armv7l
,$$P' `$$$. Host: TI AM335x BeagleBone Black
',$$P ,ggs. `$$b: Kernel: 5.10.168+
`d$$' ,$P"' . $$$ Uptime: 1 min
$$P d$' , $$P Packages: 426 (dpkg)
$$: $$. - ,d$$' Shell: bash 5.1.4
$$; Y$b._ _,d$P' Terminal: /dev/ttyS0
Y$$. `.`"Y$$$$P"' CPU: Generic AM33XX (Flattened Device Tree) (1) @ 1.000GHz
`$$b "-.__ Memory: 42MiB / 355MiB
`Y$$
`Y$$.
`$$b.
`Y$$b.
`"Y$b._
`"""
root@BeagleBone:~# systemctl status kdump-tools
● kdump-tools.service - Kernel crash dump capture service
Loaded: loaded (/lib/systemd/system/kdump-tools.service; enabled; vendor p>
Active: active (exited) since Sun 2024-04-21 18:45:36 HKT; 1min 27s ago
Process: 595 ExecStart=/etc/init.d/kdump-tools start (code=exited, status=0>
Main PID: 595 (code=exited, status=0/SUCCESS)
CPU: 769ms
Apr 21 18:45:31 BeagleBone systemd: Starting Kernel crash dump capture servi>
Apr 21 18:45:34 BeagleBone kdump-tools: Starting kdump-tools:
Apr 21 18:45:34 BeagleBone kdump-tools: Creating symlink /var/lib/kdump/vm>
Apr 21 18:45:35 BeagleBone kdump-tools: Creating symlink /var/lib/kdump/in>
Apr 21 18:45:36 BeagleBone kdump-tools: loaded kdump kernel.
Apr 21 18:45:36 BeagleBone kdump-tools: /sbin/kexec -p --command-line="con>
Apr 21 18:45:36 BeagleBone kdump-tools: loaded kdump kernel
Apr 21 18:45:36 BeagleBone systemd: Finished Kernel crash dump capture servi>
```
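To see what kdump-tools changes for the capture kernel, the following Python sketch compares the booted kernel's command line with the one embedded in the `kexec command` above. The helper names `capture_cmdline` and `cmdline_diff` are invented for this example. The two strings are copied from different captures in this post, so the `crashkernel=` value may not match the boot that produced the kexec command.

```python
import shlex

def capture_cmdline(kexec_command):
    """Extract the --command-line value from a kexec invocation string."""
    for token in shlex.split(kexec_command):
        if token.startswith("--command-line="):
            return token[len("--command-line="):]
    raise ValueError("no --command-line option found")

def cmdline_diff(booted, capture):
    """Return (added, removed) parameters, keeping their original order."""
    booted_params, capture_params = booted.split(), capture.split()
    added = [p for p in capture_params if p not in booted_params]
    removed = [p for p in booted_params if p not in capture_params]
    return added, removed

# Command line of the running kernel, as printed by dmesg earlier in the post.
BOOTED = ("console=ttyS0,115200n8 bone_capemgr.uboot_capemgr_enabled=1 "
          "root=/dev/mmcblk1p1 ro rootfstype=ext4 rootwait coherent_pool=1M "
          "net.ifnames=0 lpj=1990656 rng_core.default_quality=100 "
          "crashkernel=384M-:64M")
# kexec command as printed by `kdump-config show`.
KEXEC = ('/sbin/kexec -p --command-line="console=ttyS0,115200n8 '
         'bone_capemgr.uboot_capemgr_enabled=1 root=/dev/mmcblk1p1 ro '
         'rootfstype=ext4 rootwait coherent_pool=1M net.ifnames=0 lpj=1990656 '
         'rng_core.default_quality=100 reset_devices '
         'systemd.unit=kdump-tools-dump.service nr_cpus=1 irqpoll nousb '
         'ata_piix.prefer_ms_hyperv=0" --initrd=/var/lib/kdump/initrd.img '
         '/var/lib/kdump/vmlinuz')

added, removed = cmdline_diff(BOOTED, capture_cmdline(KEXEC))
print("added:  ", added)
print("removed:", removed)
```

The diff shows that the capture kernel's command line drops `crashkernel=` and gains `reset_devices`, `systemd.unit=kdump-tools-dump.service`, `nr_cpus=1`, `irqpoll`, `nousb` and `ata_piix.prefer_ms_hyperv=0`.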
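I downloaded the book's oops example from https://benshushu.coding.net/public/runninglinuxkernel_5.0/runninglinuxkernel_5.0/git/files/rlk_5.0/kmodules/rlk_lab/rlk_senior/Volume_2/chapter_8/lab01/oops_test.c, then compiled and loaded it.
However, the capture kernel did not boot properly and no crash file was captured. The console printed the following and then hung: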
``` bash
[<c01105e8>] (machine_crash_shutdown) from [<c01fc9b4>] (__crash_kexec+0x6c/0xd8)
[<c01fc9b4>] (__crash_kexec) from [<c01fca7c>] (crash_kexec+0x5c/0x64)
[<c01fca7c>] (crash_kexec) from [<c010c57c>] (die+0x160/0x35c)
[<c010c57c>] (die) from [<c0115790>] (__do_kernel_fault.part.0+0x78/0x88)
[<c0115790>] (__do_kernel_fault.part.0) from [<c0ee6ffc>] (do_translation_fault+0x0/0xac)
[<c0ee6ffc>] (do_translation_fault) from [<60030013>] (0x60030013)
---[ end trace d664ec496f9aca0f ]---
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1197 at drivers/gpio/gpiolib.c:3335 machine_crash_shutdown+0xa4/0xf4
Modules linked in: oops(O+) pru_rproc irq_pruss_intc pm33xx pruss ti_eqep counter c_can_platform c_can can_dev evdev wkup_m3_ipc uio_pdrv_genirq uio cpufreq_dt
CPU: 0 PID: 1197 Comm: insmod Kdump: loaded Tainted: G WO 5.10.168+ #5
Hardware name: Generic AM33XX (Flattened Device Tree)
[<c0111134>] (unwind_backtrace) from [<c010c418>] (show_stack+0x10/0x14)
[<c010c418>] (show_stack) from [<c0ed5858>] (dump_stack+0x88/0x9c)
[<c0ed5858>] (dump_stack) from [<c013ca80>] (__warn+0x88/0x128)
[<c013ca80>] (__warn) from [<c0ecbf3c>] (warn_slowpath_fmt+0x64/0xc0)
[<c0ecbf3c>] (warn_slowpath_fmt) from [<c01105e8>] (machine_crash_shutdown+0xa4/0xf4)
[<c01105e8>] (machine_crash_shutdown) from [<c01fc9b4>] (__crash_kexec+0x6c/0xd8)
[<c01fc9b4>] (__crash_kexec) from [<c01fca7c>] (crash_kexec+0x5c/0x64)
[<c01fca7c>] (crash_kexec) from [<c010c57c>] (die+0x160/0x35c)
[<c010c57c>] (die) from [<c0115790>] (__do_kernel_fault.part.0+0x78/0x88)
[<c0115790>] (__do_kernel_fault.part.0) from [<c0ee6ffc>] (do_translation_fault+0x0/0xac)
[<c0ee6ffc>] (do_translation_fault) from [<60030013>] (0x60030013)
---[ end trace d664ec496f9aca10 ]---
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1197 at drivers/gpio/gpiolib.c:3335 machine_crash_shutdown+0xa4/0xf4
Modules linked in: oops(O+) pru_rproc irq_pruss_intc pm33xx pruss ti_eqep counter c_can_platform c_can can_dev evdev wkup_m3_ipc uio_pdrv_genirq uio cpufreq_dt
CPU: 0 PID: 1197 Comm: insmod Kdump: loaded Tainted: G WO 5.10.168+ #5
Hardware name: Generic AM33XX (Flattened Device Tree)
[<c0111134>] (unwind_backtrace) from [<c010c418>] (show_stack+0x10/0x14)
[<c010c418>] (show_stack) from [<c0ed5858>] (dump_stack+0x88/0x9c)
[<c0ed5858>] (dump_stack) from [<c013ca80>] (__warn+0x88/0x128)
[<c013ca80>] (__warn) from [<c0ecbf3c>] (warn_slowpath_fmt+0x64/0xc0)
[<c0ecbf3c>] (warn_slowpath_fmt) from [<c01105e8>] (machine_crash_shutdown+0xa4/0xf4)
[<c01105e8>] (machine_crash_shutdown) from [<c01fc9b4>] (__crash_kexec+0x6c/0xd8)
[<c01fc9b4>] (__crash_kexec) from [<c01fca7c>] (crash_kexec+0x5c/0x64)
[<c01fca7c>] (crash_kexec) from [<c010c57c>] (die+0x160/0x35c)
[<c010c57c>] (die) from [<c0115790>] (__do_kernel_fault.part.0+0x78/0x88)
[<c0115790>] (__do_kernel_fault.part.0) from [<c0ee6ffc>] (do_translation_fault+0x0/0xac)
[<c0ee6ffc>] (do_translation_fault) from [<60030013>] (0x60030013)
---[ end trace d664ec496f9aca11 ]---
Loading crashdump kernel...
Bye!
```
Strange...
Triggering the crash with ``echo c > /proc/sysrq-trigger`` instead does reboot into the capture kernel, but after the reboot it still panics and hangs:
```
[ 10.464551] df00: 0000013e c12d461c c1400560 c1211200 00000000 00000006 00000006 c18cc000
[ 10.472767] df20: 00000000 c114b0d8 c113eb14 c113eac8 0000013e c189e132 c189e136 473957b1
[ 10.480982] df40: 00000000 00000158 00000007 473957b1 c147defc 00000007 c189e000 c145d834
[ 10.489198] df60: 00000158 c14013cc 00000006 00000006 00000000 c1400560 c0edf0d4 c1400560
[ 10.497415] df80: 00000000 00000000 c0edf0d4 00000000 00000000 00000000 00000000 00000000
[ 10.505630] dfa0: 00000000 c0edf0dc 00000000 c0100148 00000000 00000000 00000000 00000000
[ 10.513845] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 10.522061] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[ 10.530302] [<c0ab4740>] (cpts_fifo_read) from [<c0abc3c0>] (cpsw_misc_interrupt+0x30/0x4c)
[ 10.538701] [<c0abc3c0>] (cpsw_misc_interrupt) from [<c01afba8>] (__handle_irq_event_percpu+0x5c/0x25c)
[ 10.548142] [<c01afba8>] (__handle_irq_event_percpu) from [<c01afe84>] (handle_irq_event+0x5c/0xc4)
[ 10.557235] [<c01afe84>] (handle_irq_event) from [<c01b4504>] (handle_level_irq+0xd0/0x1c0)
[ 10.565629] [<c01b4504>] (handle_level_irq) from [<c01af414>] (__handle_domain_irq+0xa0/0x10c)
[ 10.574284] [<c01af414>] (__handle_domain_irq) from [<c0100b8c>] (__irq_svc+0x6c/0xa8)
[ 10.582235] Exception stack(0xc18cdce0 to 0xc18cdd28)
[ 10.587310] dce0: c371be6c 600e0013 00000000 00000005 c371be00 00000000 c393b8c0 00000038
[ 10.595527] dd00: 600e0013 c371bed4 c371be6c c3933000 00000000 c18cdd30 c01b1cf8 c0ee641c
[ 10.603739] dd20: 600e0013 ffffffff
[ 10.607249] [<c0100b8c>] (__irq_svc) from [<c0ee641c>] (_raw_spin_unlock_irqrestore+0x20/0x54)
[ 10.615905] [<c0ee641c>] (_raw_spin_unlock_irqrestore) from [<c01b1cf8>] (__setup_irq+0x378/0x868)
[ 10.624908] [<c01b1cf8>] (__setup_irq) from [<c01b22cc>] (request_threaded_irq+0xe4/0x15c)
[ 10.633214] [<c01b22cc>] (request_threaded_irq) from [<c01b6118>] (devm_request_threaded_irq+0x64/0xb8)
[ 10.642654] [<c01b6118>] (devm_request_threaded_irq) from [<c0ab6ffc>] (cpsw_probe+0x8ac/0xd80)
[ 10.651402] [<c0ab6ffc>] (cpsw_probe) from [<c09e65cc>] (platform_drv_probe+0x48/0x9c)
[ 10.659360] [<c09e65cc>] (platform_drv_probe) from [<c09e3d0c>] (really_probe+0x108/0x514)
[ 10.667664] [<c09e3d0c>] (really_probe) from [<c09e44f0>] (driver_probe_device+0x78/0x1d4)
[ 10.675968] [<c09e44f0>] (driver_probe_device) from [<c09e4958>] (device_driver_attach+0xa8/0xb0)
[ 10.684883] [<c09e4958>] (device_driver_attach) from [<c09e4a50>] (__driver_attach+0xf0/0x15c)
[ 10.693545] [<c09e4a50>] (__driver_attach) from [<c09e18c8>] (bus_for_each_dev+0x78/0xb8)
[ 10.701762] [<c09e18c8>] (bus_for_each_dev) from [<c09e2ef0>] (bus_add_driver+0x110/0x214)
[ 10.710066] [<c09e2ef0>] (bus_add_driver) from [<c09e55a8>] (driver_register+0x8c/0x124)
[ 10.718198] [<c09e55a8>] (driver_register) from [<c01021a4>] (do_one_initcall+0x50/0x2b8)
[ 10.726420] [<c01021a4>] (do_one_initcall) from [<c14013cc>] (kernel_init_freeable+0x248/0x2a8)
[ 10.735164] [<c14013cc>] (kernel_init_freeable) from [<c0edf0dc>] (kernel_init+0x8/0x11c)
[ 10.743381] [<c0edf0dc>] (kernel_init) from [<c0100148>] (ret_from_fork+0x14/0x2c)
[ 10.750982] Exception stack(0xc18cdfb0 to 0xc18cdff8)
[ 10.756056] dfa0: 00000000 00000000 00000000 00000000
[ 10.764272] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 10.772487] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
[ 10.779133] Code: e2857090 e58d300c e1a00007 e59d3018 (e584300c)
[ 10.785278] ---[ end trace 3f1de6e522b74bbc ]---
[ 10.789918] Kernel panic - not syncing: Fatal exception in interrupt
[ 10.796309] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
```
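A long ARM backtrace like the one above is easier to read once it is reduced to a chain of function names. The Python sketch below does that for lines of the form `[<addr>] (callee) from [<addr>] (caller+off/size)`. The regular expression and the `call_chain` helper are my own illustration, written only against the format visible in this log.

```python
import re

# One backtrace frame: "[<c0ab4740>] (callee) from [<c0abc3c0>] (caller+0x30/0x4c)"
FRAME = re.compile(
    r"\[<[0-9a-f]+>\] \((?P<callee>[^)]+)\) from "
    r"\[<[0-9a-f]+>\] \((?P<caller>[^)+]+)(?:\+0x[0-9a-f]+/0x[0-9a-f]+)?\)"
)

def call_chain(log_text):
    """Return function names, innermost first, from 'X from Y' backtrace lines."""
    chain = []
    for line in log_text.splitlines():
        match = FRAME.search(line)
        if match is None:
            continue                    # stack words, "Exception stack" lines, etc.
        if not chain or chain[-1] != match.group("callee"):
            chain.append(match.group("callee"))
        chain.append(match.group("caller"))
    return chain

# Three frames copied from the panic log above.
SAMPLE = """\
[ 10.530302] [<c0ab4740>] (cpts_fifo_read) from [<c0abc3c0>] (cpsw_misc_interrupt+0x30/0x4c)
[ 10.538701] [<c0abc3c0>] (cpsw_misc_interrupt) from [<c01afba8>] (__handle_irq_event_percpu+0x5c/0x25c)
[ 10.548142] [<c01afba8>] (__handle_irq_event_percpu) from [<c01afe84>] (handle_irq_event+0x5c/0xc4)
"""
print(" <- ".join(call_chain(SAMPLE)))
# cpts_fifo_read <- cpsw_misc_interrupt <- __handle_irq_event_percpu <- handle_irq_event
```

Read this way, the full log shows the fault in `cpts_fifo_read`, reached from `cpsw_misc_interrupt`, with the interrupt arriving while `cpsw_probe` was still inside `devm_request_threaded_irq`. That suggests, without proving it, that the Ethernet interrupt fired in the capture kernel before the driver had finished initialising.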
Emulation and real hardware are still not the same thing.

As a next step, I posted a question on the BeagleBoard forum, [enable kdump service on beaglebone black, but after reboot still get kernel panic](https://forum.beagleboard.org/t/enable-kdump-service-on-beaglebone-black-but-after-reboot-still-get-kernel-panic/38251), and I am hoping for good news.
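Make the memory reservation on the cmdline smaller.

Quoting LitchiCheng, posted on 2024-4-22 10:51: "Make the memory reservation on the cmdline smaller."
I tried going down to 64MB and it still fails with the same error.

Quoting iysheng, posted on 2024-4-22 20:26: "I tried going down to 64MB and it still fails with the same error."
Change nr_cpu to 1.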
I'll try it tonight when I get back.

Once the environment is set up, you are 60% of the way there.

I added nr_cpus=1 to bootcmd and it still does not work.