iysheng posted on 2024-4-21 19:12

《奔跑吧Linux内核(第2版)卷2:调试与案例分析》 (Running Linux Kernel, 2nd Edition, Volume 2: Debugging and Case Analysis) - Practicing kdump debugging on the BBB





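I wanted to try the kdump debugging exercise from Chapter 5 on a BeagleBone Black, but setting up the environment ran into all sorts of trouble. At this point I could not even get kdump deployed, which was maddening...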

The BBB is running Debian bullseye.

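Unfortunately, the kdump-tools package installed directly from the repository with apt kept reporting that it could not find free memory for the capture kernel. It printed: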

```

Started User Login Management.
[   30.616656] kdump-tools: Starting kdump-tools:
[   30.644866] kdump-tools: Creating symlink /var/lib/kdump/vmlinuz.
[   30.663150] kdump-tools: Creating symlink /var/lib/kdump/initrd.img.
Finished Permit User Sessions.
Started Getty on tty1.
Finished BeagleBoard Generate Symlinks.
         Starting OpenBSD Secure Shell server...
[   32.020253] kdump-tools: Could not find a free area of memory of 0xad31c0 bytes...
[   32.038656] kdump-tools: Cannot load /var/lib/kdump/vmlinuz
[   32.095996] kdump-tools: failed to load kdump kernel ...
[   32.114898] kdump-tools:failed!

```
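The size in the `Could not find a free area of memory` message is hexadecimal. As a quick arithmetic sketch, it can be converted to KiB in the shell to see how much contiguous memory kexec was asking for; `bytes_to_kib` is a hypothetical helper name, not part of any tool:

```shell
#!/bin/sh
# Convert a byte count (decimal or 0x-prefixed hex) to whole KiB.
bytes_to_kib() {
    echo $(( $1 / 1024 ))
}

# Size taken from the kdump-tools error message in the log.
bytes_to_kib 0xad31c0
```

This prints 11084, so the request is roughly 10.8 MiB.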

However, I had already appended ``crashkernel=384M-:128M`` in bootcmd, and dmesg also shows memory being reserved:
```
root@beaglebone :~# dmesg | grep "crash" -B 3 -A 3
[    0.000000]   Normal zone: 0 pages reserved
[    0.000000]   Normal zone: 130560 pages, LIFO batch:31
[    0.000000] CPU: All CPU(s) started in SVC mode.
[    0.000000] Reserving 64MB of memory at 2432MB for crashkernel (System RAM: 510MB)
[    0.000000] AM335X ES2.1 (sgx neon)
[    0.000000] percpu: Embedded 21 pages/cpu s54604 r8192 d23220 u86016
[    0.000000] pcpu-alloc: s54604 r8192 d23220 u86016 alloc=21*4096
[    0.000000] pcpu-alloc: 0
[    0.000000] Built 1 zonelists, mobility grouping on.Total pages: 129412
[    0.000000] Kernel command line: console=ttyS0,115200n8 bone_capemgr.uboot_capemgr_enabled=1 root=/dev/mmcblk1p1 ro rootfstype=ext4 rootwait coherent_pool=1M net.ifnames=0 lpj=1990656 rng_core.default_quality=100 crashkernel=384M-:64M
[    0.000000] Dentry cache hash table entries: 65536 (order: 6, 262144 bytes, linear)
[    0.000000] Inode-cache hash table entries: 32768 (order: 5, 131072 bytes, linear)
[    0.000000] mem auto-init: stack:off, heap alloc:on, heap free:off
--
[   11.582844]   with environment:
[   11.582851]   HOME=/
[   11.582857]   TERM=linux
[   11.582863]   crashkernel=384M-:64M
[   18.204208] EXT4-fs (mmcblk1p1): mounted filesystem with ordered data mode. Opts: (null)
[   18.383456] Not activating Mandatory Access Control as /sbin/tomoyo-init does not exist.
[   18.802579] systemd: systemd 247.3-7+deb11u4 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +ZSTD +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=unified)
```
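The `crashkernel=384M-:64M` value on the command line uses the kernel's range syntax `start-[end]:size`: when the amount of system RAM falls inside a range, the matching size is reserved. Below is a minimal sketch of that selection rule, assuming only the range form (a plain `crashkernel=64M` and other variants are not handled). `to_bytes`, `crashkernel_size` and `mb_to_addr` are hypothetical helper names.

```shell
#!/bin/sh
# Convert a size such as 64M, 2G or 4096K to bytes.
to_bytes() {
    case "$1" in
        *[Kk]) echo $(( ${1%?} * 1024 )) ;;
        *[Mm]) echo $(( ${1%?} * 1024 * 1024 )) ;;
        *[Gg]) echo $(( ${1%?} * 1024 * 1024 * 1024 )) ;;
        *)     echo $(( $1 )) ;;
    esac
}

# Print the size selected by a range-style crashkernel= value for a given
# amount of system RAM in bytes; print nothing if no range matches.
# usage: crashkernel_size VALUE RAM_BYTES
crashkernel_size() {
    ram=$2
    old_ifs=$IFS
    IFS=,                                  # entries are comma-separated
    for entry in $1; do
        range=${entry%%:*}                 # e.g. "384M-" or "512M-2G"
        size=${entry#*:}
        size=${size%%@*}                   # drop an optional @offset
        start=$(to_bytes "${range%%-*}")
        end=${range#*-}                    # empty means "no upper limit"
        if [ "$ram" -ge "$start" ] &&
           { [ -z "$end" ] || [ "$ram" -lt "$(to_bytes "$end")" ]; }; then
            echo "$size"
            break
        fi
    done
    IFS=$old_ifs
}

# Convert an offset in MB, as printed by the kernel, to a physical address.
mb_to_addr() {
    printf '0x%x\n' $(( $1 * 1024 * 1024 ))
}

crashkernel_size 384M-:64M "$(to_bytes 510M)"   # RAM size from the dmesg line
mb_to_addr 2432                                 # the "at 2432MB" offset
```

With the values from the log this selects 64M, which matches the `Reserving 64MB` line, and 2432MB corresponds to physical address 0x98000000.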
With nothing to lose, I rebuilt a newer kexec-tools from source and installed it, and that solved the problem. Light at the end of the tunnel at last.
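The post does not record the exact build steps. A rebuild of kexec-tools from upstream source might look roughly like the sketch below. The repository URL, the install directories, and the `run` and `build_kexec_tools` helper names are assumptions. The script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch only: build and install kexec-tools from upstream source.
# Assumed, not taken from the original post: the git URL, the install
# directories, and that git, autoconf, make and a compiler are available.
KEXEC_REPO=https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git

# Print a command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = 0 ]; then
        "$@"
    fi
}

build_kexec_tools() {
    run git clone "$KEXEC_REPO" kexec-tools &&
    run cd kexec-tools &&
    run ./bootstrap &&
    run ./configure --prefix=/usr --sbindir=/sbin &&
    run make &&
    run make install
}

build_kexec_tools
```

`--sbindir=/sbin` is chosen here because kdump-tools invokes `/sbin/kexec`; adjust it if the layout on your image differs.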
``` bash
root@BeagleBone:~# kdump-config show
DUMP_MODE:            kdump
USE_KDUMP:            1
KDUMP_COREDIR:          /var/crash
crashkernel addr: 0x90000000
   /var/lib/kdump/vmlinuz: symbolic link to /boot/vmlinuz-5.10.168+
kdump initrd:
   /var/lib/kdump/initrd.img: symbolic link to /var/lib/kdump/initrd.img-5.10.168+
current state:    ready to kdump

kexec command:
/sbin/kexec -p --command-line="console=ttyS0,115200n8 bone_capemgr.uboot_capemgr_enabled=1 root=/dev/mmcblk1p1 ro rootfstype=ext4 rootwait coherent_pool=1M net.ifnames=0 lpj=1990656 rng_core.default_quality=100 reset_devices systemd.unit=kdump-tools-dump.service nr_cpus=1 irqpoll nousb ata_piix.prefer_ms_hyperv=0" --initrd=/var/lib/kdump/initrd.img /var/lib/kdump/vmlinuz
root@BeagleBone:~# neofetch
       _,met$$$$$gg.          root@BeagleBone
    ,g$$$$$$$$$$$$$$$P.       ---------------
,g$$P"   """Y$$.".      OS: Debian GNU/Linux 11 (bullseye) armv7l
,$$P'            `$$$.   Host: TI AM335x BeagleBone Black
',$$P       ,ggs.   `$$b:   Kernel: 5.10.168+
`d$$'   ,$P"'   .    $$$    Uptime: 1 min
$$P      d$'   ,    $$P    Packages: 426 (dpkg)
$$:      $$.   -    ,d$$'    Shell: bash 5.1.4
$$;      Y$b._   _,d$P'      Terminal: /dev/ttyS0
Y$$.    `.`"Y$$$$P"'         CPU: Generic AM33XX (Flattened Device Tree) (1) @ 1.000GHz
`$$b      "-.__            Memory: 42MiB / 355MiB
`Y$$
   `Y$$.
   `$$b.
       `Y$$b.
          `"Y$b._
            `"""
root@BeagleBone:~# systemctl status kdump-tools
● kdump-tools.service - Kernel crash dump capture service
   Loaded: loaded (/lib/systemd/system/kdump-tools.service; enabled; vendor p>
   Active: active (exited) since Sun 2024-04-21 18:45:36 HKT; 1min 27s ago
    Process: 595 ExecStart=/etc/init.d/kdump-tools start (code=exited, status=0>
   Main PID: 595 (code=exited, status=0/SUCCESS)
      CPU: 769ms

Apr 21 18:45:31 BeagleBone systemd: Starting Kernel crash dump capture servi>
Apr 21 18:45:34 BeagleBone kdump-tools: Starting kdump-tools:
Apr 21 18:45:34 BeagleBone kdump-tools: Creating symlink /var/lib/kdump/vm>
Apr 21 18:45:35 BeagleBone kdump-tools: Creating symlink /var/lib/kdump/in>
Apr 21 18:45:36 BeagleBone kdump-tools: loaded kdump kernel.
Apr 21 18:45:36 BeagleBone kdump-tools: /sbin/kexec -p --command-line="con>
Apr 21 18:45:36 BeagleBone kdump-tools: loaded kdump kernel
Apr 21 18:45:36 BeagleBone systemd: Finished Kernel crash dump capture servi>
```
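The `kexec command` shown by `kdump-config show` carries a different command line from the one the running kernel booted with. A small sketch for listing what was added for the capture kernel follows; `extra_params` is a hypothetical helper, and the two strings are shortened copies of the command lines in the logs:

```shell
#!/bin/sh
# Print every word of the second command line that is absent from the first.
extra_params() {
    for word in $2; do
        case " $1 " in
            *" $word "*) ;;                 # already on the running kernel's line
            *) printf '%s\n' "$word" ;;     # added for the capture kernel
        esac
    done
}

# Shortened to start at the last shared parameter for readability.
running='rng_core.default_quality=100 crashkernel=384M-:64M'
capture='rng_core.default_quality=100 reset_devices systemd.unit=kdump-tools-dump.service nr_cpus=1 irqpoll nousb ata_piix.prefer_ms_hyperv=0'

extra_params "$running" "$capture"
```

Among the added parameters is `nr_cpus=1`, so according to this output the capture kernel is already limited to one CPU. On a live system the first argument could come from `/proc/cmdline`.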
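I downloaded the book's oops example from here (https://benshushu.coding.net/public/runninglinuxkernel_5.0/runninglinuxkernel_5.0/git/files/rlk_5.0/kmodules/rlk_lab/rlk_senior/Volume_2/chapter_8/lab01/oops_test.c), then compiled and loaded it.
However, the capture kernel did not boot properly and no crash dump file was produced. The console hung after printing the following: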
``` bash
[<c01105e8>] (machine_crash_shutdown) from [<c01fc9b4>] (__crash_kexec+0x6c/0xd8)
[<c01fc9b4>] (__crash_kexec) from [<c01fca7c>] (crash_kexec+0x5c/0x64)
[<c01fca7c>] (crash_kexec) from [<c010c57c>] (die+0x160/0x35c)
[<c010c57c>] (die) from [<c0115790>] (__do_kernel_fault.part.0+0x78/0x88)
[<c0115790>] (__do_kernel_fault.part.0) from [<c0ee6ffc>] (do_translation_fault+0x0/0xac)
[<c0ee6ffc>] (do_translation_fault) from [<60030013>] (0x60030013)
---[ end trace d664ec496f9aca0f ]---
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1197 at drivers/gpio/gpiolib.c:3335 machine_crash_shutdown+0xa4/0xf4
Modules linked in: oops(O+) pru_rproc irq_pruss_intc pm33xx pruss ti_eqep counter c_can_platform c_can can_dev evdev wkup_m3_ipc uio_pdrv_genirq uio cpufreq_dt
CPU: 0 PID: 1197 Comm: insmod Kdump: loaded Tainted: G      WO      5.10.168+ #5
Hardware name: Generic AM33XX (Flattened Device Tree)
[<c0111134>] (unwind_backtrace) from [<c010c418>] (show_stack+0x10/0x14)
[<c010c418>] (show_stack) from [<c0ed5858>] (dump_stack+0x88/0x9c)
[<c0ed5858>] (dump_stack) from [<c013ca80>] (__warn+0x88/0x128)
[<c013ca80>] (__warn) from [<c0ecbf3c>] (warn_slowpath_fmt+0x64/0xc0)
[<c0ecbf3c>] (warn_slowpath_fmt) from [<c01105e8>] (machine_crash_shutdown+0xa4/0xf4)
[<c01105e8>] (machine_crash_shutdown) from [<c01fc9b4>] (__crash_kexec+0x6c/0xd8)
[<c01fc9b4>] (__crash_kexec) from [<c01fca7c>] (crash_kexec+0x5c/0x64)
[<c01fca7c>] (crash_kexec) from [<c010c57c>] (die+0x160/0x35c)
[<c010c57c>] (die) from [<c0115790>] (__do_kernel_fault.part.0+0x78/0x88)
[<c0115790>] (__do_kernel_fault.part.0) from [<c0ee6ffc>] (do_translation_fault+0x0/0xac)
[<c0ee6ffc>] (do_translation_fault) from [<60030013>] (0x60030013)
---[ end trace d664ec496f9aca10 ]---
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1197 at drivers/gpio/gpiolib.c:3335 machine_crash_shutdown+0xa4/0xf4
Modules linked in: oops(O+) pru_rproc irq_pruss_intc pm33xx pruss ti_eqep counter c_can_platform c_can can_dev evdev wkup_m3_ipc uio_pdrv_genirq uio cpufreq_dt
CPU: 0 PID: 1197 Comm: insmod Kdump: loaded Tainted: G      WO      5.10.168+ #5
Hardware name: Generic AM33XX (Flattened Device Tree)
[<c0111134>] (unwind_backtrace) from [<c010c418>] (show_stack+0x10/0x14)
[<c010c418>] (show_stack) from [<c0ed5858>] (dump_stack+0x88/0x9c)
[<c0ed5858>] (dump_stack) from [<c013ca80>] (__warn+0x88/0x128)
[<c013ca80>] (__warn) from [<c0ecbf3c>] (warn_slowpath_fmt+0x64/0xc0)
[<c0ecbf3c>] (warn_slowpath_fmt) from [<c01105e8>] (machine_crash_shutdown+0xa4/0xf4)
[<c01105e8>] (machine_crash_shutdown) from [<c01fc9b4>] (__crash_kexec+0x6c/0xd8)
[<c01fc9b4>] (__crash_kexec) from [<c01fca7c>] (crash_kexec+0x5c/0x64)
[<c01fca7c>] (crash_kexec) from [<c010c57c>] (die+0x160/0x35c)
[<c010c57c>] (die) from [<c0115790>] (__do_kernel_fault.part.0+0x78/0x88)
[<c0115790>] (__do_kernel_fault.part.0) from [<c0ee6ffc>] (do_translation_fault+0x0/0xac)
[<c0ee6ffc>] (do_translation_fault) from [<60030013>] (0x60030013)
---[ end trace d664ec496f9aca11 ]---
Loading crashdump kernel...
Bye!
```
Strange...
Using ``echo c > /proc/sysrq-trigger`` instead does get the kernel to restart, but after the restart it still panics and hangs:
```
[   10.464551] df00: 0000013e c12d461c c1400560 c1211200 00000000 00000006 00000006 c18cc000                                                                        
[   10.472767] df20: 00000000 c114b0d8 c113eb14 c113eac8 0000013e c189e132 c189e136 473957b1
[   10.480982] df40: 00000000 00000158 00000007 473957b1 c147defc 00000007 c189e000 c145d834
[   10.489198] df60: 00000158 c14013cc 00000006 00000006 00000000 c1400560 c0edf0d4 c1400560
[   10.497415] df80: 00000000 00000000 c0edf0d4 00000000 00000000 00000000 00000000 00000000
[   10.505630] dfa0: 00000000 c0edf0dc 00000000 c0100148 00000000 00000000 00000000 00000000
[   10.513845] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[   10.522061] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[   10.530302] [<c0ab4740>] (cpts_fifo_read) from [<c0abc3c0>] (cpsw_misc_interrupt+0x30/0x4c)
[   10.538701] [<c0abc3c0>] (cpsw_misc_interrupt) from [<c01afba8>] (__handle_irq_event_percpu+0x5c/0x25c)
[   10.548142] [<c01afba8>] (__handle_irq_event_percpu) from [<c01afe84>] (handle_irq_event+0x5c/0xc4)
[   10.557235] [<c01afe84>] (handle_irq_event) from [<c01b4504>] (handle_level_irq+0xd0/0x1c0)
[   10.565629] [<c01b4504>] (handle_level_irq) from [<c01af414>] (__handle_domain_irq+0xa0/0x10c)
[   10.574284] [<c01af414>] (__handle_domain_irq) from [<c0100b8c>] (__irq_svc+0x6c/0xa8)
[   10.582235] Exception stack(0xc18cdce0 to 0xc18cdd28)
[   10.587310] dce0: c371be6c 600e0013 00000000 00000005 c371be00 00000000 c393b8c0 00000038
[   10.595527] dd00: 600e0013 c371bed4 c371be6c c3933000 00000000 c18cdd30 c01b1cf8 c0ee641c
[   10.603739] dd20: 600e0013 ffffffff
[   10.607249] [<c0100b8c>] (__irq_svc) from [<c0ee641c>] (_raw_spin_unlock_irqrestore+0x20/0x54)
[   10.615905] [<c0ee641c>] (_raw_spin_unlock_irqrestore) from [<c01b1cf8>] (__setup_irq+0x378/0x868)
[   10.624908] [<c01b1cf8>] (__setup_irq) from [<c01b22cc>] (request_threaded_irq+0xe4/0x15c)
[   10.633214] [<c01b22cc>] (request_threaded_irq) from [<c01b6118>] (devm_request_threaded_irq+0x64/0xb8)
[   10.642654] [<c01b6118>] (devm_request_threaded_irq) from [<c0ab6ffc>] (cpsw_probe+0x8ac/0xd80)
[   10.651402] [<c0ab6ffc>] (cpsw_probe) from [<c09e65cc>] (platform_drv_probe+0x48/0x9c)
[   10.659360] [<c09e65cc>] (platform_drv_probe) from [<c09e3d0c>] (really_probe+0x108/0x514)
[   10.667664] [<c09e3d0c>] (really_probe) from [<c09e44f0>] (driver_probe_device+0x78/0x1d4)
[   10.675968] [<c09e44f0>] (driver_probe_device) from [<c09e4958>] (device_driver_attach+0xa8/0xb0)
[   10.684883] [<c09e4958>] (device_driver_attach) from [<c09e4a50>] (__driver_attach+0xf0/0x15c)
[   10.693545] [<c09e4a50>] (__driver_attach) from [<c09e18c8>] (bus_for_each_dev+0x78/0xb8)
[   10.701762] [<c09e18c8>] (bus_for_each_dev) from [<c09e2ef0>] (bus_add_driver+0x110/0x214)
[   10.710066] [<c09e2ef0>] (bus_add_driver) from [<c09e55a8>] (driver_register+0x8c/0x124)
[   10.718198] [<c09e55a8>] (driver_register) from [<c01021a4>] (do_one_initcall+0x50/0x2b8)
[   10.726420] [<c01021a4>] (do_one_initcall) from [<c14013cc>] (kernel_init_freeable+0x248/0x2a8)
[   10.735164] [<c14013cc>] (kernel_init_freeable) from [<c0edf0dc>] (kernel_init+0x8/0x11c)
[   10.743381] [<c0edf0dc>] (kernel_init) from [<c0100148>] (ret_from_fork+0x14/0x2c)
[   10.750982] Exception stack(0xc18cdfb0 to 0xc18cdff8)
[   10.756056] dfa0:                                     00000000 00000000 00000000 00000000
[   10.764272] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[   10.772487] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
[   10.779133] Code: e2857090 e58d300c e1a00007 e59d3018 (e584300c)
[   10.785278] ---[ end trace 3f1de6e522b74bbc ]---
[   10.789918] Kernel panic - not syncing: Fatal exception in interrupt
[   10.796309] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
```
There really is a gap between emulation and real hardware.
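Long ARM backtraces like the capture-kernel panic above are easier to scan once reduced to the function names. The following is a minimal sketch, assuming the `(callee) from [<addr>] (caller+off/len)` line format seen in these logs; `backtrace_functions` is a hypothetical helper:

```shell
#!/bin/sh
# Read ARM-style backtrace lines on stdin and print the callee of each frame,
# innermost first. Lines that are not backtrace frames are skipped.
backtrace_functions() {
    sed -n 's/.*\] (\([A-Za-z0-9_.]*\)) from \[<.*/\1/p'
}

# A few lines copied from the capture-kernel panic.
backtrace_functions <<'EOF'
[   10.530302] [<c0ab4740>] (cpts_fifo_read) from [<c0abc3c0>] (cpsw_misc_interrupt+0x30/0x4c)
[   10.538701] [<c0abc3c0>] (cpsw_misc_interrupt) from [<c01afba8>] (__handle_irq_event_percpu+0x5c/0x25c)
[   10.582235] Exception stack(0xc18cdce0 to 0xc18cdd28)
[   10.642654] [<c01b6118>] (devm_request_threaded_irq) from [<c0ab6ffc>] (cpsw_probe+0x8ac/0xd80)
[   10.651402] [<c0ab6ffc>] (cpsw_probe) from [<c09e65cc>] (platform_drv_probe+0x48/0x9c)
EOF
```

Read this way, the innermost listed frame is `cpts_fifo_read`, reached from the `cpsw_misc_interrupt` handler while `cpsw_probe` was still inside `devm_request_threaded_irq`.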

iysheng posted on 2024-4-21 20:00

<p>As a follow-up, I posted a question on the BeagleBoard forum: <a class="fancy-title" href="https://forum.beagleboard.org/t/enable-kdump-service-on-beaglebone-black-but-after-reboot-still-get-kernel-panic/38251">enable kdump service on beaglebone black, but after reboot still get kernel panic</a>. Hoping for good news.</p>

LitchiCheng posted on 2024-4-22 10:51

<p>Try reserving a smaller amount of memory on the cmdline.</p>

iysheng posted on 2024-4-22 20:26

LitchiCheng posted on 2024-4-22 10:51
Try reserving a smaller amount of memory on the cmdline.

<p>I tried going down to 64MB and it still fails with the same error.</p>

LitchiCheng posted on 2024-4-23 20:58

iysheng posted on 2024-4-22 20:26
I tried going down to 64MB and it still fails with the same error.

<p>Set nr_cpus to 1.</p>

iysheng posted on 2024-4-25 07:36

<p>I'll try it when I get back tonight.</p>

freebsder posted on 2024-4-25 09:06

<p>Once the environment is set up, you are 60% of the way there.</p>

iysheng posted on 2024-4-26 06:43

<p>I added nr_cpus=1 to bootcmd, but it still doesn't work.</p>

通途科技 posted on 2024-10-31 11:27
