
Help: Linux kernel crash, could someone help analyze the crash log?

    jiayouniu · 2022-01-26 21:32:21 +08:00 · 2803 views

    The system is openmediavault 5 running in a virtual machine, with the Debian 5.10.70-1~bpo10 kernel. The host runs PVE 7 and is installed on an NVMe drive.

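    Within a day or two after the VM boots, a similar crash occurs, always in kswapd0 (the log reports "Not tainted"), but the call stack differs somewhat each time: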

    Jan 25 22:02:16 omv5 kernel: [78356.750086] general protection fault, probably for non-canonical address 0x100000000000000: 0000 [#1] SMP PTI
    Jan 25 22:02:16 omv5 kernel: [78356.750165] CPU: 0 PID: 68 Comm: kswapd0 Not tainted 5.10.0-0.bpo.9-amd64 #1 Debian 5.10.70-1~bpo10+1
    Jan 25 22:02:16 omv5 kernel: [78356.750224] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
    Jan 25 22:02:16 omv5 kernel: [78356.750316] RIP: 0010:fsverity_free_info.part.3+0x9/0x30
    Jan 25 22:02:16 omv5 kernel: [78356.750365] Code: ff ff c3 48 8b 52 40 48 c7 c6 d8 75 90 ba 48 c7 c7 98 5c 05 bb e8 07 ce 15 00 b8 ff ff ff ff c3 90 0f 1f 44 00 00 53 48 89 fb <48> 8b 7f 08 e8 de d2 f4 ff 48 8b 3d 97 8f c8 01 48 89 de 5b e9 ce
    Jan 25 22:02:16 omv5 kernel: [78356.750481] RSP: 0018:ffffacd6c01a7b28 EFLAGS: 00010206
    Jan 25 22:02:16 omv5 kernel: [78356.750523] RAX: ffff94d90b2f2038 RBX: 0100000000000000 RCX: 0000000000000000
    Jan 25 22:02:16 omv5 kernel: [78356.750576] RDX: 00000000fffffffe RSI: 0000000000000000 RDI: 0100000000000000
    Jan 25 22:02:16 omv5 kernel: [78356.750626] RBP: ffff94d90b2f1e78 R08: 0000000000000000 R09: 0000000000000000
    Jan 25 22:02:16 omv5 kernel: [78356.750682] R10: ffff94d90b6d50c0 R11: 0000000000000001 R12: ffffffffc0762f80
    Jan 25 22:02:16 omv5 kernel: [78356.750730] R13: ffff94d935214000 R14: 0000000000000000 R15: 00000000000002a9
    Jan 25 22:02:16 omv5 kernel: [78356.750779] FS:  0000000000000000(0000) GS:ffff94d97dc00000(0000) knlGS:0000000000000000
    Jan 25 22:02:16 omv5 kernel: [78356.750827] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Jan 25 22:02:16 omv5 kernel: [78356.750867] CR2: 00007f131a86d000 CR3: 00000000086ae000 CR4: 00000000000006f0
    Jan 25 22:02:16 omv5 kernel: [78356.750913] Call Trace:
    Jan 25 22:02:16 omv5 kernel: [78356.750953]  fsverity_cleanup_inode+0x1a/0x30
    Jan 25 22:02:16 omv5 kernel: [78356.751074]  ext4_evict_inode+0x7a/0x640 [ext4]
    Jan 25 22:02:16 omv5 kernel: [78356.751124]  evict+0xd2/0x1a0
    Jan 25 22:02:16 omv5 kernel: [78356.751161]  dispose_list+0x48/0x60
    Jan 25 22:02:16 omv5 kernel: [78356.751198]  prune_icache_sb+0x52/0x70
    Jan 25 22:02:16 omv5 kernel: [78356.751236]  super_cache_scan+0x123/0x1a0
    Jan 25 22:02:16 omv5 kernel: [78356.751276]  do_shrink_slab+0x11f/0x250
    Jan 25 22:02:16 omv5 kernel: [78356.751313]  shrink_slab+0x20f/0x2c0
    Jan 25 22:02:16 omv5 kernel: [78356.751352]  shrink_node+0x24b/0x6d0
    Jan 25 22:02:16 omv5 kernel: [78356.751382]  balance_pgdat+0x2d1/0x550
    Jan 25 22:02:16 omv5 kernel: [78356.752424]  kswapd+0x201/0x390
    Jan 25 22:02:16 omv5 kernel: [78356.753231]  ? finish_wait+0x80/0x80
    Jan 25 22:02:16 omv5 kernel: [78356.753946]  ? balance_pgdat+0x550/0x550
    Jan 25 22:02:16 omv5 kernel: [78356.754706]  kthread+0x116/0x130
    Jan 25 22:02:16 omv5 kernel: [78356.755452]  ? __kthread_cancel_work+0x40/0x40
    Jan 25 22:02:16 omv5 kernel: [78356.756248]  ret_from_fork+0x22/0x30
    Jan 25 22:02:16 omv5 kernel: [78356.757030] Modules linked in: xt_nat xt_tcpudp veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink xfrm_user xfrm_algo nft_counter xt_addrtype nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink br_netfilter bridge stp llc overlay softdog watchdog cpufreq_conservative cpufreq_userspace cpufreq_ondemand cpufreq_powersave bochs_drm drm_vram_helper drm_ttm_helper ttm drm_kms_helper cec pcspkr evdev serio_raw drm virtio_console joydev sg virtio_balloon qemu_fw_cfg button sunrpc ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod sd_mod t10_pi crc_t10dif crct10dif_generic sr_mod crct10dif_common cdrom ata_generic hid_generic usbhid hid virtio_net net_failover failover virtio_scsi psmouse ahci libahci ata_piix uhci_hcd libata ehci_hcd usbcore virtio_pci scsi_mod virtio_ring virtio
    Jan 25 22:02:16 omv5 kernel: [78356.757137]  i2c_piix4 usb_common
    Jan 25 22:02:16 omv5 kernel: [78356.765162] ---[ end trace 6b6382944e461dee ]---
    Jan 25 22:02:16 omv5 kernel: [78356.766097] RIP: 0010:fsverity_free_info.part.3+0x9/0x30
    Jan 25 22:02:16 omv5 kernel: [78356.767038] Code: ff ff c3 48 8b 52 40 48 c7 c6 d8 75 90 ba 48 c7 c7 98 5c 05 bb e8 07 ce 15 00 b8 ff ff ff ff c3 90 0f 1f 44 00 00 53 48 89 fb <48> 8b 7f 08 e8 de d2 f4 ff 48 8b 3d 97 8f c8 01 48 89 de 5b e9 ce
    Jan 25 22:02:16 omv5 kernel: [78356.769021] RSP: 0018:ffffacd6c01a7b28 EFLAGS: 00010206
    Jan 25 22:02:16 omv5 kernel: [78356.770001] RAX: ffff94d90b2f2038 RBX: 0100000000000000 RCX: 0000000000000000
    Jan 25 22:02:16 omv5 kernel: [78356.771038] RDX: 00000000fffffffe RSI: 0000000000000000 RDI: 0100000000000000
    Jan 25 22:02:16 omv5 kernel: [78356.771959] RBP: ffff94d90b2f1e78 R08: 0000000000000000 R09: 0000000000000000
    Jan 25 22:02:16 omv5 kernel: [78356.772897] R10: ffff94d90b6d50c0 R11: 0000000000000001 R12: ffffffffc0762f80
    Jan 25 22:02:16 omv5 kernel: [78356.773755] R13: ffff94d935214000 R14: 0000000000000000 R15: 00000000000002a9
    Jan 25 22:02:16 omv5 kernel: [78356.774678] FS:  0000000000000000(0000) GS:ffff94d97dc00000(0000) knlGS:0000000000000000
    Jan 25 22:02:16 omv5 kernel: [78356.775346] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Jan 25 22:02:16 omv5 kernel: [78356.775879] CR2: 00007f131a86d000 CR3: 00000000086ae000 CR4: 00000000000006f0
    

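    I also tried installing OMV6, and the same problem occurred. My current guess is that the host is at fault, but I'm not sure whether it's a software problem or a hardware one (memory, disk). I've been at this for several days and read a lot online, and I'm out of ideas. Could anyone please take a look? 🙏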

    #1 · liuxu · 2022-01-26 21:56:16 +08:00
    I took a look. For a kernel bug you should file a report with Debian: https://www.debian.org/Bugs/Reporting

    I looked up bugs similar to yours, and the advice was always to upgrade the kernel to 5.11-5.13 or later. I'd suggest trying an even newer one.
    #2 · jiayouniu (OP) · 2022-01-27 01:06:44 +08:00 via iPhone
    @liuxu I installed OMV6, whose kernel is 5.15, and a similar problem still occurs. I now suspect the memory and plan to run a memory test. If that doesn't help, I'll file a bug.
    #3 · liuxu · 2022-01-27 12:17:45 +08:00
    @jiayouniu

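    From the call stack you posted: kswapd is the kernel's background memory-reclaim thread. When free memory runs low it reclaims pages, for example by writing anonymous pages out to swap and dropping clean page cache.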
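    Here it was running shrink_slab. The slab allocator is the kernel layer that manages small, fixed-size kernel objects: freed objects are kept in per-type caches for reuse instead of being returned right away, and under memory pressure shrink_slab asks each registered "shrinker" to release some of them.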
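To make the shrinker idea concrete, here is a minimal user-space toy model in C. It is only an illustration under assumptions: the names (struct toy_cache, toy_count, toy_scan, toy_shrink_all) are hypothetical and this is not the kernel's struct shrinker API, although it mimics its split into a "count" step and a "scan" step.

```c
#include <stddef.h>

/* Hypothetical toy model of the shrinker idea, NOT the kernel API:
 * each cache can report how many objects it could give back (count)
 * and then release up to a requested number of them (scan). */
struct toy_cache {
    size_t nr_cached;   /* freed objects kept around for reuse */
};

static size_t toy_count(const struct toy_cache *c)
{
    return c->nr_cached;
}

static size_t toy_scan(struct toy_cache *c, size_t nr_to_scan)
{
    size_t freed = nr_to_scan < c->nr_cached ? nr_to_scan : c->nr_cached;
    c->nr_cached -= freed;   /* "release" objects back to the system */
    return freed;
}

/* Rough analogue of shrink_slab(): walk every registered cache and ask it
 * to release objects until `want` objects have been reclaimed. */
static size_t toy_shrink_all(struct toy_cache *caches, size_t n, size_t want)
{
    size_t total = 0;
    for (size_t i = 0; i < n && total < want; i++) {
        if (toy_count(&caches[i]) > 0)
            total += toy_scan(&caches[i], want - total);
    }
    return total;
}
```

In the real trace, the shrinker being called is super_cache_scan (the per-superblock shrinker), which is what leads into prune_icache_sb.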

    Then it called prune_icache_sb. Here "icache" is the inode cache and "sb" the superblock: for one mounted filesystem (superblock), the function evicts unused inodes from the in-memory inode cache to free memory.
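A similarly hypothetical toy model of the pruning step: unused inodes (reference count 0) are unlinked from an LRU list onto a private dispose list and then evicted one by one, which is roughly the shape of prune_icache_sb -> dispose_list -> evict in the trace. The names are made up for illustration; the real code also handles locking, dirty inodes and LRU rotation, all omitted here.

```c
#include <stddef.h>

/* Hypothetical user-space model of inode-cache pruning (not kernel code). */
struct toy_inode {
    int refcount;              /* > 0 means someone still uses it */
    int evicted;               /* set by toy_evict() */
    struct toy_inode *next;    /* singly linked LRU list */
};

static void toy_evict(struct toy_inode *inode)
{
    /* In the kernel this is where the filesystem's evict hook
     * (ext4_evict_inode in the trace) frees per-inode state. */
    inode->evicted = 1;
}

/* Evict up to nr_to_free unused inodes; returns how many were evicted. */
static size_t toy_prune(struct toy_inode **lru, size_t nr_to_free)
{
    struct toy_inode *dispose = NULL;
    struct toy_inode **pp = lru;
    size_t n = 0;

    while (*pp && n < nr_to_free) {
        struct toy_inode *inode = *pp;
        if (inode->refcount == 0) {
            *pp = inode->next;       /* unlink from the LRU */
            inode->next = dispose;   /* push onto the dispose list */
            dispose = inode;
            n++;
        } else {
            pp = &inode->next;       /* still in use: keep it cached */
        }
    }
    while (dispose) {                /* like dispose_list() */
        struct toy_inode *next = dispose->next;
        dispose->next = NULL;
        toy_evict(dispose);
        dispose = next;
    }
    return n;
}
```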

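    While evicting each inode, ext4_evict_inode then reached fsverity_cleanup_inode, which releases the inode's in-memory fs-verity state (inode->i_verity_info):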
    https://elixir.bootlin.com/linux/v5.10.70/source/fs/verity/open.c#L346

    void fsverity_cleanup_inode(struct inode *inode)
    {
        fsverity_free_info(inode->i_verity_info);
        inode->i_verity_info = NULL;
    }
    EXPORT_SYMBOL_GPL(fsverity_cleanup_inode);


    That function in turn calls fsverity_free_info, which is where the fault was raised:
    https://elixir.bootlin.com/linux/v5.10.70/source/fs/verity/open.c#L240

    void fsverity_free_info(struct fsverity_info *vi)
    {
        if (!vi)
            return;
        kfree(vi->tree_params.hashstate);
        kmem_cache_free(fsverity_info_cachep, vi);
    }
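A user-space model of these two functions (hypothetical toy_* names; malloc/free stand in for the kernel allocators) shows a property worth noting: a NULL i_verity_info, which is the normal case for files that don't use fs-verity, is skipped, and cleanup resets the pointer to NULL, so calling it twice is harmless. The dereference, and therefore the fault, can only happen when the pointer is non-NULL but invalid.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for struct fsverity_info / struct inode. */
struct toy_info { void *hashstate; };
struct toy_inode { struct toy_info *i_verity_info; };

static void toy_free_info(struct toy_info *vi)
{
    if (!vi)              /* no fs-verity state: nothing to free */
        return;
    free(vi->hashstate);  /* dereferences vi: would fault if vi is garbage */
    free(vi);
}

static void toy_cleanup_inode(struct toy_inode *inode)
{
    toy_free_info(inode->i_verity_info);
    inode->i_verity_info = NULL;  /* a repeated cleanup becomes a no-op */
}
```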

    At that point the kernel reported: general protection fault, probably for non-canonical address
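The Code: line in the log narrows this down further. Reading the bytes just before the marker, `0f 1f 44 00 00` is a 5-byte NOP, `53` is push rbx and `48 89 fb` is mov rbx, rdi; the marked instruction `<48> 8b 7f 08` decodes to mov rdi, [rdi+8], a load from offset 8 of vi, with RDI = 0x0100000000000000. There is no NULL test before it, which is consistent with GCC having split the function (the .part.3 suffix) and inlined the if (!vi) check into the caller. The sketch below checks that offset 8 matches vi->tree_params.hashstate, assuming the v5.10 layout in which tree_params is the first member of struct fsverity_info and hashstate is the second pointer in struct merkle_tree_params; the *_mirror structs are trimmed, hypothetical copies, not the kernel headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Trimmed user-space mirrors of the v5.10 structs (assumed layout,
 * only the leading members are reproduced). */
struct merkle_tree_params_mirror {
    const void *hash_alg;        /* first pointer: offset 0 */
    const uint8_t *hashstate;    /* second pointer: offset 8 on LP64 */
    unsigned int digest_size;
};

struct fsverity_info_mirror {
    struct merkle_tree_params_mirror tree_params;  /* first member */
};

/* Byte offset that kfree(vi->tree_params.hashstate) has to load from. */
static size_t hashstate_offset(void)
{
    return offsetof(struct fsverity_info_mirror, tree_params) +
           offsetof(struct merkle_tree_params_mirror, hashstate);
}
```

On x86-64 this returns 8, matching the displacement in the faulting instruction.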

    So we can roughly infer that inode->i_verity_info had already been corrupted by then and no longer held a valid memory address.
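The register dump can be checked against the fault message. x86-64 with 4-level paging (the CR4 value 0x6f0 in the log does not have the LA57 bit, bit 12, set) treats an address as canonical only if bits 63..47 all equal bit 47. The sketch below tests that rule and counts set bits; the helper names are hypothetical. Applied to RBX/RDI = 0x0100000000000000 it shows the value is non-canonical and is exactly 1 << 56, i.e. a NULL pointer with one bit set. That pattern is consistent with, but does not prove, a single-bit memory error flipping a NULL i_verity_info, which would fit the OP's suspicion about memory.

```c
#include <stdbool.h>
#include <stdint.h>

/* 48-bit virtual addresses: canonical only if bits 63..47 are all
 * copies of bit 47 (all zeros or all ones). */
static bool is_canonical_48(uint64_t addr)
{
    uint64_t upper = addr >> 47;          /* bits 63..47: 17 bits */
    return upper == 0 || upper == 0x1FFFF;
}

/* Number of set bits, i.e. how many bit flips separate x from NULL (0). */
static int popcount64(uint64_t x)
{
    int n = 0;
    while (x) {
        x &= x - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}
```

For comparison, the kernel pointer in RAX (0xffff94d90b2f2038) and the user-space address in CR2 (0x00007f131a86d000) both pass the check.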

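    My guess is that on this SMP (multi-core) machine the cleanup did not lock this data structure, so another core had already cleaned it up and set the pointer to NULL, and CPU 0 then tried to free the invalid address 0x100000000000000. (Note that on x86-64 NULL is 0; 0x100000000000000 is not NULL but a non-canonical address, which is what the fault message refers to.)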

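    All of the above is just my amateur speculation; what actually helps depends on your testing. Once you've solved it, please @ me so I can see what the problem was and how you fixed it.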
    #4 · liuxu · 2022-01-27 12:19:36 +08:00
    @liuxu Correction: the last two function calls don't write anything back to disk; they only free the in-memory inode cache after writing is done.