Hi!
I had a home lab with 2 pools: one RAIDZ pool for some work-related backups, and one pool with a single cheap SMR drive for personal backups. I also had some Kubernetes pods and virtual machines running, with their virtual disks on the SMR drive. The system is installed on a separate SSD.
Recently I added a new pool of mirrored SSDs and decided to move the personal backups, Kubernetes, and the VMs there. I did it with replication tasks - cloned the backup datasets and the VMs' zvols to the new pool. Everything seemingly worked fine.
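For context, as far as I understand, a local replication task boils down to something like the following (the pool and dataset names here are made-up placeholders, not my real ones, and the script only prints the commands instead of running them):

```shell
#!/bin/sh
# Sketch of what a local replication task roughly does (as I understand it).
# "smr-pool" and "ssd-pool" are placeholder names, not my actual pools.
SRC="smr-pool/backups"
DST="ssd-pool/backups"
SNAP="migrate"

# Printed rather than executed, since this is only an illustration:
echo "zfs snapshot -r ${SRC}@${SNAP}"
echo "zfs send -R ${SRC}@${SNAP} | zfs recv -u ${DST}"
```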
But after I rebooted (I wanted to be sure everything starts up fine), mounting the pools took an awfully long time (maybe hours - after waiting a few hours I just went to sleep). The VM didn't start automatically, Kubernetes didn't run, and the CPU, disk, etc. charts in Reporting and on the Dashboard are missing.
Not sure where to look; I didn't even find any useful logs. The only strange error message, in dmesg, is:
Code:
[Mon Jan 15 22:24:34 2024] INFO: task middlewared (wo:1748 blocked for more than 120 seconds.
[Mon Jan 15 22:24:34 2024] Tainted: P OE 6.1.55-production+truenas #2
[Mon Jan 15 22:24:34 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Mon Jan 15 22:24:34 2024] task:middlewared (wo state:D stack:0 pid:1748 ppid:1681 flags:0x00000002
[Mon Jan 15 22:24:34 2024] Call Trace:
[Mon Jan 15 22:24:34 2024]  <TASK>
[Mon Jan 15 22:24:34 2024]  __schedule+0x2ed/0x860
[Mon Jan 15 22:24:34 2024]  schedule+0x5a/0xb0
[Mon Jan 15 22:24:34 2024]  schedule_preempt_disabled+0x14/0x30
[Mon Jan 15 22:24:34 2024]  __mutex_lock.constprop.0+0x3b4/0x700
[Mon Jan 15 22:24:34 2024]  spa_open_common+0x65/0x440 [zfs]
[Mon Jan 15 22:24:34 2024]  spa_get_stats+0x4a/0x210 [zfs]
[Mon Jan 15 22:24:34 2024]  ? spl_kmem_alloc_impl+0x87/0xd0 [spl]
[Mon Jan 15 22:24:34 2024]  zfs_ioc_pool_stats+0x3c/0x90 [zfs]
[Mon Jan 15 22:24:34 2024]  zfsdev_ioctl_common+0x67f/0x770 [zfs]
[Mon Jan 15 22:24:34 2024]  ? __kmalloc_node+0xbf/0x150
[Mon Jan 15 22:24:34 2024]  zfsdev_ioctl+0x4f/0xd0 [zfs]
[Mon Jan 15 22:24:34 2024]  __x64_sys_ioctl+0x90/0xd0
[Mon Jan 15 22:24:34 2024]  do_syscall_64+0x5b/0xc0
[Mon Jan 15 22:24:34 2024]  ? handle_mm_fault+0xdb/0x2d0
[Mon Jan 15 22:24:34 2024]  ? preempt_count_add+0x47/0xa0
[Mon Jan 15 22:24:34 2024]  ? up_read+0x37/0x70
[Mon Jan 15 22:24:34 2024]  ? do_user_addr_fault+0x1bb/0x570
[Mon Jan 15 22:24:34 2024]  ? exc_page_fault+0x70/0x170
[Mon Jan 15 22:24:34 2024]  entry_SYSCALL_64_after_hwframe+0x64/0xce
[Mon Jan 15 22:24:34 2024] RIP: 0033:0x7f1d03928afb
[Mon Jan 15 22:24:34 2024] RSP: 002b:00007ffe61da17f0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[Mon Jan 15 22:24:34 2024] RAX: ffffffffffffffda RBX: 0000000002ee3d10 RCX: 00007f1d03928afb
[Mon Jan 15 22:24:34 2024] RDX: 00007ffe61da1870 RSI: 0000000000005a05 RDI: 0000000000000019
[Mon Jan 15 22:24:34 2024] RBP: 00007ffe61da4e60 R08: 0000000000000007 R09: 0000000000000013
[Mon Jan 15 22:24:34 2024] R10: 000000000150b010 R11: 0000000000000246 R12: 00007ffe61da1870
[Mon Jan 15 22:24:34 2024] R13: 0000000002ee3d10 R14: 000000000433da20 R15: 00007ffe61da4e74
[Mon Jan 15 22:24:34 2024]  </TASK>
[Mon Jan 15 22:24:34 2024] INFO: task middlewared (wo:1824 blocked for more than 121 seconds.
[Mon Jan 15 22:24:34 2024] Tainted: P OE 6.1.55-production+truenas #2
[Mon Jan 15 22:24:34 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
I rebooted multiple times and this persists. After sshd comes online during the boot sequence, I can see that the zpools are not ready and stay unmounted for a long time. After they are finally mounted, I start the VM manually and it works fine. Still no Kubernetes and no charts, though.
I can live with it, of course - the charts are not that important and reboots are rare - but it troubles me that a seemingly trivial operation like adding a new pool and replicating datasets to it broke something.
Where can I look for problems? smartctl and HDSentinel show that the disks are good. Maybe some specific Kubernetes logs?
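In case it helps, this is the checklist of logs and state I was planning to go through next (the log paths are what I'd expect on a default TrueNAS SCALE install - they may differ on other versions, so treat them as guesses):

```shell
#!/bin/sh
# Checklist of things to inspect after the slow pool import.
# Log paths below are assumed from a default TrueNAS SCALE install.
LOGS="/var/log/middlewared.log /var/log/k3s_daemon.log"
for log in $LOGS; do
    # Tail each log only if it actually exists on this system
    [ -f "$log" ] && tail -n 50 "$log"
done

# Pool health and import state, only where the zfs tools are present:
if command -v zpool >/dev/null 2>&1; then
    zpool status -v
fi

# Built-in Kubernetes state, only if k3s is installed and running:
if command -v k3s >/dev/null 2>&1; then
    k3s kubectl get pods -A
fi
```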