Hi everyone,
I'm running TrueNAS-12.0-U5.1 on a Sun ZFS Storage 7320 with a 24-disk shelf installed.
The appliance is powered by an Intel Xeon E5620 (2.40GHz) and has 24GB of RAM.
I'm running a raidz3 pool that shows no issues whatsoever:
Code:
# zpool status
  pool: DES-Z
 state: ONLINE
config:

	NAME                                            STATE     READ WRITE CKSUM
	DES-Z                                           ONLINE       0     0     0
	  raidz3-0                                      ONLINE       0     0     0
	    gptid/f1221ad0-1a32-11ec-b218-002128f03bd4  ONLINE       0     0     0
	    gptid/f10a2d68-1a32-11ec-b218-002128f03bd4  ONLINE       0     0     0
	    gptid/f176d2fb-1a32-11ec-b218-002128f03bd4  ONLINE       0     0     0
	    gptid/f2078faa-1a32-11ec-b218-002128f03bd4  ONLINE       0     0     0
	    gptid/f275d32e-1a32-11ec-b218-002128f03bd4  ONLINE       0     0     0
	    gptid/f2b469bf-1a32-11ec-b218-002128f03bd4  ONLINE       0     0     0
	    gptid/f253433c-1a32-11ec-b218-002128f03bd4  ONLINE       0     0     0
	    gptid/f2ee3dd0-1a32-11ec-b218-002128f03bd4  ONLINE       0     0     0
	    gptid/f2edeaab-1a32-11ec-b218-002128f03bd4  ONLINE       0     0     0
	    gptid/f2cdd4c8-1a32-11ec-b218-002128f03bd4  ONLINE       0     0     0
	    gptid/f33c6536-1a32-11ec-b218-002128f03bd4  ONLINE       0     0     0
	    gptid/f3140207-1a32-11ec-b218-002128f03bd4  ONLINE       0     0     0
	    gptid/f32a2a8a-1a32-11ec-b218-002128f03bd4  ONLINE       0     0     0
	    gptid/f3d74425-1a32-11ec-b218-002128f03bd4  ONLINE       0     0     0
	    gptid/f36fe747-1a32-11ec-b218-002128f03bd4  ONLINE       0     0     0
	    gptid/f3cd4913-1a32-11ec-b218-002128f03bd4  ONLINE       0     0     0
	    gptid/f5323603-1a32-11ec-b218-002128f03bd4  ONLINE       0     0     0
	    gptid/f629098d-1a32-11ec-b218-002128f03bd4  ONLINE       0     0     0
	    gptid/f5e962f8-1a32-11ec-b218-002128f03bd4  ONLINE       0     0     0
	    gptid/f655d119-1a32-11ec-b218-002128f03bd4  ONLINE       0     0     0
	    gptid/f6556e89-1a32-11ec-b218-002128f03bd4  ONLINE       0     0     0
	    gptid/f66742d9-1a32-11ec-b218-002128f03bd4  ONLINE       0     0     0
	    gptid/f6b9c9c1-1a32-11ec-b218-002128f03bd4  ONLINE       0     0     0
	    gptid/f68c6b05-1a32-11ec-b218-002128f03bd4  ONLINE       0     0     0

errors: No known data errors

  pool: boot-pool
 state: ONLINE
config:

	NAME        STATE     READ WRITE CKSUM
	boot-pool   ONLINE       0     0     0
	  da49p2    ONLINE       0     0     0

errors: No known data errors
Unfortunately, after some use the whole system becomes unresponsive. Commands run in the shell (even against the zpool itself) hang indefinitely without producing any output. I've seen the following in the logs, and I'm not sure what the issue could be, considering that the disks are OK. I have two identical appliances and have tried both with the same result, and I've already tried disabling S.M.A.R.T. on every disk:
Code:
(0:4:0/1): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
(0:4:0/1): Tag: 0x1000001a, type 1
(0:4:0/1): ctl_process_done: 123 seconds
(0:4:0/1): MODE SENSE(6). CDB: 1a 00 08 00 04 00
(0:4:0/1): Tag: 0x10000052, type 1
(0:4:0/1): ctl_process_done: 92 seconds
ctl_datamove: tag 0x1000003c on (0:4:0) aborted
ctl_datamove: tag 0x10000038 on (0:4:0) aborted
ctl_datamove: tag 0x1000003e on (0:4:0) aborted
ctl_datamove: tag 0x1000003f on (0:4:0) aborted
ctl_datamove: tag 0x10000011 on (0:4:0) aborted
ctl_datamove: tag 0x10000014 on (0:4:0) aborted
ctl_datamove: tag 0x10000013 on (0:4:0) aborted
ctl_datamove: tag 0x10000015 on (0:4:0) aborted
ctl_datamove: tag 0x10000016 on (0:4:0) aborted
ctl_datamove: tag 0x10000017 on (0:4:0) aborted
ctl_datamove: tag 0x10000018 on (0:4:0) aborted
ctl_datamove: tag 0x10000019 on (0:4:0) aborted
ctl_datamove: tag 0x1000006e on (0:4:0) aborted
ctl_datamove: tag 0x1000000c on (0:4:0) aborted
ctl_datamove: tag 0x1000001e on (0:4:0) aborted
ctl_datamove: tag 0x1000000b on (0:4:0) aborted
(0:4:0/1): WRITE(10). CDB: 2a 00 01 38 3c e0 00 00 98 00
ctl_datamove: tag 0x1000001f on (0:4:0) aborted
(0:4:0/1): Tag: 0x1000003c, type 1
(0:4:0/1): ctl_process_done: 215 seconds
(0:4:0/1): WRITE(10). CDB: 2a 00 02 e7 f8 00 00 00 28 00
(0:4:0/1): Tag: 0x10000038, type 1
(0:4:0/1): ctl_process_done: 215 seconds
(0:4:0/1): WRITE(10). CDB: 2a 00 02 12 23 d8 00 00 08 00
ctl_datamove: tag 0x10000042 on (2:4:0) aborted
(0:4:0/1): Tag: 0x1000003e, type 1
(0:4:0/1): ctl_process_done: 215 seconds
(0:4:0/1): WRITE(10). CDB: 2a 00 02 12 23 d0 00 00 08 00
ctl_datamove: tag 0x1000001b on (1:4:0) aborted
(0:4:0/1): Tag: 0x1000003f, type 1
(0:4:0/1): ctl_process_done: 215 seconds
(0:4:0/1): WRITE(10). CDB: 2a 00 02 12 23 c8 00 00 08 00
(0:4:0/1): Tag: 0x10000011, type 1
(0:4:0/1): ctl_process_done: 215 seconds
(0:4:0/1): WRITE(10). CDB: 2a 00 02 12 23 c0 00 00 08 00
(0:4:0/1): Tag: 0x10000014, type 1
(0:4:0/1): ctl_process_done: 215 seconds
(0:4:0/1): WRITE(10). CDB: 2a 00 01 38 3d a8 00 00 08 00
(0:4:0/1): Tag: 0x10000013, type 1
(0:4:0/1): ctl_process_done: 215 seconds
(0:4:0/1): WRITE(10). CDB: 2a 00 01 38 3d a0 00 00 08 00
(0:4:0/1): Tag: 0x10000015, type 1
(0:4:0/1): ctl_process_done: 215 seconds
(0:4:0/1): WRITE(10). CDB: 2a 00 01 38 3d 98 00 00 08 00
(0:4:0/1): Tag: 0x10000016, type 1
(0:4:0/1): ctl_process_done: 215 seconds
(0:4:0/1): WRITE(10). CDB: 2a 00 01 38 3d 90 00 00 08 00
(0:4:0/1): Tag: 0x10000017, type 1
(0:4:0/1): ctl_process_done: 215 seconds
(0:4:0/1): WRITE(10). CDB: 2a 00 01 38 3d 88 00 00 08 00
(0:4:0/1): Tag: 0x10000018, type 1
(0:4:0/1): ctl_process_done: 215 seconds
(0:4:0/1): WRITE(10). CDB: 2a 00 01 38 3d 80 00 00 08 00
(0:4:0/1): Tag: 0x10000019, type 1
(0:4:0/1): ctl_process_done: 215 seconds
(0:4:0/1): WRITE(10). CDB: 2a 00 01 38 3d 78 00 00 08 00
(0:4:0/1): Tag: 0x1000006e, type 1
(0:4:0/1): ctl_process_done: 215 seconds
(0:4:0/1): WRITE(10). CDB: 2a 00 00 60 a4 80 00 01 88 00
(0:4:0/1): Tag: 0x1000000c, type 1
(0:4:0/1): ctl_process_done: 215 seconds
(0:4:0/1): WRITE(10). CDB: 2a 00 01 38 3d b0 00 00 70 00
...
Any hint would be greatly appreciated, considering that I've also tested TrueNAS on a Sun ZFS Storage 7120 and hit the same issue. I thought it was the controller, so I moved to the 7320(s), but apparently that's not it. I'm exporting datasets via NFS and a few zvols via iSCSI, and everything works ... for a while.
Thanks.
P.S. The fastest way to reproduce the issue is launching a scrub.
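In case it helps anyone reproduce or narrow this down, here is roughly how I trigger it and what I'd look at next. The scrub command matches my pool; the gstat/procstat steps are generic FreeBSD diagnostics I'm suggesting, not something already shown in the logs above, and `<pid>` is a placeholder:

```shell
# Start a scrub on the data pool -- in my experience this is the
# fastest way to trigger the hang.
zpool scrub DES-Z

# In a second shell, watch per-disk I/O (physical providers only);
# when the hang begins, note whether the disks go idle or stay busy.
gstat -p

# Once a zpool command is stuck, dump its kernel stack to see where
# it is blocked (replace <pid> with the PID of the hung process):
procstat -kk <pid>
```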