Devices (disks) removing themselves

GrimmReaperNL

Explorer
Joined
Jan 24, 2022
Messages
58
Has anyone had drives removing themselves? This morning I woke up to the following email:
ZFS has detected that a device was removed.

impact: Fault tolerance of the pool may be compromised.
eid: 111
class: statechange
state: REMOVED
host: TrueNAS
time: 2023-12-03 09:09:26+0100
vpath: /dev/disk/by-partuuid/56f3655d-2ed5-11ed-ab7a-3cecef8c44fa
vphys: id1,enc@n3061686369656d30/type@0/slot@5/elmdesc@Slot_04/p2
vguid: 0x8E0647F40F65C32B
pool: TrueNAS (0xC949AB0B68335164)
This drive was having no issues when I went to bed. I have @joeschmuck's SMART script running daily.

This isn't the first time this has happened; it has occurred over multiple versions of TrueNAS (core 11, scale 11, scale 12), but I never really found the cause.
It's happened to seemingly random disks, attached either to the motherboard or to the HBA.

Can I just unplug and replug it while the system is on?
 

RetroG

Dabbler
Joined
Dec 2, 2023
Messages
16
You are on SCALE, which is Linux based.

I'd say a first step to figuring this out would be looking at the dmesg log, which is as easy as firing up an SSH shell (or using the root console), typing dmesg -H, and scrolling down (by typing F) to about the time one of these removals happened. Is there anything interesting there?
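For example, something along these lines (adjust the filter to your own dates and devices):

# human-readable, paged kernel log; page forward with F or the space bar
dmesg -H
# or pull out just the SATA / I/O error lines, with wall-clock timestamps
dmesg -T | grep -iE 'ata[0-9]|i/o error'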
 
Last edited by a moderator.

GrimmReaperNL

Explorer
Joined
Jan 24, 2022
Messages
58
You are on SCALE, which is Linux based.

I'd say a first step to figuring this out would be looking at the dmesg log, which is as easy as firing up an SSH shell (or using the root console), typing dmesh -H, and scrolling down (by typing F) to about the time one of these removals happened. Is there anything interesting there?
"zsh: command not found: dmesh"

I've since replugged the disk, offlined and onlined it, and it has resilvered. Not sure when it will decide a disk has been removed again.
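Roughly, from the shell, those steps amount to something like this (using the pool name and partuuid from the alert above):

# take the member offline before reseating it
zpool offline TrueNAS /dev/disk/by-partuuid/56f3655d-2ed5-11ed-ab7a-3cecef8c44fa
# bring it back after replugging and let it resilver
zpool online TrueNAS /dev/disk/by-partuuid/56f3655d-2ed5-11ed-ab7a-3cecef8c44fa
# watch the resilver progress
zpool status -v TrueNAS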
 

RetroG

Dabbler
Joined
Dec 2, 2023
Messages
16
I apologize; there was no way to edit my post that I could see.

The correct command (for SCALE) is:
dmesg -H

Given that you've seen this issue on both CORE and SCALE, I suspect it might be hardware related, i.e. disks/controllers/cables/etc.
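One quick check on the affected disk is its UDMA CRC error counter; that one usually climbs when a cable or backplane contact is bad rather than the drive itself. Something like this (assuming the disk comes back as /dev/sdl):

# a rising CRC count points at cabling/connector problems rather than the drive
smartctl -a /dev/sdl | grep -i udma_crc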
 

GrimmReaperNL

Explorer
Joined
Jan 24, 2022
Messages
58
I apologize; there was no way to edit my post that I could see.

The correct command (for SCALE) is:
dmesg -H

Given that you've seen this issue on both CORE and SCALE, I suspect it might be hardware related, i.e. disks/controllers/cables/etc.
Just under your message there should be an 'Edit' button.

I was able to find this:
[Dec 3 09:09] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[ +0.000199] ata5.00: irq_stat 0x40000001
[ +0.000175] ata5.00: failed command: FLUSH CACHE EXT
[ +0.000167] ata5.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 13 res 53/04:00:b8:51:cf/00:00:fe:05:00/a0 Emask 0x1 (device error)
[ +0.000374] ata5.00: status: { DRDY SENSE ERR }
[ +0.000174] ata5.00: error: { ABRT }
[ +0.006599] ata5.00: n_sectors mismatch 27344764928 != 0
[ +0.000278] ata5.00: revalidation failed (errno=-19)
[ +0.000252] ata5: limiting SATA link speed to 3.0 Gbps
[ +0.000216] ata5.00: limiting speed to UDMA/100:PIO3
[ +0.000201] ata5: hard resetting link
[ +0.314326] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[ +0.000454] ata5.00: failed to IDENTIFY (INIT_DEV_PARAMS failed, err_mask=0x80)
[ +0.000379] ata5.00: revalidation failed (errno=-5)
[ +0.000252] ata5.00: disable device
[ +5.247906] ata5: hard resetting link
[ +0.314563] ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ +0.000551] ata5.00: failed to IDENTIFY (INIT_DEV_PARAMS failed, err_mask=0x80)
[ +5.060879] ata5: hard resetting link
[ +0.315026] ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ +0.001494] ata5.00: failed to IDENTIFY (INIT_DEV_PARAMS failed, err_mask=0x80)
[ +0.000914] ata5: limiting SATA link speed to 3.0 Gbps
[ +5.058598] ata5: hard resetting link
[ +0.314722] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[ +0.001427] ata5.00: failed to IDENTIFY (INIT_DEV_PARAMS failed, err_mask=0x80)
[ +5.063938] ata5: hard resetting link
[ +0.319179] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[ +0.000671] sd 5:0:0:0: [sdl] tag#13 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=22s
[ +0.000108] sd 5:0:0:0: rejecting I/O to offline device
[ +0.000538] sd 5:0:0:0: [sdl] tag#13 Sense Key : Illegal Request [current]
[ +0.000596] I/O error, dev sdl, sector 4412916832 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 2
[ +0.000612] sd 5:0:0:0: [sdl] tag#13 Add. Sense: Unaligned write command
[ +0.000650] zio pool=TrueNAS vdev=/dev/disk/by-partuuid/56f3655d-2ed5-11ed-ab7a-3cecef8c44fa error=5 type=2 offset=2257265868800 siz>
[ +0.000654] sd 5:0:0:0: [sdl] tag#13 CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00
[ +0.001435] I/O error, dev sdl, sector 3710137552 op 0x0:(READ) flags 0x0 phys_seg 3 prio class 2
[ +0.000038] I/O error, dev sdl, sector 3188158016 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[ +0.000014] zio pool=TrueNAS vdev=/dev/disk/by-partuuid/56f3655d-2ed5-11ed-ab7a-3cecef8c44fa error=5 type=1 offset=1630189355008 siz>
[ +0.000021] I/O error, dev sdl, sector 4301974816 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 2
[ +0.000009] zio pool=TrueNAS vdev=/dev/disk/by-partuuid/56f3655d-2ed5-11ed-ab7a-3cecef8c44fa error=5 type=2 offset=2200463556608 siz>
[ +0.000016] I/O error, dev sdl, sector 25749836232 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 2
[ +0.000008] zio pool=TrueNAS vdev=/dev/disk/by-partuuid/56f3655d-2ed5-11ed-ab7a-3cecef8c44fa error=5 type=2 offset=13181768601600 si>
[ +0.000014] I/O error, dev sdl, sector 4298143560 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 2
[ +0.000007] zio pool=TrueNAS vdev=/dev/disk/by-partuuid/56f3655d-2ed5-11ed-ab7a-3cecef8c44fa error=5 type=2 offset=2198501953536 siz>
[ +0.000056] I/O error, dev sdl, sector 4301974824 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 2
[ +0.000002] I/O error, dev sdl, sector 4194960 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[ +0.000009] zio pool=TrueNAS vdev=/dev/disk/by-partuuid/56f3655d-2ed5-11ed-ab7a-3cecef8c44fa error=5 type=1 offset=270336 size=8192 >
[ +0.000005] zio pool=TrueNAS vdev=/dev/disk/by-partuuid/56f3655d-2ed5-11ed-ab7a-3cecef8c44fa error=5 type=2 offset=2200463560704 siz>
[ +0.000024] I/O error, dev sdl, sector 27344763536 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[ +0.000008] zio pool=TrueNAS vdev=/dev/disk/by-partuuid/56f3655d-2ed5-11ed-ab7a-3cecef8c44fa error=5 type=1 offset=13998371381248 si>
[ +0.000011] I/O error, dev sdl, sector 4359288736 op 0x1:(WRITE) flags 0x0 phys_seg 2 prio class 2
[ +0.000010] zio pool=TrueNAS vdev=/dev/disk/by-partuuid/56f3655d-2ed5-11ed-ab7a-3cecef8c44fa error=5 type=2 offset=2229808283648 siz>
[ +0.000001] zio pool=TrueNAS vdev=/dev/disk/by-partuuid/56f3655d-2ed5-11ed-ab7a-3cecef8c44fa error=5 type=1 offset=13998371643392 si>
[ +0.000028] zio pool=TrueNAS vdev=/dev/disk/by-partuuid/56f3655d-2ed5-11ed-ab7a-3cecef8c44fa error=5 type=2 offset=2256677965824 siz>
[ +0.000017] zio pool=TrueNAS vdev=/dev/disk/by-partuuid/56f3655d-2ed5-11ed-ab7a-3cecef8c44fa error=5 type=2 offset=2262439178240 siz>
[ +0.000019] zio pool=TrueNAS vdev=/dev/disk/by-partuuid/56f3655d-2ed5-11ed-ab7a-3cecef8c44fa error=5 type=2 offset=2264731918336 siz>
[ +0.000394] zio pool=TrueNAS vdev=/dev/disk/by-partuuid/56f3655d-2ed5-11ed-ab7a-3cecef8c44fa error=5 type=5 offset=0 size=0 flags=10>
[ +0.000422] zio pool=TrueNAS vdev=/dev/disk/by-partuuid/56f3655d-2ed5-11ed-ab7a-3cecef8c44fa error=5 type=1 offset=1897442877440 siz>
[ +0.000178] ata5: EH complete
[ +0.000403] zio pool=TrueNAS vdev=/dev/disk/by-partuuid/56f3655d-2ed5-11ed-ab7a-3cecef8c44fa error=5 type=1 offset=1897442893824 siz>
[ +0.010230] ata5.00: detaching (SCSI 5:0:0:0)
[ +0.004242] zio pool=TrueNAS vdev=/dev/disk/by-partuuid/56f3655d-2ed5-11ed-ab7a-3cecef8c44fa error=5 type=1 offset=1897442910208 siz>
[ +0.019176] md/raid1:md124: Disk failure on sdl1, disabling device.
md/raid1:md124: Operation continuing on 1 devices.
[ +0.036096] sd 5:0:0:0: [sdl] Synchronizing SCSI cache
[ +0.000633] sd 5:0:0:0: [sdl] Synchronize Cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[ +0.000474] sd 5:0:0:0: [sdl] Stopping disk
[ +0.000499] sd 5:0:0:0: [sdl] Start/Stop Unit failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[ +0.510785] zio pool=TrueNAS vdev=/dev/disk/by-partuuid/56f3655d-2ed5-11ed-ab7a-3cecef8c44fa error=5 type=1 offset=270336 size=8192 >
[ +4.170870] md124: detected capacity change from 4188160 to 0
[ +0.000014] md: md124 stopped.
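For reference, the sdl in those messages maps back to the partuuid that the zio lines report; something like this shows the match:

# match the kernel device name to the partuuid ZFS logs
lsblk -o NAME,PARTUUID,SERIAL /dev/sdl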

I'll be able to check the next time something like this happens. Thanks.
 

RetroG

Dabbler
Joined
Dec 2, 2023
Messages
16
Given that the link keeps dropping and the speed keeps being stepped down, try changing the cable and, if possible, which port it's plugged into.

Not all systems support hot-swap, especially integrated SATA controllers on OEM systems (where, like virtualization, it's often disabled for no real reason). But given that your dmesg log shows the link re-establishing, you probably can hot-swap it; offline it via the GUI before you do to minimize issues.
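If you do reseat it with the system running, it can help to keep a follow-mode dmesg open in another shell so you can watch the link drop and come back, e.g.:

# stream new kernel messages (with readable timestamps) while reseating the cable/port
dmesg -wT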
 

GrimmReaperNL

Explorer
Joined
Jan 24, 2022
Messages
58
Given that the link keeps dropping and the speed keeps being stepped down, try changing the cable and, if possible, which port it's plugged into.

Not all systems support hot-swap, especially integrated SATA controllers on OEM systems (where, like virtualization, it's often disabled for no real reason). But given that your dmesg log shows the link re-establishing, you probably can hot-swap it; offline it via the GUI before you do to minimize issues.
I'll keep that in mind. Thanks.
 