VDEV Slow - Stalls boot

avguy1989

Cadet
Joined
Oct 4, 2021
Messages
3
So I have a 9-disk array of 10 TB drives that has been running great for 2+ years.

Suddenly it will not boot: it shows errors about a SLOW VDEV plus vdev_deadman notifications and never finishes booting.
If I disconnect the drives, I can boot the OS, but as soon as I try to mount the array, it locks up.

I have reinstalled the OS, swapped boot drives, etc., but every time it errors out and does not boot.

Please HELP!
If specific details are needed please let me know, I just don't know how to retrieve configuration files from any of the VDEVs/Arrays without being able to boot the server.

error image attached
freerrror.PNG


It will just sit here and repeat this over and over.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,700
So the first stop will be the SMART data for your pool disks.

smartctl -a /dev/daX (replace daX with da1 or ada1, etc. as is the case for your disks)

The likely culprit is one or more of the disks, or the disk controller/cabling.
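
To run that check across every disk in one go, something like the following could work. It is only a sketch: it assumes FreeBSD's kern.disks sysctl for the device list, smart_key_attrs is a made-up helper name, and the attribute names are the ones smartctl prints for ATA/SATA drives (SAS drives report differently).

```shell
#!/bin/sh
# Filter `smartctl -A` output (read from stdin) down to the attributes that
# most often point at failing media or a bad link. ATA/SATA names only;
# this list is an assumption, not an exhaustive one.
smart_key_attrs() {
    grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable|UDMA_CRC_Error_Count'
}

# Walk every disk the kernel knows about. Skipped when smartctl is absent.
if command -v smartctl >/dev/null 2>&1; then
    for disk in $(sysctl -n kern.disks 2>/dev/null); do
        echo "== $disk =="
        smartctl -A "/dev/$disk" | smart_key_attrs
    done
fi
```

A UDMA_CRC_Error_Count that keeps climbing usually points at the cable or connector rather than the disk itself.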
 

avguy1989

I was able to get past this portion.
Out of desperation I pressed CTRL-C and it booted! I was able to run a scrub and check SMART on all drives. ZERO issues reported.

Any way to help identify a cabling/HBA issue remotely? The server is 6 hours away.

Thanks!
 

sretalla

avguy1989 said: "Any way to help identify a cabling/HBA issue remotely? The server is 6 hours away."
It's not ideal, but you could run read tests without risk by using dd:

dd if=/dev/da1 of=/dev/null bs=1024k count=10000 (that reads 10,000 blocks of 1 MiB, roughly 10 GB, off da1 and discards it; if there were cable problems it would go very slowly and produce many errors in dmesg... repeat for the other disks)

Unlike a SMART test, which happens inside the drive, that would at least involve the cables.
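
To repeat that read test on every disk without typing each name, a loop along these lines might do. Treat it as a sketch: it assumes FreeBSD's kern.disks sysctl for the device list, and read_test is a made-up helper name. It only reads from the disks and never writes to them.

```shell
#!/bin/sh
# Read COUNT blocks of 1 MiB from a device and discard them.
# dd writes its summary (records, bytes, elapsed time) to stderr, so it is
# redirected to stdout here to keep it next to the disk name.
read_test() {
    dev=$1
    count=${2:-10000}    # default matches the command above
    dd if="$dev" of=/dev/null bs=1024k count="$count" 2>&1
}

# Loop over the disks the kernel reports; skip anything that is not a device node.
for disk in $(sysctl -n kern.disks 2>/dev/null); do
    [ -c "/dev/$disk" ] || continue
    echo "== $disk =="
    read_test "/dev/$disk"
done
```

Afterwards, look at the end of dmesg for new errors and compare the elapsed times between disks on the same controller.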

Are you really sure you are looking at the SMART test results (from recent long tests) and not just seeing the word Pass and stopping? A disk can show as passing a SMART test but still be dying.
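
A long test can be started with smartctl -t long /dev/daX and takes many hours on drives this size. Once it has finished, a filter like the one below could help pick out self-test log entries that did not finish cleanly. This is a sketch: selftest_not_clean is a made-up helper name, the device list again assumes FreeBSD's kern.disks sysctl, and the matching assumes the log layout smartctl prints for ATA/SATA drives.

```shell
#!/bin/sh
# Filter `smartctl -l selftest` output (read from stdin) down to log entries
# that did NOT complete without error. Entry lines start with "# <number>".
selftest_not_clean() {
    grep -E '^# *[0-9]+' | grep -v 'Completed without error'
}

# Show the non-clean entries for every disk. Skipped when smartctl is absent.
if command -v smartctl >/dev/null 2>&1; then
    for disk in $(sysctl -n kern.disks 2>/dev/null); do
        echo "== $disk =="
        smartctl -l selftest "/dev/$disk" | selftest_not_clean
    done
fi
```

Empty output for a disk can also mean no self-test was ever logged, so check the full log at least once.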
 

avguy1989

Yeah... there is something going on with the backplane, HBA, or cabling.

I ran that test on my spinning disks and it came back at around 45 seconds per drive!
Then I ran it on da0 and da1 (mirrored boot pool) and it came back in 1.2 seconds.

I'm going to have to take a drive to get eyes on it and bring a new HBA and cabling with me. Hopefully it is not the backplane, although I do have a spare 12-bay I could use temporarily.
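
For anyone comparing their own numbers, the timings above can be turned into throughput. This is a rough sketch that assumes the dd command was run exactly as posted (10000 blocks of 1 MiB); mib_per_sec is a made-up helper name.

```shell
#!/bin/sh
# Convert "MIB mebibytes read in SECS seconds" into whole MiB/s.
mib_per_sec() {
    awk -v mib="$1" -v secs="$2" 'BEGIN { printf "%.0f\n", mib / secs }'
}

mib_per_sec 10000 45     # pool disks: prints 222
mib_per_sec 10000 1.2    # boot mirror: prints 8333
```

These figures only mean something next to each drive's rated sustained transfer rate and the link speed, so it is worth checking those before drawing conclusions. dd also prints its own bytes/sec figure when it finishes.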
 