Advice on a degraded pool that is now online - suspect SATA cable lightly plugged

el_reddaio

Cadet
Joined
Dec 27, 2020
Messages
4
Hi there,

Summary:
I had an issue with my pool being degraded after moving components to a new case & motherboard.
However, now the pool is back online and TrueNAS does not give me errors anymore during boot.
I'm running a long SMART test on the affected drive, but I suspect the issue was the SATA cable not being fully plugged in.

Extended version:
I am running TrueNAS on an old Core 2 Quad, with six WD Red drives in a RAID-Z2 pool and two USB drives for boot.
It's important to state that all the drives are WDC WD40EFRX-68N32N0, but while 5 of them are only two years old, one of them is 5 years old.

Last week I proceeded to move the drives to a newer Intel i7 6700K, but after encountering a few problems with network and possibly CPU, I decided to move everything back to the old Core 2 Quad system.

As soon as I booted, TrueNAS started throwing several read errors while mounting the drives.
I got a notification email telling me which drive it was and, surprisingly, it was one of the newer ones, not the older one.
I rebooted another couple of times, same problem, so I turned off the computer and started procuring a replacement drive.

After a few days with the NAS off, I finally have the replacement drive. Out of curiosity I checked the SATA cable on the existing drive and noticed that it wasn't fully plugged in; I would say about 75% in.

I pushed the cable in fully and did a first boot: no more console errors.
The pool still showed up as degraded with unrecoverable errors and a red tick.
I rebooted again: now it shows up as online with the green tick.
So now I'm running a long SMART test on the affected drive, just to see if it is really faulty.

Now, my questions:
  1. Could a loosely attached SATA cable cause these kinds of errors?
    1. Sorry for not posting the specific errors I received, I didn't take a screenshot when it happened.
    2. Curiosity: is there a way to find the error log so that I can pull up these errors?
  2. If the SMART test doesn't return errors, I guess I should run a scrub on the pool?
  3. If the drive comes back completely clean, should I replace it anyway?
  4. I cannot find any option to replace the disk. I followed this guide and these menus are just not there... I'm not going to hot swap the disk, I will simply turn off the NAS and swap the drive. Will that make the option appear?
Please advise :)

Kind regards,
Stefano
 
Joined
Jan 27, 2020
Messages
577
Could a loosely attached SATA cable cause these kinds of errors?
Yes.
If the SMART test doesn't return errors, I guess I should run a scrub on the pool?
Yes. Scrubs should be done periodically anyway, at least once a month (maybe more often, depending on your R/W on the pool). It doesn't hurt to scrub the pool after ZFS has uncovered errors.
If the drive comes back completely clean, should I replace it anyway?
Why? If SMART and the scrub do not uncover any issues, the drive can be considered healthy. Maybe check and re-seat the cabling on every drive.
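To check whether any pool member is still reporting errors after the scrub, the READ/WRITE/CKSUM columns of `zpool status` are the place to look. Here is a minimal Python sketch of that check. It assumes the usual column layout of the config table; the sample text, the pool name `tank`, and the device names are made up for illustration and are not from this system.

```python
import re

# Illustrative sample only: pool name, device names and counters are invented.
SAMPLE = """\
  pool: tank
 state: DEGRADED
config:

    NAME        STATE     READ WRITE CKSUM
    tank        DEGRADED     0     0     0
      raidz2-0  DEGRADED     0     0     0
        ada0    ONLINE       0     0     0
        ada1    FAULTED     14     0     3
        ada2    ONLINE       0     0     0

errors: No known data errors
"""

# One config row: name, state, then the three error counters.
# Counters are kept as strings because zpool may abbreviate large values.
ROW = re.compile(r"^\s*(\S+)\s+([A-Z]+)\s+(\S+)\s+(\S+)\s+(\S+)")


def devices_with_errors(status_text):
    """Return (name, state, read, write, cksum) for rows with a nonzero counter."""
    flagged = []
    in_config = False
    for line in status_text.splitlines():
        if line.strip().startswith("NAME"):
            in_config = True  # the header row marks the start of the table
            continue
        if in_config:
            if not line.strip():
                break  # a blank line ends the config table
            match = ROW.match(line)
            if match:
                name, state, *counts = match.groups()
                if any(c != "0" for c in counts):
                    flagged.append((name, state, *counts))
    return flagged


print(devices_with_errors(SAMPLE))
```

On the NAS itself you would feed it the real output of `zpool status` instead of the sample. An empty result after a clean scrub and a clean long SMART test would support keeping the drive.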
I cannot find any option to replace the disk. I followed this guide and these menus are just not there... I'm not going to hot swap the disk, I will simply turn off the NAS and swap the drive. Will that make the option appear?
You did not mention which version of FreeNAS/TrueNAS you're running. If it is one of the more recent versions, it should work just like the guide suggests. Shutting down and replacing the drive should also work, though TrueNAS will throw errors when you boot up again.
You could do all this via the CLI as well:
zpool offline <pool> <old drive>
zpool replace <pool> <old drive> <new drive>
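Both `zpool` subcommands take the pool name first, then the device(s). As a rough sketch, the sequence can be built as argument lists in Python and printed for review before anything is run. The pool name `tank` and the device names `ada3` and `ada6` are hypothetical; use whatever names `zpool status` shows for your pool members.

```python
import shlex


def replacement_commands(pool, old_dev, new_dev):
    """Build the two zpool invocations as argument lists; nothing is executed."""
    return [
        ["zpool", "offline", pool, old_dev],
        ["zpool", "replace", pool, old_dev, new_dev],
    ]


# Hypothetical names for illustration only.
for argv in replacement_commands("tank", "ada3", "ada6"):
    print(shlex.join(argv))
```

Printing first is deliberate: it gives you a chance to double-check that the old device really is the one you intend to pull before running the commands as root.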
 

el_reddaio

Cadet
Joined
Dec 27, 2020
Messages
4
Hi Mistermanko!
Thanks for the advice.
At this point I'm not sure if it makes sense to keep the new drive, but I guess I could keep it as a spare just in case something else breaks.

BTW I am on TrueNAS-12.0-U8 and my threshold days for Scrub is set to 35 days.
I don't do a lot of R/W. I use the NAS as a torrent machine, storage for my photos and media server, but I'm the only user (hence why a Core 2 Quad with 8GB of RAM has been perfectly fine so far).
 
Joined
Jan 27, 2020
Messages
577
Now that you mention it is a new drive, did you burn in the drive? If not, I would strongly suggest doing this before putting it into a production environment. A burn-in followed by a long SMART test would surface any issues, and if both finish cleanly, you can be pretty sure that you got a reliable spinning disk.


my threshold days for Scrub is set to 35 days.
That's sufficient.
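As far as I understand it, the threshold only gates how soon a completed scrub may run again; the scrub task's own schedule decides when it is attempted. A small Python sketch of that arithmetic, with made-up dates and a hypothetical helper name:

```python
from datetime import date, timedelta


def scrub_allowed(last_scrub, today, threshold_days=35):
    """True once at least threshold_days have passed since the last completed scrub."""
    return today - last_scrub >= timedelta(days=threshold_days)


# Made-up dates: a scrub that finished on 1 Jan may run again from 5 Feb onward.
print(scrub_allowed(date(2022, 1, 1), date(2022, 2, 4)))
print(scrub_allowed(date(2022, 1, 1), date(2022, 2, 5)))
```

So with a 35-day threshold the pool still gets scrubbed roughly once a month, as long as the scheduled task fires often enough.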

I don't do a lot of R/W. I use the NAS as a torrent machine, storage for my photos and media server, but I'm the only user (hence why a Core 2 Quad with 8GB of RAM has been perfectly fine so far).
A torrent machine can chew up lots of RAM, which you are very short on (the general recommendation is at least 16GB). Keep in mind that ZFS really depends on good (ECC highly recommended) and plentiful (at least 16GB) memory. Make sure your system meets the minimum hardware recommendations to prevent a future fatal data loss ;)
 

el_reddaio

Cadet
Joined
Dec 27, 2020
Messages
4
My other system is an i7 with 16GB and has 8 SATA ports, so I will be able to do all of the above :)

Thank you so much for the advice!
 