Failing SSDs: best way not to lose data

goosesensor

Dabbler
Joined
Jan 7, 2019
Messages
10
I set up a FreeNAS machine about four years ago with 8x1TB SSDs in RAIDZ2.

21 days ago I finally updated to TrueNAS Core.

Today I logged in and noticed an alert: 2 drives are degraded (66 and 99 read errors respectively). I was hoping to get more life out of these drives, and I was surprised (and somewhat horrified) to see two start failing in such close temporal proximity.

I want to give myself the best odds of saving my data in the event a third drive fails. What is the very next thing I should do?

1. Replace both drives and let it rebuild (reading and writing).
2. Replace only one drive and let it rebuild (reading and writing). I have a spare taped inside the case.
3. Plug in a USB SATA enclosure with a large HDD and use rsync or something similar to copy the pool's contents (mostly reading).
4. Copy to another machine via SSH/rsync (mostly reading). Probably the same as #3.

Any thoughts appreciated.
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
First and foremost: always have backups. So, item 4 or 3, and not just this once but permanently.

If you have enough spare ports, it's possible to replace drives without unplugging the old ones, and thus without reducing redundancy.
Put the spare in use, without removing any drive. If possible, add a new drive and do both replacements at the same time without removing the old drives.
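
A rough command-line sketch of that (the pool and device names here are invented; use whatever zpool status actually shows, and note that on TrueNAS the web UI's Replace action is generally preferred since it also handles partitioning):

  zpool status tank                # find the identifier of the failing disk
  zpool replace tank ada3 ada8     # new disk (ada8) resilvers while the old one (ada3) stays attached
  zpool status -v tank             # watch resilver progress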

4 years is not that bad. What's the drive model? Are they all the same?
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
I would also add that a test of the "spare taped inside the case" should be done, before using it. Basically a bad blocks test on another system.

I also second the replace-in-place approach, meaning installing the replacement drive in the NAS and then using ZFS's option to replace a bad drive. Having the existing drive still available means whatever redundancy it (and the other bad drive) can still provide remains usable.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
@Arwen - SSD - surely you aren't advocating for a badblocks run on an SSD?
 

goosesensor

Dabbler
Joined
Jan 7, 2019
Messages
10
First and foremost: always have backups. So, item 4 or 3, and not just this once but permanently.

If you have enough spare ports, it's possible to replace drives without unplugging the old ones, and thus without reducing redundancy.
Put the spare in use, without removing any drive. If possible, add a new drive and do both replacements at the same time without removing the old drives.

4 years is not that bad. What's the drive model? Are they all the same?
I do swap physical backups at my folks' place, but it's been several months since I was there.

I didn't know you could replace them in that manner. That is great.

It's an ITX mobo and the single PCI-E slot has an SFP+ card in it. I will find a cheap supported SATA card to swap in, and use the mobo's built-in Ethernet for access while replacing the drives.

They are all 1TB "Pioneer" brand, model APS-SL3N-1TB.

I would also add that a test of the "spare taped inside the case" should be done, before using it. Basically a bad blocks test on another system.

I also second the replace-in-place approach, meaning installing the replacement drive in the NAS and then using ZFS's option to replace a bad drive. Having the existing drive still available means whatever redundancy it (and the other bad drive) can still provide remains usable.

Seems like a wise idea.

@Arwen - SSD - surely you aren't advocating for a badblocks run on an SSD?

Can you explain what you are saying here? Is there something wrong with running a check on an SSD or something? What software would you recommend? Linux preferred. Thanks.
 

rvassar

Guru
Joined
May 2, 2018
Messages
972
Can you explain what you are saying here? Is there something wrong with running a check on an SSD or something? What software would you recommend? Linux preferred. Thanks.

Running badblocks on an SSD burns useful life for little to no return. The act of writing to an SSD wears it out, and badblocks performs exactly that as an exercise. An SSD already remaps bad LBAs as part of its wear leveling. Which isn't to say it's entirely useless, but a pool scrub catches 90+% of problems, so why burn the reserve flash over-provisioning just to prove the firmware works?
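
If the aim is simply to exercise reads on the pool's SSDs, a scrub already does that without any extra writes (a sketch, assuming a pool named tank):

  zpool scrub tank      # reads and verifies every allocated block
  zpool status tank     # shows scrub progress and any errors it found or repaired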
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
Can you explain what you are saying here? Is there something wrong with running a check on an SSD or something? What software would you recommend? Linux preferred. Thanks.
badblocks isn't even designed to check drives, although it is commonly used for this purpose. It may be acceptable for HDDs, but it is not suitable for SSDs.
You may use solnet-test-array (BSD), since it is read only. Or just run SMART tests and trust the manufacturer…
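
For instance, on a Linux box with smartmontools installed (assuming the spare shows up as /dev/sdb):

  smartctl -t long /dev/sdb    # start the drive's extended self-test (runs inside the firmware)
  smartctl -a /dev/sdb         # afterwards: self-test log, error counters, wear indicators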
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
@Arwen - SSD - surely you aren't advocating for a badblocks run on an SSD?
I was suggesting a read test. Probably should have been clearer.

A read test of an SSD should / will refresh blocks that are close to losing their charge. Basically, any read of a flash cell uses up some of the electrical charge. So does just sitting for a long time, and so do warm or hot environments.

Causing an SSD to re-read all the blocks will find any that are low on charge (voltage). The SSD should / will automatically re-write them. And if you find any blocks that have already lost their ability to recover, then you know about it before putting the drive into service. A simple write to the "bad" block will cause a SATA device to swap the bad block out for a good spare block.
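
A couple of ways to force that full read pass on a Linux machine, assuming the spare is /dev/sdb (both are read-only, so they cost no write endurance):

  dd if=/dev/sdb of=/dev/null bs=1M status=progress   # sequential read of the whole device; bad sectors show up as I/O errors
  badblocks -sv /dev/sdb                              # badblocks' default mode is read-only (-w is the destructive one)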
 

goosesensor

Dabbler
Joined
Jan 7, 2019
Messages
10
@Etorix, you say that 4 years is not bad, so it sounds like these SSDs are failing due to normal use, not due to, say, random manufacturing tolerances or something of that sort? Heat degradation, etc. If that is the case, it seems fair that I could expect other drives to start showing faults in the near future, too? Recall that everything was fine less than a month ago, and since then two drives have shown faults.

What I'm getting at is, should I just replace all of them now with larger capacity drives? I don't want to replace two 1TB drives now, only for the remaining 6 drives to fail in the coming months. If that is likely to be the case, I would rather put 8 new 2TB drives in now.


A second, somewhat unrelated question: I don't have any spare SATA ports, but I do have a USB<->SATA adapter. When I connect my spare with this adapter, it shows up as da1 (da0 is the OS USB stick, ada0-ada7 are the RAIDZ2 pool drives). Not only does it show up as da1 rather than ada8 or similar, its serial number also seems to be emulated by the adapter's chip: it shows up as "123456789012". Can I still use the USB adapter to replace one of the failing drives and resilver, or will the device ID/name (da1) and/or fake serial number cause problems once I swap the bad SATA-connected drive out for the new, resilvered USB-connected drive?
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
@Etorix, you say that 4 years is not bad, so it sounds like these SSDs are failing due to normal use, not due to, say, random manufacturing tolerances or something of that sort? Heat degradation, etc. If that is the case, it seems fair that I could expect other drives to start showing faults in the near future, too? Recall that everything was fine less than a month ago, and since then two drives have shown faults.
That's my concern. A single failure could be just that: A single failure due to random variance.
Two identical drives failing in short succession may be drives reaching end-of-life under the load caused by the pool (e.g. reaching maximum writes), and if so, one may indeed expect that further drives will follow.

What I'm getting at is, should I just replace all of them now with larger capacity drives? I don't want to replace two 1TB drives now, only for the remaining 6 drives to fail in the coming months. If that is likely to be the case, I would rather put 8 new 2TB drives in now.
The two failing drives should certainly be replaced. Larger drives would not hurt.
You may also want to consider replacing the pool with another one, with a different geometry: fewer drives, but larger ones.
8*1 TB in Z2 provides "only" 6 TB raw (about 4 TB usable before the pool has to grow).
With 2 TB drives, 6 drives in Z2 already provide 8 TB raw.
With 4 TB drives (TLC rather than QLC), just 4 drives in Z2 provide 8 TB, 6 drives provide 16 TB.
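The arithmetic behind those figures is simply raw capacity = (drives - 2) x drive size for a RAIDZ2 vdev. A throwaway shell check, if you want to compare layouts:

  for size in 2 4; do
    for n in 4 6 8; do
      echo "$n x ${size} TB in RAIDZ2 -> $(( (n - 2) * size )) TB raw"
    done
  done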

In this case, you'll need either an HBA to attach all drives, or another (temporary) server, and then replicate the old pool to the new one.

A second, somewhat unrelated question: I don't have any spare SATA ports, but I do have a USB<->SATA adapter. When I connect my spare with this adapter, it shows up as da1 (da0 is the OS USB stick, ada0-ada7 are the RAIDZ2 pool drives). Not only does it show up as da1 rather than ada8 or similar, its serial number also seems to be emulated by the adapter's chip: it shows up as "123456789012". Can I still use the USB adapter to replace one of the failing drives and resilver, or will the device ID/name (da1) and/or fake serial number cause problems once I swap the bad SATA-connected drive out for the new, resilvered USB-connected drive?
'daN' names are no concern (ZFS tracks drives by gptid anyway), but a USB adapter is not reliable enough to be used for long-term storage in a data pool.
If replacing drives to keep the 8-wide Z2, though, I'd be tempted to try and use the adapter during the resilver, maybe with the new drive already attached to a SATA port and the old drive attached to the USB adapter so it can still provide full redundancy. If it works, repeat. If issues arise (e.g. the USB-adapted drive dropping during resilver), revert to the traditional method: offline the old drive, plug the new drive in its place, replace and resilver with reduced redundancy. (It's a RAIDZ2 anyway, so there's still one degree of redundancy left.)
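
As a sketch of both paths (the pool and device names are made up; use the identifier zpool status shows for the old disk):

  # Path keeping full redundancy: new disk on a SATA port (ada8), old disk still attached via USB
  zpool replace tank <old-disk-id> ada8
  # Traditional path, with one drive's worth of redundancy lost while resilvering:
  zpool offline tank <old-disk-id>
  # ...physically swap the failing disk for the new one on that SATA port...
  zpool replace tank <old-disk-id> ada3
  zpool status -v tank               # watch the resilver either way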
 

goosesensor

Dabbler
Joined
Jan 7, 2019
Messages
10
I was able to offline/replace one of the failing SATA drives with the new spare connected via USB <-> SATA adapter, then swap the new drive into the failing drive's SATA port and it all worked fine. Waiting on 2nd replacement in the mail.

Thanks for your help.
 

Alex_K

Explorer
Joined
Sep 4, 2016
Messages
64
I thought that when a drive is showing as degraded, it means the drive has failed and is not used, but what is written above implies degraded drives are still used for redundancy? Is that correct? To what extent? Like, if we have 2 drives "degraded" in a Z2 VDEV and a 3rd becomes "degraded", does that not mean the whole vdev's data is lost?
 

rvassar

Guru
Joined
May 2, 2018
Messages
972
I thought that when a drive is showing as degraded, it means the drive has failed and is not used, but what is written above implies degraded drives are still used for redundancy? Is that correct? To what extent? Like, if we have 2 drives "degraded" in a Z2 VDEV and a 3rd becomes "degraded", does that not mean the whole vdev's data is lost?

Pools become degraded. Drives have different behaviors depending on the type of device and the failure experienced. HDDs can generate uncorrectable read errors, map the affected sectors out of use, and run on for years. Or they can work perfectly one day, get powered off, and fail to spin back up because the spindle bearings have seized. For the uncorrectable read errors, SSDs are built to do an equivalent substitution: when cells wear out, spare replacements are mapped in. These spares are engineered into the devices and are often referred to as flash over-provisioning. Your 1 TB flash drive may actually contain 1.5 TB of flash memory to achieve a specified life-span, with 500 GB sitting around like a hidden partial hot spare.

The problem with SSDs is that their entire structure is stored in the very thing that fails. At the onset of failure, a failing HDD may still respond to commands and may produce data for 90+% of the read requests it receives, aiding the resilvering process. If the failure leads to cascading damage, this number may drop faster than you can resilver in a replacement. When an SSD runs out of replacement cells, how the failure propagates becomes very dependent on where the next bad LBA appears. I've seen enterprise NVMe drives function perfectly one minute and have no addressable namespace the next. The structure describing the namespace was simply gone.

With Z2 VDEVs, you need to meet the minimum number of parity stripes per VDEV to reconstruct the data at a logical block address (LBA). With four devices being the minimum for Z2, you need two readable devices for each parity stripe of an LBA. VDEVs with more devices will have different minimums. For the 4-device VDEV these don't need to be the same two devices: the faulty devices may still return valid data. It's possible to "puncture" individual device LBAs and create a checkerboard of failures that still results in ZFS reconstructing valid data. But once you lose the VDEV, the pool is gone. A RAIDZ2 pool comprised of 3 VDEVs of 4 devices each will be destroyed by losing 3 devices in the same VDEV.
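
You can see exactly how ZFS currently judges each device and vdev (a sketch, pool name assumed):

  zpool status -v tank   # per-device READ/WRITE/CKSUM counters and ONLINE/DEGRADED/FAULTED states
  zpool status -x        # short summary; prints "all pools are healthy" when nothing is wrong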
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Like @rvassar said, drives (HDD or SSD) that still have usable data can be used to assist with recovery. That was not common when ZFS came out.

One point I would like to make is that ZFS handles recovery stripe by stripe of disk blocks. Simple example: you have a 4-disk RAID-Z2:

disk 0 - Failing block 10
disk 1 - Failing block 10 too
disk 2 - Failing block 20
disk 3 - Failing block 20 too

This allows the vDev to be fully recoverable: block 10 is recoverable from disks 2 & 3, with disks 0 & 1 supplying block 20. But you have to use replace-in-place, because pulling any disk removes one stripe's worth of redundancy.

Plus, ZFS only rebuilds data that is actually in use, so failures in unused storage blocks are not relevant during resilvers.
 