Checklist before installing additional storage drives

HolyGizmo

Cadet
Joined
Jan 29, 2023
Messages
8
I currently have one pool with one vdev consisting of two mirrored disks. That has worked for about 2 years, but it's time to expand. I just bought two more identical disks.

I'd like to first replace one drive of the mirrored pair with a new one. That should minimize the odds of one drive failing and then the second failing during the rebuild. I'm planning to simply follow the guide for replacing a failed disk: https://www.truenas.com/docs/core/coretutorials/storage/disks/diskreplace/

It looks pretty simple. Offline the disk, install a new one, then use Replace.

After that, I'll take the other new drive, pair it with the one I just removed, and add that as another mirrored vdev in the existing pool.
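
In command-line terms, I believe the plan amounts to roughly this (pool and device names are placeholders for my real ones):

# Step 1: swap one old disk for a new one
zpool offline tank gptid/old-disk-1
# ...physically install the new disk, then:
zpool replace tank gptid/old-disk-1 gptid/new-disk-1
# Step 2: after the resilver, pair the freed old disk with the second new disk as a second mirror vdev
zpool add tank mirror gptid/old-disk-1 gptid/new-disk-2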

Am I missing anything?
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
I am not sure if the GUI will allow it, but you can perform a faster and safer "replace in place". Basically you install your 2 new disks in the computer and don't offline either of your existing disks first.

What this accomplishes is that ZFS will create a temporary 3-way Mirror of the old drive to be removed, the new drive, and the one staying. If there are any problems with source data blocks, you have both source drives to improve your chances of getting good data. When the resilver is complete, the old drive you selected is detached and you are left with a 2-way Mirror again.
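
From the command line this is a single step, something like the following (pool and disk names are just examples):

zpool replace tank gptid/old-disk gptid/new-disk

While the resilver runs, "zpool status tank" shows the old and new disk grouped under a "replacing" entry, which is the temporary 3-way Mirror described above; when it completes, ZFS detaches the old disk on its own.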

When ZFS first came out, this was one of its unique features. Many low and middle end hardware RAID controllers had less flexible disk replacement schemes. (And they have bitten me in the past!)


Ideally, unless you have a need for Mirroring's higher IOPS, you would convert your 4 disks into a RAID-Z2. Same 2-disk redundancy, BUT any 2 disks can fail, unlike 2 2-way Mirrors. If both disks in a 2-way Mirror fail (or 1 fails and the other has bad blocks during the resilver), you can lose data. Not so with RAID-Z2.
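
For reference, building a 4-disk RAID-Z2 from scratch looks like the following (names are examples only, and this destroys anything on the disks):

zpool create tank raidz2 da0 da1 da2 da3

You get roughly 2 disks' worth of usable space, the same as your 2 Mirrors would give, but any 2 of the 4 disks can fail.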

And yes, there is a complicated way to migrate your data to a RAID-Z2 using a degraded pool. In general we don't recommend it, because done wrong, you lose data.
 

c77dk

Patron
Joined
Nov 27, 2019
Messages
468
Remember to do a proper burn-in of the new drives :smile:
 

HolyGizmo

Cadet
Joined
Jan 29, 2023
Messages
8
I am not sure if the GUI will allow it, but you can perform a faster and safer "replace in place". Basically you install your 2 new disks in the computer and don't offline either of your existing disks first.

What this accomplishes is that ZFS will create a temporary 3-way Mirror of the old drive to be removed, the new drive, and the one staying. If there are any problems with source data blocks, you have both source drives to improve your chances of getting good data. When the resilver is complete, the old drive you selected is detached and you are left with a 2-way Mirror again.

When ZFS first came out, this was one of its unique features. Many low and middle end hardware RAID controllers had less flexible disk replacement schemes. (And they have bitten me in the past!)


Ideally, unless you have a need for Mirroring's higher IOPS, you would convert your 4 disks into a RAID-Z2. Same 2-disk redundancy, BUT any 2 disks can fail, unlike 2 2-way Mirrors. If both disks in a 2-way Mirror fail (or 1 fails and the other has bad blocks during the resilver), you can lose data. Not so with RAID-Z2.

And yes, there is a complicated way to migrate your data to a RAID-Z2 using a degraded pool. In general we don't recommend it, because done wrong, you lose data.
I'm sorry for the late reply here, I meant to do this upgrade sooner, but I'm finally ready to do it today.

Do you mean that I can install both new disks, then use the Replace button on one old disk and select one of the new ones?

The benefit there would be that if the replacement fails, I never offlined a disk, so my chances of recovery are very good.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Sorry, I don't know the GUI as well as others. Perhaps someone else can answer.
 

HolyGizmo

Cadet
Joined
Jan 29, 2023
Messages
8
Remember to do a proper burn-in of the new drives :smile:
Thank you for the reminder! How should I do that? I seem to recall that last time I just went to Disks > {disk} > SMART Test Results, and that ran the test automatically. Is that accurate? I could very well be forgetting something.
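
If it helps, from the shell I assume the equivalent would be something like this (device name is just an example):

smartctl -t long /dev/ada2   # start the long self-test
smartctl -a /dev/ada2        # after it finishes, read the results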

Sorry, I don't know the GUI as well as others. Perhaps someone else can answer.
I have installed the new disks into my system and they are showing up with the correct serial numbers and capacities. I haven't done anything with my existing pool.

I'm surprised I haven't been able to find a standard help article on replacing a healthy disk. I'd expect admins typically want to replace drives nearing end of life before a failure, right? This seems like a common task, but every post or forum thread I can find only covers replacing a failed disk.

I am comfortable with the command line; if there is a guide for doing the replacement without the GUI, I'm not terribly daunted by that.

I don't think I want to refactor my setup to RAID-Z2 at this time. I don't have any critical data that lives only on my NAS, so I don't think the extra work is worth it.

My old disks are only two years old, and I have written less than the capacity of the disks and probably read twice the capacity. What is the realistic failure case of the standard offline / replace? It seems like if one disk died in the process of replacing, I should be able to just use the other one instead. I would really only be in trouble if both disks died, right? That seems pretty unlikely given my usage.

While I am comfortable with the command line, most of my experience is with Linux / Windows, not FreeBSD, and I don't have a ton of TrueNAS-specific knowledge. I would probably prefer to just use the standard GUI options if the risk is low.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504

HolyGizmo

Cadet
Joined
Jan 29, 2023
Messages
8
Yes, you can do this.

...but pay attention to this.
Sounds great, that's reassuring to hear.

I suspect I should burn in the disks first; I wouldn't want to set up a new pool and then discover a problem during burn-in. Can you point me to a resource on how to do the burn-in? I know I did it for my first two drives, but I can't remember what I did. I searched the docs and couldn't find anything.

I double-checked the getting started guide (https://www.truenas.com/docs/core/gettingstarted/storingdata/) but I don't see anything there either. Terribly sorry to bother you.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504

HolyGizmo

Cadet
Joined
Jan 29, 2023
Messages
8
Thanks so much everyone for the help!

I did a long SMART test on both disks and they don't look good. Unless I'm missing something.

Both drives show a raw value for Raw_Read_Error_Rate of 2772. The first drive has a raw value for Seek_Error_Rate of 3456731; the second one has 3433729. Everything else is 0, as it should be.

That seems like a significant problem. Those seem like very high error rates when they should be zero. Should I return these disks?
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
I did a long SMART test on both disks and they don't look good.
Just to avoid any misunderstanding: running a long SMART test is not a proper burn-in. A burn-in, depending on the criticality of the data and/or your paranoia, lasts from a few days to a few months. The time frame in which new drives die relatively often is the first couple of months.

Unless I'm missing something.

Both drives show a raw value for Raw_Read_Error_Rate of 2772. The first drive has a raw value for Seek_Error_Rate of 3456731; the second one has 3433729. Everything else is 0, as it should be.
Some disks (e.g. from Seagate) abuse these fields and "encode" the true values. Google how your drive handles this.

That seems like a significant problem. Those seem like very high error rates when they should be zero. Should I return these disks?
In addition to checking how the disk vendor uses the fields (see above), you still need to do a proper burn-in. Google will provide some help here. Also, there is a script from @jgreco, and @Spearfoot has something on GitHub.
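
As a minimal manual sketch (device names are examples, and note that badblocks -w destroys all data, so only run it on empty drives):

smartctl -t short /dev/ada2       # quick sanity check first
badblocks -ws -b 4096 /dev/ada2   # destructive write-and-verify pass; takes many hours
smartctl -t long /dev/ada2        # full surface read test
smartctl -a /dev/ada2             # compare SMART attributes before and after

The scripts mentioned above automate roughly this sequence.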

 

HolyGizmo

Cadet
Joined
Jan 29, 2023
Messages
8
Just to avoid any misunderstanding: running a long SMART test is not a proper burn-in. A burn-in, depending on the criticality of the data and/or your paranoia, lasts from a few days to a few months. The time frame in which new drives die relatively often is the first couple of months.

Yes, I appreciate that. I was following the instructions listed here, which include running SMART tests. I didn't want to go any further if these drives are failing.

Some disks (e.g. from Seagate) abuse these fields and "encode" the true values. Google how your drive handles this.

My disks are Seagate. I checked their website and they say the encoding is proprietary. I'm sure someone has decompiled their application, but I haven't been able to find anything. The normalized values look fine, so the disks are probably OK. I will continue with the burn-in, following either the previous guide or the one you posted here.

Thank you!

Edit: The GitHub script is nice, but I've opted to continue manually. I have both drives running a full badblocks test; it looks like it will take about 9 hours. I will do another long SMART test afterwards and compare the results.
 
Last edited:

Alecmascot

Guru
Joined
Mar 18, 2014
Messages
1,177
My disks are Seagate. I checked their website and they say the encoding is proprietary. I'm sure someone has decompiled their application, but I haven't been able to find anything.
 

Attachments

  • Seagate smart data.pdf
    402.1 KB · Views: 140

HolyGizmo

Cadet
Joined
Jan 29, 2023
Messages
8
Wow, that is a very useful document! Thank you!

I completed the badblocks test; it took around 72 hours, and no bad blocks were detected on either disk. After that I ran an additional long SMART test. The results are as follows.
Disk \ Value    Read Error Rate    Seek Error Rate
Disk 1          231382210          34282308
Disk 2          234181826          34054366

Converting these values to hex, the upper two bytes, which per the attached document hold the actual error count, are all zero, which would mean no errors have been detected.
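
For example, Disk 1's Seek Error Rate decodes like this (just shell arithmetic):

printf '%012X\n' 34282308
# prints 0000020B1B44: the top 16 bits (0000) are the error count,
# and the lower 32 bits (0x020B1B44 = 34282308) are the total number of seeks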

That seems pretty solid to me. I think I'm ready to add the disks to my system. I will start by replacing one of the disks in my existing vdev.

Thanks everyone!
 

pschatz100

Guru
Joined
Mar 30, 2014
Messages
1,184
It kinda goes without saying that you should also have a backup of your data before starting any disk operation. Since the title of the thread is "Checklist before installing additional storage drives", I would expect a current backup to be on the checklist.
 

HolyGizmo

Cadet
Joined
Jan 29, 2023
Messages
8
Thank you so much for the help everyone! After making all my backups, I replaced the drive. It took about 6 hours to resilver. Then I added the new vdev to the pool and everything seems to be working.
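
For anyone who finds this later, I believe the GUI steps I used correspond roughly to this on the command line (names are placeholders):

zpool replace tank gptid/old-disk gptid/new-disk   # the ~6 hour resilver
zpool add tank mirror gptid/disk-a gptid/disk-b    # the new mirror vdev
zpool status tank                                  # confirm both mirror vdevs are ONLINE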

This was a wonderful experience, sincerely, thank you to everyone who donated their time to me, you are very kind.
 

Al Fuller

Dabbler
Joined
Aug 11, 2015
Messages
16
OK, maybe this is a dumb question, but where is the 'Replace' button in the GUI? I'm running TrueNAS-12.0-U8.1 and decided it is time to update everything. I intend to reuse my old iXsystems FreeNAS Mini (having purchased not 1 but 2 new TrueNAS Mini+ systems in the last month that had hardware issues and had to be returned...). While my primary inclination is to rebuild everything from scratch and hopefully do better than I did before with the configuration, it would be nice to see what the option to replace a drive from the GUI involves.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504

Al Fuller

Dabbler
Joined
Aug 11, 2015
Messages
16
Thanks for pointing me back to the docs. However, what I am seeing on my system is inconsistent with what the docs show. See the attached screenshot: there is no "..." menu and no Replace button on my system.

Did I take the wrong path through the GUI, or is something else going on?

TrueNAS-GUI.png
 