Replacing a Drive in FreeNAS Z2 - 9.10 FreeNAS Reference Manual Not Clear

Status
Not open for further replies.

Mark_the_Red

Dabbler
Joined
May 3, 2017
Messages
28
Long time lurker; first time poster.

RAIDZ2 FreeNAS 9.10 server here. I have a 6 x 3 TB RAIDZ2 server (6 x 3 TB WD Greens, wdidle.exe fixed) that has been working swimmingly for over a year now.

Recently one of my HDDs has been giving uncorrectable sector errors (SMART attributes 197 and 198) on 4 sectors. I have tried (and failed) to deliberately write to those sectors using the instructions in the link below:

https://dekoder.wordpress.com/2014/10/08/fixing-freenas-currently-unreadable-pending-sectors-error/

For whatever reason, smartctl -a /dev/ada6 is not locating the sectors so that I can force-write to them via the bottom solution. The sector error count has held at 4, but I am sick of the critical alert emails. The drive is still working, but (maybe unrelated) my write speeds have been halved since this started.

So I have two options.

(a) Wipe/format the drive and resilver it from the FreeNAS web GUI, hoping to overwrite these sector errors.

(b) Replace the drive with a new one and resilver.

My question for both is how to do this safely. PLEASE DO NOT TELL ME TO REFER TO THE FREENAS MANUAL ON REPLACING DRIVES, BECAUSE THE INSTRUCTIONS THERE ARE FOR OLDER VERSIONS OF FREENAS THAT DO NOT APPLY TO FREENAS 9.10. 9.10 does not give you the option/button to offline the drive from the web GUI. I have only "Wipe" and "Edit" options and no "Offline" for ada6.

That said, what is the easiest way to do this without screwing up my zpool?

Do I just "wipe" the drive from the web GUI, since I have enough redundancy with RAIDZ2? Or do I shut down the server, physically pull the drive, wipe it, and plug it back in?

This is probably a really stupid question for most of you, but I admit I did not go to Linux boot camp, so I'd rather ask you pros; i.e., measure twice and cut once with this action. What is my safest course of action here? I searched the forums and couldn't find an answer to this question.
 

dlavigne

Guest
PLEASE DO NOT TELL ME TO REFER TO THE FREENAS MANUAL ON REPLACING DRIVES, BECAUSE THE INSTRUCTIONS THERE ARE FOR OLDER VERSIONS OF FREENAS THAT DO NOT APPLY TO FREENAS 9.10. 9.10 does not give you the option/button to offline the drive from the web GUI.

Both the guide built into your system and the one available at http://doc.freenas.org/9.10/storage.html#replacing-a-failed-drive contain the correct instructions for 9.10. They also indicate:

1. If the disk is formatted with ZFS, click the disk’s entry then its Offline button in order to change that disk’s status to OFFLINE. This step is needed to properly remove the device from the ZFS pool and to prevent swap issues. If the hardware supports hot-pluggable disks, click the disk’s Offline button, pull the disk, then skip to step 3. If there is no Offline button but only a Replace button, the disk is already offlined and you can safely skip this step.
 

Mark_the_Red

Dabbler
Joined
May 3, 2017
Messages
28
What am I missing here? Why is there no Offline button like the manual / post states? All I see is edit or wipe.

[screenshot attached: disk list showing only Edit and Wipe buttons]
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Why is there no Offline button like the manual / post states?
...because you're at the wrong screen. You're at the View Disks screen, and you need to be at the Volume Status screen.
Edit: add screenshot:
[screenshot of the Volume Status screen attached]


Edit 2: ...and the manual (the link Dru posted) does tell you to go to this screen; you don't need to take my word for it. From the text immediately above what she quoted from that page:
Before physically removing the failed device, go to Storage → Volumes → View Volumes. Select the volume’s name. At the bottom of the interface are several icons, one of which is Volume Status. Click the Volume Status icon and locate the failed disk. Then perform these steps:
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
That blog and its bottommost comment are wrong.
If you have an AF/512e (aka 4K) drive and are using bs=4096, then seek needs to be divided by 8, dropping the remainder.

I think it is better, though, to use bs=512 (or omit bs), round seek down to the nearest multiple of 8 if it is not already divisible by 8, and use count=8. You are less likely to make a mistake on the seek address that way.
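For example, the rounding works like this (the LBA below is a made-up value for illustration, and the dd line is deliberately commented out):

```shell
# Round a 512-byte LBA down to the start of its 4K physical sector,
# then overwrite that whole physical sector (8 x 512-byte blocks).
lba=1565538987                 # made-up example LBA
seek=$(( lba - lba % 8 ))      # round down to a multiple of 8
echo "$seek"                   # prints 1565538984
# dd if=/dev/zero of=/dev/ada6 bs=512 seek=$seek count=8 conv=noerror,sync
```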
 

wblock

Documentation Engineer
Joined
Nov 14, 2014
Messages
1,506
That blog and its bottommost comment are wrong.
You're right; an 850M buffer size is just silly. For SATA drives, a buffer size of 128K or 256K is usually more than adequate to keep up with the drive. Use 1M if there is any concern.

But also, growing bad block lists means the drive is failing and should be replaced.
 

Mark_the_Red

Dabbler
Joined
May 3, 2017
Messages
28
...because you're at the wrong screen. You're at the View Disks screen, and you need to be at the Volume Status screen.

Edit 2: ...and the manual (the link Dru posted) does tell you to go to this screen; you don't need to take my word for it. From the text immediately above what she quoted from that page:

Well aren't I dumber than a bag of hammers. Thank you. Still need to replace the drive, I think.
 

Mark_the_Red

Dabbler
Joined
May 3, 2017
Messages
28
You're right, an 850M buffer size is just silly. For SATA drives, a buffer size of 128K or 256K is usually more than adequate to keep up with the drive. Use 1M if there is any concern.

But also, growing bad block lists means the drive is failing and should be replaced.


OK. So given the following log output, the correct syntax for me should be:

Sector Sizes: 512 bytes logical, 4096 bytes physical

[screenshot of the SMART error log attached]


Code:
dd if=/dev/zero of=/dev/ada6 bs=512 count=8 seek= 1565538984 conv=noerror,sync


I've ordered a replacement drive, but I wanted to figure this out for my own Linux growth. (1565538984 / 8 = 195692373, an even division by 8.)
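As a sanity check on those numbers before running anything against the disk (pure arithmetic, safe to run anywhere):

```shell
# Sanity-check the dd parameters before touching the device.
seek=1565538984
bs=512
count=8
echo $(( seek % 8 ))     # prints 0: seek is aligned to a 4K physical sector
echo $(( bs * count ))   # prints 4096: exactly one physical sector is written
```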
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Linux!? Where did you see Linux we must remove it.

 

wblock

Documentation Engineer
Joined
Nov 14, 2014
Messages
1,506
The options for dd in FreeBSD are not case-sensitive; oddly, Linux's are. I don't feel that remapping some new bad blocks is useful. For one, it might overwrite valid data on the drive; a scrub should detect that. The other thing is that when a drive starts growing new bad blocks, it's usually the media going bad, and it doesn't stop.
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
Your command looks right to me, except you have an extra space after seek=. dd is dangerous; always check that your command is correct, and especially that you have a correct count=X to stop it. After you've overwritten the two spots, try another long test.
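One way to rehearse safely is against a scratch file instead of the raw device (the filename here is made up; note that conv=notrunc is needed when the target is a regular file, unlike a device node):

```shell
# Fill a scratch file with a random pattern, then overwrite one
# simulated 4K "physical sector" exactly as we would on the real drive.
dd if=/dev/urandom of=/tmp/fake_disk bs=512 count=64 2>/dev/null
dd if=/dev/zero of=/tmp/fake_disk bs=512 seek=8 count=8 \
   conv=notrunc,noerror,sync 2>/dev/null
# Blocks 8-15 (bytes 4096-8191) are now zeroed; everything else and the
# file size (64 x 512 = 32768 bytes) are untouched.
```

If the second dd ever writes more than 4096 bytes, or changes the file size, the seek/count arithmetic is wrong and should be fixed before going near /dev/ada6.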

If it isn't clear after a few rounds, either wipe the whole drive and/or replace the drive.

The next scrub could have a few checksum errors repaired after this process, although in this case it seems unlikely there is actually any ZFS data on those sectors.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Still need to replace the drive, I think.
Yes, I'd agree. I don't trust the technique of "force remapping the sector by overwriting the bad block"--maybe that's just me being conservative, but SMART test failures call for replacing the drive in my pool.
 