Replacing a Drive in FreeNAS Z2 - 9.10 FreeNAS Reference Manual Not Clear

Status
Not open for further replies.

Mark_the_Red

Dabbler
Joined
May 3, 2017
Messages
28
Long time lurker; first time poster.

RAIDZ2 FreeNAS 9.10 server here. I have a 6 x 3 TB RAIDZ2 server (6 x 3 TB WD Greens, wdidle.exe fixed) that has been working swimmingly for over a year now.

Recently one of my HDDs has been giving uncorrectable sector errors (SMART attributes 197 and 198) on 4 sectors. I have tried (and failed) to deliberately write to those sectors using the instructions in the link below:

https://dekoder.wordpress.com/2014/10/08/fixing-freenas-currently-unreadable-pending-sectors-error/

For whatever reason, smartctl -a /dev/ada6 is not locating the sectors so that I can force-write to them via the bottom solution. The sector error count has held at 4, but I am sick of the critical alert emails. The drive is still working, but (maybe unrelated) my write speeds have been halved since this started.

So I have two options.

(a) Wipe/format the drive and resilver it from the FreeNAS web GUI, hoping to overwrite these sector errors.

(b) Replace the drive with a new one and resilver.

My question for both is how to do this safely. PLEASE DO NOT TELL ME TO REFER TO THE FREENAS MANUAL ON REPLACING DRIVES, BECAUSE THE INSTRUCTIONS THERE ARE FOR OLDER VERSIONS OF FREENAS THAT DO NOT APPLY TO FREENAS 9.10. 9.10 does not give you the option/button to offline the drive from the web GUI. I have only "Wipe" and "Edit" options and no "Offline" for ada6.

That said, what is the easiest way to do this without screwing up my zpool?

Do I just "wipe" the drive from the web GUI, since I have enough redundancy with RAIDZ2? Or do I shut down the server, physically pull the drive, wipe it, and plug it back in?

This is probably a really stupid question for most of you, but I admit I did not go to Linux boot camp, so I'd rather ask you pros; i.e., measure twice and cut once with this action. What is my safest course of action here? I searched the forums and couldn't find an answer to this question.
 

dlavigne

Guest
PLEASE DO NOT TELL ME TO REFER TO THE FREENAS MANUAL ON REPLACING DRIVES, BECAUSE THE INSTRUCTIONS THERE ARE FOR OLDER VERSIONS OF FREENAS THAT DO NOT APPLY TO FREENAS 9.10. 9.10 does not give you the option/button to offline the drive from the web GUI.

Both the guide built into your system and the one available at http://doc.freenas.org/9.10/storage.html#replacing-a-failed-drive contain the correct instructions for 9.10. They also indicate:

1. If the disk is formatted with ZFS, click the disk’s entry then its Offline button in order to change that disk’s status to OFFLINE. This step is needed to properly remove the device from the ZFS pool and to prevent swap issues. If the hardware supports hot-pluggable disks, click the disk’s Offline button, pull the disk, then skip to step 3. If there is no Offline button but only a Replace button, the disk is already offlined and you can safely skip this step.
 

Mark_the_Red

Dabbler
Joined
May 3, 2017
Messages
28
What am I missing here? Why is there no Offline button like the manual / post states? All I see is edit or wipe.

[screenshot attached: disk list showing only Edit and Wipe buttons]
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Why is there no Offline button like the manual / post states?
...because you're at the wrong screen. You're at the View Disks screen, and you need to be at the Volume Status screen.
Edit: add screenshot:
[screenshot of the Volume Status screen attached]


Edit 2: ...and the manual (the link Dru posted) does tell you to go to this screen; you don't need to take my word for it. From the text immediately above what she quoted from that page:
Before physically removing the failed device, go to Storage → Volumes → View Volumes. Select the volume’s name. At the bottom of the interface are several icons, one of which is Volume Status. Click the Volume Status icon and locate the failed disk. Then perform these steps:
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
That blog and its bottommost comment are wrong.
If you have an AF/512e (aka 4K) drive and are using bs=4096, then seek needs to be divided by 8, dropping the remainder.

I think it is better, though, to use bs=512 (or omit bs), round seek down to the nearest multiple of 8 if it is not already divisible by 8, and use count=8. You are less likely to make a mistake on the seek address that way.
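For example, the rounding works like this (the LBA below is a made-up value for illustration, and the dd line is deliberately commented out):

```shell
# Round a 512-byte LBA down to the start of its 4K physical sector,
# then overwrite that whole physical sector (8 x 512-byte blocks).
lba=1565538987                 # made-up example LBA
seek=$(( lba - lba % 8 ))      # round down to a multiple of 8
echo "$seek"                   # prints 1565538984
# dd if=/dev/zero of=/dev/ada6 bs=512 seek=$seek count=8 conv=noerror,sync
```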
 

wblock

Documentation Engineer
Joined
Nov 14, 2014
Messages
1,506
That blog and its bottommost comment are wrong.
You're right; an 850M buffer size is just silly. For SATA drives, a buffer size of 128K or 256K is usually more than adequate to keep up with the drive. Use 1M if there is any concern.

But also, growing bad block lists means the drive is failing and should be replaced.
 

Mark_the_Red

Dabbler
Joined
May 3, 2017
Messages
28
...because you're at the wrong screen. You're at the View Disks screen, and you need to be at the Volume Status screen.

Edit 2: ...and the manual (the link Dru posted) does tell you to go to this screen; you don't need to take my word for it. From the text immediately above what she quoted from that page:

Well aren't I dumber than a bag of hammers. Thank you. Still need to replace the drive, I think.
 

Mark_the_Red

Dabbler
Joined
May 3, 2017
Messages
28
You're right, an 850M buffer size is just silly. For SATA drives, a buffer size of 128K or 256K is usually more than adequate to keep up with the drive. Use 1M if there is any concern.

But also, growing bad block lists means the drive is failing and should be replaced.


OK. So given the following log output, the correct syntax for me should be:

Sector Sizes: 512 bytes logical, 4096 bytes physical

[screenshot of the SMART error log attached]


Code:
dd if=/dev/zero of=/dev/ada6 bs=512 count=8 seek= 1565538984 conv=noerror,sync


I've ordered a replacement drive, but I wanted to figure this out for my own Linux growth. (1565538984 / 8 = 195692373, an even division by 8.)
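As a sanity check on those numbers before running anything against the disk (pure arithmetic, safe to run anywhere):

```shell
# Sanity-check the dd parameters before touching the device.
seek=1565538984
bs=512
count=8
echo $(( seek % 8 ))     # prints 0: seek is aligned to a 4K physical sector
echo $(( bs * count ))   # prints 4096: exactly one physical sector is written
```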
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Linux!? Where did you see Linux we must remove it.

 

wblock

Documentation Engineer
Joined
Nov 14, 2014
Messages
1,506
The options for dd in FreeBSD are not case-sensitive; oddly, Linux's are. I don't feel that remapping some new bad blocks is useful. For one, it might overwrite valid data on the drive; a scrub should detect that. The other thing is that when a drive starts growing new bad blocks, it's usually the media going bad, and it doesn't stop.
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
Your command looks right to me, except you have an extra space after seek=. dd is dangerous; always check that your command is correct, and especially that you have a correct count=X to stop it. After you've overwritten the two spots, try another long test.
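One way to rehearse safely is against a scratch file instead of the raw device (the filename here is made up; note that conv=notrunc is needed when the target is a regular file, unlike a device node):

```shell
# Fill a scratch file with a random pattern, then overwrite one
# simulated 4K "physical sector" exactly as we would on the real drive.
dd if=/dev/urandom of=/tmp/fake_disk bs=512 count=64 2>/dev/null
dd if=/dev/zero of=/tmp/fake_disk bs=512 seek=8 count=8 \
   conv=notrunc,noerror,sync 2>/dev/null
# Blocks 8-15 (bytes 4096-8191) are now zeroed; everything else and the
# file size (64 x 512 = 32768 bytes) are untouched.
```

If the second dd ever writes more than 4096 bytes, or changes the file size, the seek/count arithmetic is wrong and should be fixed before going near /dev/ada6.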

If it isn't clear after a few rounds, either wipe the whole drive and/or replace the drive.

The next scrub could have a few checksum errors repaired after this process, although in this case it seems unlikely there is actually any ZFS data on those sectors.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Still need to replace the drive, I think.
Yes, I'd agree. I don't trust the technique of "force remapping the sector by overwriting the bad block"--maybe that's just me being conservative, but SMART test failures call for replacing the drive in my pool.
 