Scrub frozen

Status
Not open for further replies.

climb2bhi

Contributor
Joined
Feb 1, 2012
Messages
108
Yesterday I was unable to write to the volume on my NAS. In the GUI, under Reporting, I found that the activity on one of my 6 drives did not match the activity on the other five. It is the same drive that was reported as having some errors in the past, but it was and still is listed as online, and the entire pool shows its status as "healthy".

But since the drive had shown errors in the past and is still under warranty, I started an RMA with WD and will have a new drive to swap in soon.

After reading the instructions on how to swap out and resilver a failed drive, it seemed like scrubbing the volume would be a good idea to prepare for the swap. Now the scrub is frozen at 4.63% (Scan: 365G out of 7.70T), and the reporting graph for the failing drive looks nothing like those of its companions in the volume.

What should I do next? Is it safe to shut down the system with the scrub unfinished? I'm worried that I'm about to lose 7.70T of data.

Thanks
 

Hyperion

Dabbler
Joined
Apr 3, 2014
Messages
44
Work out which drive is the problem, shut down, remove the drive AND REPLACE it with a shiny new one.
Don't hot swap.
It should work itself out, but it may take hours.
 

Hyperion

Dabbler
Joined
Apr 3, 2014
Messages
44
The OS is trying to figure things out.
You have a dead HDD.
Just shut down, replace, and happy dance.
Don’t use an old HDD.
With 7.7T, expect to wait about 2 days.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Yes. Can you use the GUI or Console to initiate the shutdown?
 

climb2bhi

Contributor
Joined
Feb 1, 2012
Messages
108
Warning!

Some storage pools are doing resilver/scrub, are you really sure you want to proceed with SHUTDOWN?
Cancel
Shutdown
 

Hyperion

Dabbler
Joined
Apr 3, 2014
Messages
44
proceed.
 

Hyperion

Dabbler
Joined
Apr 3, 2014
Messages
44
It's no big deal, not my data in peril.
 

climb2bhi

Contributor
Joined
Feb 1, 2012
Messages
108
"Its no big deal, not my data in peril."

Was that comment really necessary?

It would not shut down completely via the GUI. It started to, then went to "Shutdown terminated". The keyboard on the NAS would not respond, so I held the power button. The system is down.

Any advice on what steps to take once my new drive arrives next week?

I'm guessing just follow the How To on replacing a failed drive.
 

Hyperion

Dabbler
Joined
Apr 3, 2014
Messages
44
Just pointing out,
I back up my shit.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
If you know what disk is bad you can offline the disk while the scrub is in progress. Just do it in the WebGUI.
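
If the WebGUI is not responding, the same step can be sketched from the shell. This is only a rough outline under stated assumptions: `offline_member` is a hypothetical helper name (not a FreeNAS command), and both the pool name and the member label are placeholders you would read from `zpool status`. FreeNAS usually lists pool members by gptid label rather than by adaN name. When the WebGUI works, it remains the recommended route.

```shell
# offline_member is a hypothetical helper, not a FreeNAS command.
# $1 = pool name, $2 = member label exactly as shown by `zpool status`.
offline_member() {
    pool=$1
    member=$2
    # Only offline a device that the pool actually lists.
    if zpool status "$pool" | grep -qF -- "$member"; then
        zpool offline "$pool" "$member"
    else
        echo "$member is not listed in pool $pool" >&2
        return 1
    fi
}

# Example call (both arguments are placeholders):
# offline_member tank gptid/<label-from-zpool-status>
```

The check against `zpool status` is there so that a mistyped label fails loudly instead of being passed straight to `zpool offline`.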
 

climb2bhi

Contributor
Joined
Feb 1, 2012
Messages
108
Wish I had known. It was not happy about being shut down while scrubbing. Once I started the shutdown during the scrub, I lost the GUI and also the directly attached keyboard, and had to force a shutdown via the power button.

Seems I may have received some less-than-expert advice here.
 

Hyperion

Dabbler
Joined
Apr 3, 2014
Messages
44
Joking cyber:)
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
You should not have any issue with shutting down during the scrub, even the way you did it. You could have terminated the scrub from the GUI or shell, but you said your system was frozen. Maybe your meaning of frozen and mine are different. No matter, what's done is done.
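
For reference, stopping a scrub from the shell comes down to a single `zpool scrub -s` call. A minimal sketch, assuming the shell still responds and using a placeholder pool name; `stop_scrub` is just an illustrative wrapper name:

```shell
# stop_scrub is an illustrative wrapper; $1 is the pool name.
stop_scrub() {
    # -s stops the scrub currently running on the pool; if that succeeds,
    # print the pool status so the result can be confirmed.
    zpool scrub -s "$1" && zpool status "$1"
}

# Example call (pool name is a placeholder):
# stop_scrub tank
```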

If you absolutely know which drive it is by serial number, you should replace it while the system is powered off. When you restart your system you will need to replace the failed drive and then once the resilvering starts, offline the drive no longer connected. That is one way to do it.
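
To match a device name to the serial number printed on the drive label, something like the following may help. It is a sketch only: `list_serials` is a made-up helper name, it assumes `smartctl` is available, and the disk names in the example calls are placeholders.

```shell
# list_serials is a hypothetical helper: prints "<disk> <serial>" per argument.
list_serials() {
    for disk in "$@"; do
        # `smartctl -i` prints identity info, including a "Serial Number:" line.
        serial=$(smartctl -i "/dev/$disk" | awk -F': *' '/^Serial Number/ {print $2}')
        echo "$disk ${serial:-unknown}"
    done
}

# Example calls (disk names are placeholders):
# list_serials ada0 ada1
# list_serials $(sysctl -n kern.disks)   # FreeBSD: every disk the kernel knows about
```

Writing the serials down before powering off makes it much harder to pull the wrong drive.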
 

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125
Your scrub was frozen (in reality VERY slow), because you had a bad drive. ZFS does give up, eventually...
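
One rough way to tell "frozen" from "very slow" is to sample the percentage that `zpool status` reports twice and compare the two readings. The sketch below assumes the status output contains an "N% done" field, which may differ between ZFS versions; `scrub_percent` and `scrub_is_moving` are hypothetical helper names, and the pool name is a placeholder.

```shell
# scrub_percent and scrub_is_moving are hypothetical helpers.
# Print the "N% done" field from the scan section of `zpool status`.
scrub_percent() {
    zpool status "$1" | grep -o '[0-9.]*% done'
}

# $1 = pool name, $2 = seconds between samples (default 300).
scrub_is_moving() {
    pool=$1
    delay=${2:-300}
    first=$(scrub_percent "$pool")
    if [ -z "$first" ]; then
        echo "no scrub percentage found for $pool"
        return 1
    fi
    sleep "$delay"
    second=$(scrub_percent "$pool")
    if [ "$first" != "$second" ]; then
        echo "scrub on $pool moved from $first to $second"
    else
        echo "no visible progress on $pool in ${delay}s (still at $first)"
    fi
}

# Example call (pool name is a placeholder):
# scrub_is_moving tank 600
```

The percentage is only reported to two decimal places. On a 7.70T pool, 0.01% is roughly 0.8G, so a short delay can show no change even while a slow scrub is still crawling forward.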

Please take the faulty drive to another system to read its S.M.A.R.T. data. That way you can confirm, at least to some degree, that the disk itself has failed and not the SATA cable or the SATA port.

If you do not have a system where you could read S.M.A.R.T. data (e.g. you only have a laptop), the second-best option is to disconnect all the drives in your system except the faulty one (just the power cables). Then start the OS of your choice that allows you to read the failed disk's S.M.A.R.T. data. Do not use the USB stick that has your FreeNAS installation! However, you can use another USB stick with a fresh FreeNAS install. You only need to allow SSH login as root, start the SSH service, and log in as root over SSH from a terminal emulator. Then post here the results of
Code:
smartctl -A /dev/ada0
(replace ada0 with the actual disk number)
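
When reading the `smartctl -A` table, a few attributes are commonly looked at first: reallocated, pending and offline-uncorrectable sector counts point at the disk itself, while UDMA CRC errors more often point at the cable or port. A small sketch that filters for those rows, assuming the attribute names smartctl normally prints for drives like these (`key_attributes` is a hypothetical helper name, and the disk name is a placeholder):

```shell
# key_attributes is a hypothetical helper; $1 is the disk name, e.g. ada0.
key_attributes() {
    # Keep only the rows most often used to judge disk vs. cable problems.
    smartctl -A "/dev/$1" |
        grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable|UDMA_CRC_Error_Count'
}

# Example call (disk name is a placeholder):
# key_attributes ada0
```

Posting the full `smartctl -A` output is still more useful to others than the filtered rows alone.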
 

climb2bhi

Contributor
Joined
Feb 1, 2012
Messages
108
Thanks so much. These tips are very helpful. New drive from WD will be here in a day or two. And I do know which one to swap. Good thought to test on another system. I have a machine for the task.

The system is still powered off. I thought I would have to restart with the bad drive in and then offline it, warning the system beforehand that I was making the swap. But from the replies here it sounds like I can basically "offline an empty slot"? Then shut down, connect the new drive, restart and resilver. (My system is not hot-swappable.) Are these the right steps?

Completing the drive swap with no data loss is going to give me lots of satisfaction in choosing FreeNAS and ZFS.

Fingers crossed.

Thanks again.
 

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125
Since the procedure I described above does not touch the USB stick with your FreeNAS installation or your good data disks, you can perform it now. You would learn whether it is truly the disk.

After the test, regardless of which system the disk was in when its S.M.A.R.T. data was read, you can reassemble everything the way it was before.

Reading S.M.A.R.T. information does not touch in any way the data (filesystem) residing on the disk.
 

climb2bhi

Contributor
Joined
Feb 1, 2012
Messages
108
Now I cannot get the system to reboot. It hangs at:

(ada1):ata3:0:0:0) READ_DMA48. ACB:25 00 00 a0 50 40 5d 01 00 00 00 01
(ada1):ata3:0:0:0) CAM status: ATA status Error
(ada1):ata3:0:0:0) ATA status 51 (DRDY SERV ERR), error: 40(UNC)
(ada1):ata3:0:0:0) RES: 51 40 88 a0 50 5d 5d 01 00 6f 00
(ada1):ata3:0:0:0) Retrying Command

and this repeats until the final line is

(ada1):ata3:0:0:0) Error 5, Retries exhausted

And that is where the system is currently hung and sitting.

This is with the failed drive still in. (Not the new replacement)

What should I do?
 