Resilvering two drives on RAIDZ2

Status
Not open for further replies.

KevinM

Contributor
Joined
Apr 23, 2013
Messages
106
I shut down my FreeNAS box last night to relocate it to my rack. On startup I got six repeating beeps, which suggested the keyboard-unplugged error. I plugged in a keyboard and cycled power. The box came up and everything looked OK. However, overnight I got some emails about drives resilvering. I logged into the box this morning and saw the status below.

294 hours is 12 days, give or take. What happens if another drive takes a dump while these drives are resilvering?

The server has been solid in this configuration (details in signature) for a couple of years now. I have an M1015 controller and I'm wondering if the second breakout cable worked itself loose when I moved the server. At this point I'm afraid to touch the thing. Thoughts?

Code:
[root@freenas] ~# zpool status pool
  pool: pool
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
    continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon Feb  6 07:29:49 2017
        28.3G scanned out of 8.01T at 7.89M/s, 294h34m to go
        9.42G resilvered, 0.34% done
config:

    NAME                                            STATE     READ WRITE CKSUM
    pool                                            ONLINE       0     0     0
     raidz2-0                                      ONLINE       0     0     0
       gptid/b3bfc440-cd82-11e2-b330-002590ae2bed  ONLINE       0     0     0
       gptid/b46f2b33-cd82-11e2-b330-002590ae2bed  ONLINE       0     0     0
       gptid/b5210424-cd82-11e2-b330-002590ae2bed  ONLINE       0     0     0
       gptid/b5d788fd-cd82-11e2-b330-002590ae2bed  ONLINE       0     0     0
       gptid/2ae3a80c-de34-11e5-a364-0015175d4010  ONLINE       0     0     0  (resilvering)
       gptid/addd78e8-e01e-11e5-821e-0015175d4010  ONLINE       0     0     0  (resilvering)

errors: No known data errors
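
For reference, the 294h34m figure in the output above is simply the data still to be scanned divided by the current scan rate. A minimal sketch of that arithmetic, assuming zpool reports binary units (T = TiB, G = GiB, M = MiB); the function name is made up for illustration:

```python
def resilver_eta_seconds(total_tib, scanned_gib, rate_mib_per_s):
    """Remaining scan time in seconds at the current scan rate."""
    total_mib = total_tib * 1024 * 1024
    scanned_mib = scanned_gib * 1024
    return (total_mib - scanned_mib) / rate_mib_per_s

# Values taken from the "scan:" lines in the output above.
eta = resilver_eta_seconds(total_tib=8.01, scanned_gib=28.3, rate_mib_per_s=7.89)
hours, rem = divmod(int(eta), 3600)
print(f"{hours}h{rem // 60:02d}m to go")
```

This lands within a few minutes of what zpool printed (the displayed figures are rounded). It is an extrapolation from the rate so far, so it is only as good as that rate.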

 
Joined
Feb 2, 2016
Messages
574
What happens if another drive takes a dump while these drives are resilvering?

If you lose another drive, you lose your data. Pretty please, tell me you have a backup and keep your fingers crossed. I wouldn't touch anything until you've got a backup and your disks are happy again.

If the drives are resilvering, everything is plugged in correctly.

It isn't uncommon for drives that have been powered for years to fail after cooling to room temperature and being spun back up. That's why best practice is to (a) never turn off drives or (b) regularly power-cycle your equipment and find problems inside your maintenance window.

Cheers,
Matt
 

KevinM

Contributor
Joined
Apr 23, 2013
Messages
106
Somehow the estimated resilver time of 294 hours was off by 292 hours.

Yes, I have multiple copies of the data. Most of it is backups, or VMware lab stuff that I can re-create, or other stuff that I have duplicates of. It's just more the pain-in-the-a** factor of having to re-do everything.

Code:
[root@freenas] ~# zpool status pool
  pool: pool
 state: ONLINE
  scan: resilvered 34.5G in 1h51m with 0 errors on Mon Feb  6 09:21:31 2017
config:

    NAME                                            STATE     READ WRITE CKSUM
    pool                                            ONLINE       0     0     0
     raidz2-0                                      ONLINE       0     0     0
       gptid/b3bfc440-cd82-11e2-b330-002590ae2bed  ONLINE       0     0     0
       gptid/b46f2b33-cd82-11e2-b330-002590ae2bed  ONLINE       0     0     0
       gptid/b5210424-cd82-11e2-b330-002590ae2bed  ONLINE       0     0     0
       gptid/b5d788fd-cd82-11e2-b330-002590ae2bed  ONLINE       0     0     0
       gptid/2ae3a80c-de34-11e5-a364-0015175d4010  ONLINE       0     0     0
       gptid/addd78e8-e01e-11e5-821e-0015175d4010  ONLINE       0     0     0

errors: No known data errors
 

nojohnny101

Wizard
Joined
Dec 3, 2015
Messages
1,478
A resilver always starts off slower and then ramps up. I never look at that time indicator until it has been running for at least 15 minutes; after that it becomes a bit more accurate. Glad everything turned out fine.
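
The 15-minute rule of thumb above could be scripted roughly as follows. This is only a sketch: it assumes the `zpool status` text looks like the output posted earlier in the thread, and `trusted_eta` and the `warmup` parameter are hypothetical names, not part of any ZFS tool.

```python
import re
from datetime import datetime, timedelta

# Sample "scan:" section copied from the first post in this thread.
SAMPLE = """\
  scan: resilver in progress since Mon Feb  6 07:29:49 2017
        28.3G scanned out of 8.01T at 7.89M/s, 294h34m to go
        9.42G resilvered, 0.34% done
"""

def trusted_eta(status_text, now, warmup=timedelta(minutes=15)):
    """Return the 'to go' string, or None while the resilver is still warming up."""
    started = re.search(r"resilver in progress since (.+)", status_text)
    eta = re.search(r"(\S+) to go", status_text)
    if not started or not eta:
        return None  # no resilver running, or a format this sketch does not know
    start_time = datetime.strptime(started.group(1).strip(), "%a %b %d %H:%M:%S %Y")
    if now - start_time < warmup:
        return None  # too early: the estimate is not worth reading yet
    return eta.group(1)

print(trusted_eta(SAMPLE, datetime(2017, 2, 6, 7, 35)))  # about 5 minutes in
print(trusted_eta(SAMPLE, datetime(2017, 2, 6, 8, 0)))   # about 30 minutes in
```

In practice the text would come from running `zpool status pool` again each time, since the estimate changes as the scan rate ramps up.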
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
One new feature in OpenZFS (if I understand it correctly) will be sequential re-silvers.
Meaning that after 2 failures like you had, it would re-silver one disk until completion, then
start the second. That should be faster than dual re-silvers...

Of course @nojohnny101 is right about your timing. I start with checking the read speed.
If it's way too slow, then I wait before looking at the time to complete.

As for when the sequential re-silver feature will show up, probably at least 6 months. And
possibly 1 year.
 
Joined
Apr 9, 2015
Messages
1,258
I have to ask a question here to be sure. You powered off, then powered on, and it resilvered, with no drive replacement at all? If that is the case, the pool repaired itself, but I would still run a scrub to double-check things and keep an eye on the SMART data.
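
A rough sketch of that follow-up, wrapped in Python so it can be dry-run first. `zpool scrub`, `zpool status -v` and `smartctl -a` are the standard commands; the device paths are an assumption (the thread only shows gptid labels), and `run_all` needs root on the NAS and Python 3.7 or later.

```python
import subprocess

POOL = "pool"                      # pool name from the zpool status output in this thread
DISKS = ["/dev/da4", "/dev/da5"]   # assumption: device nodes of the two affected drives

def followup_commands(pool, disks):
    """Build the commands for a scrub, a status check, and a SMART report per disk."""
    cmds = [["zpool", "scrub", pool], ["zpool", "status", "-v", pool]]
    cmds += [["smartctl", "-a", disk] for disk in disks]
    return cmds

def run_all(cmds):
    """Run each command in turn and print its output."""
    for cmd in cmds:
        result = subprocess.run(cmd, capture_output=True, text=True)
        print("$", " ".join(cmd))
        print(result.stdout or result.stderr)

if __name__ == "__main__":
    for cmd in followup_commands(POOL, DISKS):
        print(" ".join(cmd))   # dry run: only print; call run_all(...) to execute
```

The scrub runs in the background, so the status check right after it only shows that it has started; rerun `zpool status` later to see the result.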

Would also love to hear some input from someone like @dlavigne or @jgreco or maybe @jkh if this is the case.
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
One new feature in OpenZFS, (if I understand it correctly), will be sequential re-silvers.
Meaning after 2 failures like you had, it would re-silver one disk until completion. Then
start the second. Should be faster than dual re-silvers...

'Sequential' refers to the order in which blocks are read/written, not whether it is resilvering two drives at one time.

It does this magic by cheating ('nearly sequential'): it reads huge chunks of metadata into RAM, sorts them by location (aka elevator seek), and resilvers that portion. Repeat until 100%. If interrupted, you lose up to 1 hour of progress, since that is how often it syncs up progress with the original resilver method's metadata.
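
To illustrate why sorting by location helps, here is a toy model of the idea, not the OpenZFS implementation: block offsets arrive in metadata order, get sorted one RAM-sized batch at a time, and the total head travel is compared. The offsets and the batch size are made-up numbers.

```python
def seek_distance(offsets, start=0):
    """Total head travel when visiting offsets in the given order."""
    total, pos = 0, start
    for off in offsets:
        total += abs(off - pos)
        pos = off
    return total

def batched_sorted_order(offsets, batch_size):
    """Take offsets in metadata order, batch_size at a time, sorting each batch by location."""
    order = []
    for i in range(0, len(offsets), batch_size):
        order.extend(sorted(offsets[i:i + batch_size]))
    return order

# Hypothetical on-disk offsets, in the order a metadata walk might find them.
logical_order = [90, 10, 50, 20, 80, 30]
print(seek_distance(logical_order))                           # 350
print(seek_distance(batched_sorted_order(logical_order, 3)))  # 220
```

A bigger batch (more RAM for metadata) gets closer to one fully sorted pass, which is where the 'nearly sequential' behaviour comes from.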
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Actually, there are 2 re-silver improvements coming to OpenZFS from the head stream: the one I mention
(one disk at a time) and the one you mention. (If I understand what was said on the OpenZFS site correctly.)
 

KevinM

Contributor
Joined
Apr 23, 2013
Messages
106
Somehow the estimated resilver time of 294 hours was off by 292 hours.

Quick note on this one. A day or so after the resilver completed, the two drives went offline again. I thought it was interesting that the two drives connected to the second mini-SAS cable on my M1015 HBA were going offline at the same time. So I did a shutdown and checked the connections, making sure that every cable end was firmly seated. Since then it's been like nothing ever happened.
 

KevinM

Contributor
Joined
Apr 23, 2013
Messages
106
Since then it's been like nothing ever happened.
Clear sailing for a few months, then the same simultaneous error on the same two drives connected to the second breakout cable. I ordered two 3ware SFF-8087 breakout cables to replace the Monoprice cables I had been using. It's been about a month with no issues.

If it happens again I will replace the M1015 HBA.
 