Degraded zpool, slow performance

Status
Not open for further replies.

Tenek

Explorer
Joined
Apr 14, 2014
Messages
97
Hi all,

A while ago, the RAIDZ2 array on my old FreeNAS box became degraded. I'm preparing another FreeNAS box and am almost ready to copy the data over. But recently, performance on that pool dropped dramatically, to a few bytes per second. It makes me worry that I won't be able to copy all the data off it. Any ideas how I can save my data?

Code:
[root@freenas ~]# zpool status Media                                                                                               
  pool: Media                                                                                                                      
state: DEGRADED                                                                                                                   
status: One or more devices could not be opened.  Sufficient replicas exist for                                                    
        the pool to continue functioning in a degraded state.                                                                      
action: Attach the missing device and online it using 'zpool online'.                                                              
   see: http://illumos.org/msg/ZFS-8000-2Q                                                                                         
  scan: scrub repaired 28K in 89h8m with 0 errors on Wed Aug 19 17:08:58 2015                                                      
config:                                                                                                                            
                                                                                                                                   
        NAME                                            STATE     READ WRITE CKSUM                                                 
        Media                                           DEGRADED     0     0     0                                                 
          raidz2-0                                      DEGRADED     0     0     0                                                 
            2214284285201895418                         UNAVAIL      0     0     0  was /dev/gptid/4e08fc59-a994-11e3-a3e2-000feaf34845
            gptid/4e932358-a994-11e3-a3e2-000feaf34845  ONLINE       0     0     0                                                 
            gptid/4f62d402-a994-11e3-a3e2-000feaf34845  ONLINE       0     0     0                                                 
            gptid/503fcbd4-a994-11e3-a3e2-000feaf34845  ONLINE       0     0     0                                                 
            gptid/50d57663-a994-11e3-a3e2-000feaf34845  ONLINE       0     0     0                                                 
            gptid/51cc1588-a994-11e3-a3e2-000feaf34845  ONLINE       0     0     0                                                 
            gptid/5278d346-a994-11e3-a3e2-000feaf34845  ONLINE       0     0     0                                                 
                                                                                                                                   
errors: No known data errors                                                                                                       
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Replace the drive?

Is it just write performance, or is read performance affected as well?
 

Tenek

Explorer
Joined
Apr 14, 2014
Messages
97
It is read performance. I don't really care about writes, since I just need to copy the data off and forget about it :).
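For anyone trying to quantify this, a quick sequential-read check with dd gives a rough number (the scratch file in /tmp is just to keep the example self-contained; point `if=` at a large file under /mnt/Media to measure the affected pool instead):

```shell
# Rough sequential-read benchmark: write a scratch file once, then time reading it.
# dd prints the transfer rate on stderr when it finishes.
dd if=/dev/zero of=/tmp/readtest.bin bs=1M count=64 2>/dev/null
dd if=/tmp/readtest.bin of=/dev/null bs=1M
rm -f /tmp/readtest.bin
```

Note that a freshly written file may be served from ARC; for a cache-free read, use a file that hasn't been touched recently.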
 

JDCynical

Contributor
Joined
Aug 18, 2014
Messages
141
Out of morbid curiosity, what does the output of dmesg show? When I had a drive drop out due to failure, the system still kept trying to access the drive and caused no end of I/O waits and timeouts, which affected the throughput of the system overall.
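On FreeBSD/FreeNAS, a stalling drive usually shows up as CAM-layer retries and timeouts in dmesg; a filter like this surfaces them (the sample device name in the comment is just an example):

```shell
# Look for CAM retries/timeouts that indicate a drive stalling the bus.
# A typical offender logs lines like:
#   (ada3:ahcich3:0:0:0): CAM status: Command timeout
dmesg | egrep -i 'cam status|timeout|retrying|error'
```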
 

Tenek

Explorer
Joined
Apr 14, 2014
Messages
97
I identified the failed drive and replaced it via the GUI.
After the replacement, zpool showed the failed drive as "replacing".
Good, but it is resilvering at 130 KB/s; at that speed it will take nearly 10 years to finish.
It has also affected the other two pools in the same FreeNAS box: read speeds for all pools are now about 100-700 KB/s, and the GUI and SSH have become laggy.

At this point I would at least like to return things to their original state and restore the performance of the other two pools, if possible.
I tried updating vfs.zfs.scrub_delay from 4 to 0 to see if it would make a difference, and I restarted the box after changing the value (I wasn't sure whether that was necessary).
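In case it helps anyone else: a sysctl set at the command line does not survive a reboot, which would explain the value reverting. On FreeNAS, persistent values go under System → Tunables (roughly equivalent to a sysctl.conf entry). A sketch, with the caveat that the tunable names assume FreeBSD 9.x-era ZFS, and that the resilver-specific knob may matter more here than the scrub one:

```
# sysctl.conf-style entries; in the FreeNAS GUI, add each under System -> Tunables
vfs.zfs.scrub_delay=0        # ticks to wait between scrub I/Os (0 = no throttling)
vfs.zfs.resilver_delay=0     # the resilver counterpart; likely the relevant knob here
```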

UPDATE: The GUI came up after 40 minutes and is extremely slow. I noticed that after the reboot, vfs.zfs.scrub_delay was reset back to "4". I also noticed that two other drives have for some reason become REMOVED, plus data errors? Resilvering is still very slow:
Code:
  pool: Media
state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sat Feb 27 12:34:26 2016
        1.10G scanned out of 10.6T at 166K/s, (scan is slow, no estimated time)
        148M resilvered, 0.01% done
config:

        NAME                                              STATE     READ WRITE CKSUM
        Media                                             DEGRADED     0     0   229
          raidz2-0                                        DEGRADED     0     0   468
            replacing-0                                   DEGRADED     0     0     0
              2214284285201895418                         UNAVAIL      0     0     0  was /dev/gptid/4e08fc59-a994-11e3-a3e2-000feaf34845
              gptid/786623be-dd91-11e5-a669-000feaf34845  ONLINE       0     0     0  (resilvering)
            gptid/4e932358-a994-11e3-a3e2-000feaf34845    ONLINE       0     0     0
            gptid/4f62d402-a994-11e3-a3e2-000feaf34845    ONLINE       0     0     0
            14148241151476974964                          REMOVED      0     0     0  was /dev/gptid/503fcbd4-a994-11e3-a3e2-000feaf34845
            12317803800638565896                          REMOVED      0     0     0  was /dev/gptid/50d57663-a994-11e3-a3e2-000feaf34845
            gptid/51cc1588-a994-11e3-a3e2-000feaf34845    ONLINE       0     0     0
            gptid/5278d346-a994-11e3-a3e2-000feaf34845    ONLINE       0     0     0

errors: 229 data errors, use '-v' for a list
[root@freenas] ~#


What should I do to restore the performance of the other two pools and get the data copied off them? Saving the failing pool would be nice, but it's less critical than copying the data from the other two.
 

jde

Explorer
Joined
Aug 1, 2015
Messages
93
Per the forum rules, what's your hardware setup?
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
I noticed that two other drives have for some reason become REMOVED, plus data errors?
Either you dislodged something when you replaced the drive, or you have something else wrong with your server. Besides, a RAIDZ2 pool with 3 failed or missing drives should not work at all.
what should I do to restore performance of the other two pools and get data copied from there
You could try detaching the failing pool, but I would focus on figuring out what's causing so many drives to drop.
 

Tenek

Explorer
Joined
Apr 14, 2014
Messages
97
The pool became unavailable overnight; more drives were removed (no idea why).
But performance of the other two pools has recovered. I will copy all the data from those two and keep investigating the issue.
 

Tenek

Explorer
Joined
Apr 14, 2014
Messages
97
The data from the two other pools has been copied over!
I also managed to bring up the problem pool with the two dropped HDDs disconnected.
I can see the data and copy it off at 5 MB/s, which is something; that works out to about 600 hours of copy time :).
I'm wondering if I can export the degraded pool and import it on the new hardware to get the data copied faster over there. Or is that not recommended?
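For reference, the move itself is just an export on the old box and an import on the new one; a sketch (pool name taken from this thread, run as root), though whether the pool behaves any better on different hardware is a separate question:

```
# On the old box (or "Detach Volume" in the FreeNAS GUI, WITHOUT marking disks new):
zpool export Media
# Move the disks, then on the new box (FreeNAS: Storage -> Import Volume, or):
zpool import Media
# If the pool was not cleanly exported, the import may need to be forced:
# zpool import -f Media
```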
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
What makes you think the problem pool would work any better on different hardware?
 