Help identifying server status

Status: Not open for further replies.

Joined: Nov 2, 2016 · Messages: 3
Hi all!

I work at a small production house based in California, and we have a fairly sizable FreeNAS server that everyone works off of. I'm a relatively new hire, and I was recently given all of the login info to take a look at it, because no one has been maintaining it for the last several months. I've never dabbled in FreeNAS, but I have spent a fair amount of time setting up and maintaining Ubuntu web/email/DNS servers, and I have a fairly decent knowledge of storage setups, RAID, and formatting.

So after I was given all of the info, I poked around to see what the status of everything was, and I was greeted by a flashing red warning light for a status. I'll post all of the relevant information below, but my main question is: in its current state, how safe is this array/setup? Is there any fault tolerance left, or if a drive fails, are we out the whole pool? I've already recommended backing the whole thing up, wiping it, testing all of the drives, and getting it back into a known good working state, but how mission-critical is it at this point?

The chassis is a 24-drive setup, each bay containing a WD 4TB enterprise drive. I was told it was originally set up as three RAIDZ1 vdevs pooled together, with the remaining three drives added as hot spares. I first logged in to find raidz1-2 resilvering (not entirely sure why), and that vdev now has 9 drives instead of the original 7. There are two drives not in use or not showing: drive 9 shows up but isn't in use and is available to be added to a pool, and drive 12 is missing entirely. I'm not sure if it's dead, but I cannot find it anywhere, and I'm afraid to pull it because I don't want to risk anything.
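In case it matters, this is the kind of thing I can run from the FreeNAS shell to try to match devices to serial numbers and see whether that missing drive is visible at all (device names below are just examples):

  # list every disk the controller/CAM layer can see;
  # a drive that is physically present but absent here is probably dead or unseated
  camcontrol devlist

  # map gptid labels to their daX device names
  glabel status

  # read the model and serial number off a specific disk so it can be matched to a bay
  smartctl -i /dev/da9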

Below are screenshots of everything that I believe is relevant to our setup:
[Attachments: Screen Shot 2016-11-02 at 9.05.57 AM.png, Screen Shot 2016-11-02 at 9.06.10 AM.png, Screen Shot 2016-11-02 at 9.06.27 AM.png]


When "zpool status -v" is run this is the output:
Screen Shot 2016-11-02 at 9.23.18 AM.png
 

Ericloewe (Server Wrangler, Moderator)
Joined: Feb 15, 2014 · Messages: 20,194
Oh dear, that server is in very poor shape.
  • RAIDZ1 in this kind of scenario is very irresponsible.
  • Previous disks were not replaced correctly, or something of the sort (daX designators instead of gptids; see the note below).
  • The pool has corrupted data.
  • The current resilver seems to be doing weird things.

Basically, you'll have to rebuild this. Hope you have good backups.
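To see what I mean about the labels: FreeNAS normally adds pool members by gptid, so a healthy zpool status shows gptid/... entries rather than raw daX devices. You can map one to the other from the shell (exact output varies by version):

  # list gptid labels alongside the daX devices they live on;
  # a pool member that shows up only as daX in zpool status has no GPT label at all
  glabel status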
 
Joined: Nov 2, 2016 · Messages: 3
I'm definitely going to look into RAIDZ2/3 for the rebuild. What might you recommend for a configuration with the current number/size of drives?
 

danb35 (Hall of Famer)
Joined: Aug 16, 2011 · Messages: 15,504
I've already recommended backing the whole thing up, wiping it, testing all of the drives, and getting it back into a known good working state,
That's good advice, especially with the metadata errors that are showing up. Hopefully you already have backups, and if so, I'd make sure they're walled off from whatever backup you run at this point (i.e., make sure the backup you run today doesn't overwrite the one you had from last week or last month).
but how mission-critical is it at this point?
If the wrong drive fails (i.e., one of the drives in the raidz1-1 set), all your data will go away. That's pretty mission-critical, IMO. You can survive a failure in raidz1-0 or raidz1-2 at the moment, though.
What might you recommend for a configuration with the current number/size of drives?
With 24 disks, off the cuff, I'd recommend three eight-disk RAIDZ2 vdevs. What's the server being used for? That might affect the pool layout suggestion.
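For reference, the layout I have in mind would look roughly like this (pool and device names are placeholders; on FreeNAS you'd actually build it through the GUI volume manager so the disks get proper gptid labels):

  # three 8-disk RAIDZ2 vdevs in a single pool -- sketch only, your device names will differ
  zpool create tank \
    raidz2 da0 da1 da2 da3 da4 da5 da6 da7 \
    raidz2 da8 da9 da10 da11 da12 da13 da14 da15 \
    raidz2 da16 da17 da18 da19 da20 da21 da22 da23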
 
Joined: Nov 2, 2016 · Messages: 3
What would be our best route to back up the data that's currently on there, knowing that we'll be using a new RAIDZ scheme with less total storage? The setup as it is is fairly full, so I don't know whether the standard approach of backing up the configuration plus the data to another server (which we're looking to rent short term) would work, given that our available storage will go down.
 

danb35 (Hall of Famer)
Joined: Aug 16, 2011 · Messages: 15,504
You said you had 24 disks: 21 active in RAIDZ1 vdevs and three spares. I'm suggesting having all 24 disks active in RAIDZ2 vdevs, with no spares--at least, no hot spares (one or two spare disks would likely be a good idea; burn them in and test them thoroughly, then put them aside for when a disk fails). Storage capacity should be identical: three 7-disk RAIDZ1 vdevs give 3 × 6 = 18 data disks, and three 8-disk RAIDZ2 vdevs also give 3 × 6 = 18 data disks. Though if you're more than about 80% full, you really need to add capacity or reduce contents anyway.
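For the burn-in, at a minimum I'd run a long SMART self-test on each spare before putting it on the shelf; something like this from the shell (device name is just an example):

  # start an extended (long) self-test; it runs in the background on the drive itself
  smartctl -t long /dev/da0
  # once it has had time to finish, check the result in the self-test log
  smartctl -l selftest /dev/da0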
 

Ericloewe (Server Wrangler, Moderator)
Joined: Feb 15, 2014 · Messages: 20,194
Though if you're more than about 80% full
It's 96% full, which is absolutely critical.

Basically, OP, you're going to need more drives or bigger drives. That's true even if you were to stay with RAIDZ1, which I'd advise against.
 

Stux (MVP)
Joined: Jun 2, 2016 · Messages: 4,419
Replace one of the vdevs with 6TB or 8TB drives. That will get you some more space, and you'll have a few 4TB drives left over as spares.

Hot spares are a waste here; the same number of drives could have gone into three vdevs of 8-way RAIDZ2.

RAIDZ1 was irresponsible in this scenario.

Long term, it may be worthwhile to think about adding another 24-bay chassis to extend the first one. It sounds like your data requirements are only going to grow.

You need to find roughly 72TB of storage to back up the contents while you repair the pool (21 drives in RAIDZ1 works out to 18 data disks × 4TB, and the pool is nearly full)!

Maybe that extra chassis is a good idea?
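If you do end up renting a box for that, plain ZFS replication over SSH is probably the simplest way to move everything (pool, snapshot, and host names below are placeholders):

  # snapshot the whole pool recursively, then stream it to a ZFS system on the other end
  zfs snapshot -r tank@pre-rebuild
  zfs send -R tank@pre-rebuild | ssh backup-host zfs receive -F backuppool/tank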
 

Bidule0hm (Server Electronics Sorcerer)
Joined: Aug 5, 2013 · Messages: 3,710
I wonder if the SMART tests are correctly configured, too, because I see a lot of checksum errors and no SMART alerts.
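An easy way to check from the shell (sh syntax; the disk list comes from the kernel, so it should cover every drive):

  # print the health verdict and SMART attributes for every disk the kernel knows about
  for d in $(sysctl -n kern.disks); do
    echo "== $d =="
    smartctl -H -A /dev/$d
  done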
 

Ericloewe (Server Wrangler, Moderator)
Joined: Feb 15, 2014 · Messages: 20,194
It's fairly clear to me that the whole thing is best done from scratch. The level of trust in the current configuration is close to zero.
 