Help! My FreeNAS has gone south

Status
Not open for further replies.

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,402
The other question I might ask is it possible to add redundancy to a stripe without killing and recreating the pool?
Yes, you can turn each single-disk stripe into an N-way mirror.
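
A minimal sketch of how that looks from the command line (pool and device names below are placeholders; on FreeNAS you'd normally do this through the GUI's volume manager):

    # Attach a second disk to an existing single-disk vdev, turning it into a two-way mirror.
    # "tank", da0 and da8 are illustrative names only -- substitute your own pool and disks.
    zpool attach tank da0 da8

Repeat for each single-disk vdev in the stripe; the pool stays online while each new mirror resilvers.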

If I am not mistaken, I could theoretically add another 8 drives as a mirror to the current ones, but is it possible to do the same sort of thing with fewer drives?
I really don't see you running with 8 mirrors, twice the number of drives, when you couldn't be bothered to use any redundancy before. Besides, how does this help unless you are going to buy 8 drives at once, or not use the array until you've bought all 8? What device are you planning to use to mirror a couple of 3TB drives besides an additional couple of 3TB drives?
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,402
As far as I can tell, I added that disk in late June. (I thought it was more recently than that.) da1 was added in late August, and seeing as there is 4TB free on the pool, I would suspect that da0 is (at the very least) about 2/3 full and da1 has a lot more free space than da0.
OK, in that case I'm suggesting the following:
  • Buy an additional 3TB drive.

  • Remove da0 to a separate case with adequate cooling.

  • Using ddrescue, clone as much of da0 as it can read to the new 3TB drive (see the example command after this list).
    Doing it this way will be less stressful than trying to mirror da0 while it's in the pool.

  • Place the new drive into the array and copy what data you can off.

  • RMA da0 if under warranty.

It should go without saying that the NAS should remain shut down until da0 is replaced and the thermal issues are addressed.
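
A minimal sketch of the ddrescue step, assuming the failing disk shows up as /dev/da0 on the rescue machine and the new 3TB drive as /dev/da2 (both device names are examples only; double-check them before running anything):

    # First pass: copy everything that reads cleanly, skipping troublesome areas,
    # and keep a map file so the run can be stopped and resumed.
    ddrescue -f -n /dev/da0 /dev/da2 /root/da0-rescue.map
    # Optional second pass: go back and retry the bad areas a few times.
    ddrescue -f -r3 /dev/da0 /dev/da2 /root/da0-rescue.map

The map file will also tell you how much, if anything, couldn't be recovered.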
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
It's too bad the OP didn't consult my guide (link in the sig). It would have warned the poster that what he was doing was not in the best interest of his data. :( I hate stories like this one.

To the OP - I highly recommend all of the advice being given in this thread. A UPS is virtually mandatory, and redundancy, if you care about your data, is also virtually mandatory. The RPC-4020 and 4024 both work very well as long as you aren't trying to use high-performance drives. A friend has 16 7200 RPM Seagate drives and they idle at about 50°C+ (a bit hot in my opinion). My 22 drives are all Green drives, and although they are in a hotter environment, they still run 8-10°C cooler.

For my friend, since he has the RPC-24, we even spaced out the drives and left the 1U rack space above and below the server open. Problem solved!
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
For my friend, since he has the RPC-24, we even spaced out the drives and left the 1U rack space above and below the server open. Problem solved!

Leaving the space above and below open is generally a bad thing. It's dependent on specifics of course, but you should consider the following:

A server chassis is designed to cool front-to-back (usually), and in a stack with other servers. While some heat will of course dissipate up and down, the vast majority of BTU's should be heading out the back. Any BTU's that dissipate to the next server down or up have to be dealt with by that unit's cooling system, and that suggests that other units are heating the unit that we're discussing, so in reality it should all kind of be a wash anyways. All of the units are moving the heat from the front to the back.

If you introduce gaps, you've just introduced a complication. Now you have stagnant air zones; if you are truly pouring BTU's out the top (and bottom) of your server, then you are creating an artificially hot zone above (and below) your server. You're heating it up. The heat sits there. Your server has to get warmer in order to continue dissipating heat in that direction. Good? Probably not.

Now, if you have a stack of crappy servers, all of which are insufficiently cooled, stacking them without gaps will cause them to bake. Gapping them might make them bake somewhat less; the evidence for this is inconclusive.

However, if you have a stack of decent servers, stacking them without gaps helps to guarantee that the fans on surrounding units will help keep a unit with malfunctioning cooling at a better temperature than it would otherwise; the BTU's don't get to go heat an air pocket in between the servers.

So, if you've followed this so far, you're quite possibly saying "but my friend doesn't have any other similar servers, it's just this one." That's fine. It doesn't change the reality of it all. You already figured out that staggering the drives in your server helps them run cool. That's because your server has some airflow, and it is distributed more evenly when you stagger things out. What I'm saying is that this probably isn't true of the gap above and below the server. You really want the air moving there so there's less heat buildup.

So it'd be really much better to go and identify your BTU load and make sure your server cooling design is adequate for that plus a healthy margin, then you can eliminate the EZ-bake oven zones around the server.
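
As a rough back-of-the-envelope illustration of that sizing exercise (the 300 W figure is only an example load, not anyone's measured number):

    300 W of sustained load ≈ 300 × 3.412 ≈ 1,024 BTU/hr of heat to remove.
    A common rule of thumb is CFM ≈ 3.16 × watts / ΔT(°F), so holding a 10°F rise
    across the chassis takes roughly 3.16 × 300 / 10 ≈ 95 CFM of air actually
    delivered through the box, before adding any safety margin.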
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
I totally agree. But this is a 12U case with 1 server and 1 network switch. We've got as many fans in the case as we can put in, so adding more isn't an option. The rack sits in a basement and is always cool. Because of space constraints, setting a fan near the case blowing on it isn't an option either.

Engineering-wise, I agree that an analysis would find the problem, but this "worked" and is a good long term solution for our situation. There is no intention of ever adding more heat to the rack, adding more hard drives, etc.

Overall, I think the problem is that the case doesn't work very well for high performance drives. Even when the case was sitting on the floor it just didn't work very well for cooling.
 

BobCochran

Contributor
Joined
Aug 5, 2011
Messages
184
OK, in that case I'm suggesting the following:
  • Buy an additional 3TB drive.

  • Remove da0 to a separate case with adequate cooling.

  • Using ddrescue, clone as much of da0 as it can read to the new 3TB drive.
    Doing it this way will be less stressful than trying to mirror da0 while it's in the pool.

  • Place the new drive into the array and copy what data you can off.

  • RMA da0 if under warranty.

It should go without saying that the NAS should remain shut down until da0 is replaced and the thermal issues are addressed.

I agree this is a good plan. I routinely use ddrescue to clone hard drives and it is great. It does not work in all situations, but it is very helpful.

Bob

 

BobCochran

Contributor
Joined
Aug 5, 2011
Messages
184
It's too bad the OP didn't consult my guide(link in the sig). It would have warned the poster that what he was doing was not in the best interest of his data. :( I hate stories like this one.

Reading this prompted me to download both versions of the Guide. I like it! I wonder if it could introduce ZFS with a short history and maybe some links to the documentation? That way we could have a smoother transition between discussing the weaknesses of hard drives for data storage, which leads us to introduce ZFS as a way to deal with these weaknesses, which then transitions to discussing the core features of ZFS and how best to utilize them.

Bob
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I totally agree. But this is a 12U case with 1 server and 1 network switch. We've got as many fans in the case as we can put in, so adding more isn't an option. The rack sits in a basement and is always cool. Because of space constraints, setting a fan near the case blowing on it isn't an option either.

Engineering-wise, I agree that an analysis would find the problem, but this "worked" and is a good long term solution for our situation. There is no intention of ever adding more heat to the rack, adding more hard drives, etc.

Overall, I think the problem is that the case doesn't work very well for high performance drives. Even when the case was sitting on the floor it just didn't work very well for cooling.

You can usually put more fans in, but the question is at what effort and return. The problem with the 24-drive-in-4U cases, and this isn't limited to Norco, is that the airflow inward is constrained by the design of the front; the drives are necessarily spaced very close together. What that leads to is lots of cooling problems, and an absolute need to address all the other issues properly. For the Norco 4224 case, for example, there are four cabling openings between the two sections that allow air to backflow; this is very bad, and these need to be blocked, because the front zone will have lower air pressure than the motherboard zone. There are also a whole bunch of other apparent cutouts that at first glance I thought were part of the fan modules, but it appears the fans are screwed to the chassis...? Anyways, I can see why they make the upgrade for that,

http://www.ipcdirect.net/servlet/Detail?no=258

but even so, sealing the penetrations in that fan wall would be pretty much mandatory to keep the drives cooler. But I'd also be considering the CFM issues, and I'm guessing that adding some more fans to the backplate would help, because with four fans on the midwall and two on the back (plus unknown in the power supply), you might not have sufficient balance to keep those midwall fans from burning out.

As for your specific case, does the rack have sides? If so, seriously consider sticking a thermometer in there for a day or two to see if you have a hot zone above or below the server. If you do, then consider putting a crummy 12 volt wall wart and a cheap case fan in the space to encourage a little airflow. The point is to ensure that you're not artificially raising the server's temperature (and lowering the life expectancy of the drives) with avoidable heat... but you might be getting lucky if the basement's cool.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
The fans are proprietary and part of the plate. The link you provided is the "upgraded" 120mm backplate. We did opt to purchase the plate and put in high-CFM fans that are also known to be very reliable. The sides are completely welded, but there is very little space between the rails and the side of the rack. The rack is cool to the touch on all sides. The rack is one entire welded assembly and only requires you to mount the front door and wheels. I don't think there is a hot zone hiding somewhere; I think the only problem is that there isn't enough airflow through the very dense array of drives, and that the airflow is solely to blame.

I never tried using the default backplate so I'm not sure if it would have worked better. I just don't like 4 smaller fans and chose to go straight to 3 fans for both my setup and my friend's. We also both have one of those PCI slot cooling fans that pulls air from the neighboring PCI cards and blows it out the back of the case. We chose to add the extra fan because it couldn't hurt and we didn't want to cook the RAID controllers.

I don't remember the exact model of hard drives that he has, but when I saw the wattage on the drives I was in shock. His total system load is something like 190 W at idle and almost 300 W during a scrub. We even used lower-power components (an i3) and still have a rather high power draw :(. We calculated out the power, and the hard drives are really what's killing him. The system without any hard drives was 49 W.
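
Running rough numbers against the 16-drive count I mentioned above:

    (190 W idle - 49 W base system) / 16 drives ≈ 8.8 W per drive at idle
    (300 W scrub - 49 W base system) / 16 drives ≈ 15.7 W per drive under load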
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
Reading this prompted me to download both versions of the Guide. I like it! I wonder if it could introduce ZFS with a short history and maybe some links to the documentation? That way we could have a smoother transition between discussing the weaknesses of hard drives for data storage, which leads us to introduce ZFS as a way to deal with these weaknesses, which then transitions to discussing the core features of ZFS and how best to utilize them.

Bob

That's pretty far beyond the intent of my guide. I only created the guide because it seems like a large number of people make the exact same mistakes. So I thought I could hopefully prevent every 5th thread from being the same question that was just asked the previous day, and perhaps save some people from losing data. But it still fails because too many people don't see the guide, choose not to read the guide, or think they know everything necessary to have a successful build. Then they come crawling to the forum for help, and fairly regularly the answer to their SNAFU is in the guide. I've never seen anyone post that they didn't understand the guide. I'd like to keep it simple because if I fill it up with links and other stuff people will immediately dismiss it as giving them a lot of useless garbage.

At the end of the day I like the history of ZFS. But it is not really necessary for an admin to know the history to use ZFS effectively. The history will help someone decide if ZFS is for them. But if they're already choosing to use ZFS then we're already past that point.
 

BobCochran

Contributor
Joined
Aug 5, 2011
Messages
184
If one person takes your Guide to heart, that is a massive win for you and a massive win for the other person. One person! You help spread adoption of your message through friendly, non-negative guidance that keeps improving over time.

I've never met instant experts at anything. We are all "newbies". This includes me. It takes a lot of training and a lot of mistakes and a lot of try, try, try again to gain experience and ultimately achieve expertise. Your Guide can really help with this. It needs improvement. Repeating the message and extending it is quite important to reaching out to an audience and winning more converts. In real life, people need repetition of a message in order to understand and implement the content in some way.

A short introduction to ZFS is important to connect "hard drives" with "data storage" concepts. It doesn't need to be a 500 page book. A paragraph or two and some links are great. Then the Guide could benefit from a section on backing up an NAS.

You could turn this into a win-win for everyone: you, and all the people who happen to read your posts. Some of these could be future employers or people you never thought you would do business with.

Bob
 

BobCochran

Contributor
Joined
Aug 5, 2011
Messages
184
I think that giving people proactive, helpful, non-negative advice is the best way to go. The people who post here, like Brian, come for help from the more experienced folks. We have not met these people and don't know them. We only know what they describe in the posts they make. We don't know anything about their goals, their purposes, their backgrounds, or personalities. We don't know anything about their finances or their lifestyles or beliefs. What we do know is they came here for help. The problem they are having is serious enough to justify writing a post to these forums. Teaching is all about guiding these people, without negativity.

Many long years ago, as a little boy, I attended a parochial school. That was back when the nuns who did the teaching felt quite free to beat their students, beginning with sharp raps on the knuckles with rulers and getting worse from there. There are two incidents I vividly remember. One is from the equivalent of kindergarten. A nun asked the little boy sitting next to me, "What color is your sweater?" He was scared and didn't answer. "What color is your sweater?" the nun asked again. No response. "What color is your sweater?" This time the nun screamed at him. He didn't respond, and was close to tears. "YOU ARE STUPID!" screamed the nun, at the top of her voice, very angry with him.

I can't say that sort of treatment helped the little boy.

Bob
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
The fans are proprietary and part of the plate. The link you provided is the "upgraded" 120mm backplate. We did opt to purchase the plate and put in high-CFM fans that are also known to be very reliable. The sides are completely welded, but there is very little space between the rails and the side of the rack. The rack is cool to the touch on all sides. The rack is one entire welded assembly and only requires you to mount the front door and wheels. I don't think there is a hot zone hiding somewhere; I think the only problem is that there isn't enough airflow through the very dense array of drives, and that the airflow is solely to blame.

I never tried using the default backplate so I'm not sure if it would have worked better. I just don't like 4 smaller fans and chose to go straight to 3 fans for both my setup and my friend's. We also both have one of those PCI slot cooling fans that pulls air from the neighboring PCI cards and blows it out the back of the case. We chose to add the extra fan because it couldn't hurt and we didn't want to cook the RAID controllers.

I don't remember the exact model of hard drives that he has, but when I saw the wattage on the drives I was in shock. His total system load is something like 190 W at idle and almost 300 W during a scrub. We even used lower-power components (an i3) and still have a rather high power draw :(. We calculated out the power, and the hard drives are really what's killing him. The system without any hard drives was 49 W.

Proprietary fans? What a crock, especially in this age of 4-wire fans and stuff... The airflow issue is, as previously noted, common to these 24-drive chassis systems. Unfortunately. I have yet to find a 24-drive chassis I really like; we sold the T-Win RMC4E-XP for a while, before Supermicro and other vendors came out with slightly better stuff. Ironically, you're probably better off with a 24-drive version rather than the 20-drive version, because at least there, the airflow will be evenly sucky; you won't have a wind tunnel effect through vents on that unused front area.

The AT slot cooling fans are generally poor quality; expect them to fail over time. A vented plate and proper airflow from the system fans is usually just as effective, for less power.

190 watts idle? That's nothing. That's amazingly low. Back in the day, idle was maybe 500 watts. ;-) But yeah, there's a lesson to be learned, watts add up ... quickly. Drives and fans are some of the things you want to stare at hardest before committing to large numbers.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
The AT slot cooling fans are generally poor quality; expect them to fail over time. A vented plate and proper airflow from the system fans is usually just as effective, for less power.

We know they suck (pun not intended). We just didn't want to find out the hard way how hot a RAID controller in IT mode would get before it burned out. For $10 it was one of those things where it was better than nothing.
 

donairb

Dabbler
Joined
Jan 5, 2012
Messages
19
OK, in that case I'm suggesting the following:
  • Buy an additional 3TB drive.

  • Remove da0 to a separate case with adequate cooling.

  • Using ddrescue, clone as much of da0 as it can read to the new 3TB drive.
    Doing it this way will be less stressful than trying to mirror da0 while it's in the pool.

  • Place the new drive into the array and copy what data you can off.

  • RMA da0 if under warranty.

It should go without saying that the NAS should remain shut down until da0 is replaced and the thermal issues are addressed.
OK. Thank you for the help. It will be a few days before I get to this, but before I do, I just want to clarify something:

In step 4 (place new drive into array...), can I just power up FreeNAS with the new drive where the old drive used to be and it should work? Or do I need to do any configuring to make it so?

Thanks

Brian
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,402
In step 4 (place new drive into array...), can I just power up FreeNAS with the new drive where the old drive used to be and it should work?
Assuming you are able to clone what's left of the old drive, yes, it should just work.
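
A quick sanity check after booting with the cloned drive in place (the pool name below is a placeholder):

    # Verify the pool is online and the cloned disk is recognized as a member.
    zpool status -v
    # If the pool doesn't show up, list what's importable and import it by name:
    zpool import
    zpool import tank    # "tank" stands in for your actual pool name

If the pool imports and zpool status -v lists no damaged files, you're in reasonable shape.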
 

donairb

Dabbler
Joined
Jan 5, 2012
Messages
19
Just to let you know, I have fixed the situation with minimal (5MB) loss of data. Of course, it helped that somebody who owed me a substantial sum of money came through.

I now have everything in a Norco RPC-4020. Everything is running very cool. My volume now consists of two vdevs of six 3TB drives each in RAIDZ, so at most one drive in each vdev can fail with no loss of data.
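
For anyone reading along, that layout is roughly what the following would build from the command line (pool and device names here are placeholders only; FreeNAS builds the equivalent through the GUI):

    # Two RAIDZ1 vdevs of six disks each, striped together into one pool.
    zpool create tank \
        raidz da0 da1 da2 da3 da4 da5 \
        raidz da6 da7 da8 da9 da10 da11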

I will also be upgrading to 32GB RAM in a few days just to be on the safe side.

Thank you for all of the help here.

Brian
 