FreeNAS behaviour in case of power failure - some tests

Status: Not open for further replies.

Pitfrr
Wizard | Joined: Feb 10, 2014 | Messages: 1,523

Hi,

I built the following system:
- Motherboard: Supermicro X9SCM-f-o
- CPU: Intel Xeon 1220Lv2
- RAM: 2x8GB Kingston ECC KVR16E11/8I
- 6x1TB HDD (in RAIDZ2)
- FreeNAS version 9.2.0
- No encryption
- No compression
- Using CIFS

The system is running fine, and before I use it for real (i.e. in production, but only for home usage), I'm experimenting with it to get used to FreeNAS (and trying to read the manual).


From a performance point of view, I'm very happy.
Internally I get about 300MB/s for writes and 400MB/s for reads (I used joeschmuck's tests in his post "Intel NIC vs RealTek NIC - Performance Testing", thanks).
Over the network (gigabit LAN) I get 90-100MB/s on large files (in my case I would have been happy with 60MB/s already!). With small files I get from 40MB/s (2MB average file size) down to 15MB/s (370kB average file size).
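For anyone who wants to reproduce the internal (local) numbers, here is a minimal Python sketch of a sequential write/read test. It is not joeschmuck's test mentioned above; it just writes a file of zero bytes, forces it to disk with `fsync`, then reads it back. Writing zeros only gives meaningful results because compression is disabled on this pool, and the test file should be larger than RAM (16GB here), otherwise the read figure mostly measures the ZFS cache.

```python
import os
import time


def write_throughput(path: str, total_mib: int, block_kib: int = 1024) -> float:
    """Write total_mib MiB of zero bytes to path, fsync it, and return MiB/s."""
    block = b"\0" * (block_kib * 1024)
    n_blocks = (total_mib * 1024) // block_kib
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(n_blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # wait until the data is on the pool, not just in memory
    elapsed = max(time.perf_counter() - start, 1e-9)
    return (n_blocks * block_kib / 1024) / elapsed


def read_throughput(path: str, block_kib: int = 1024) -> float:
    """Read path sequentially and return MiB/s (may be served from cache)."""
    size_mib = os.path.getsize(path) / (1024 * 1024)
    start = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(block_kib * 1024):
            pass
    elapsed = max(time.perf_counter() - start, 1e-9)
    return size_mib / elapsed
```

For example, `write_throughput("/mnt/tank/bench.bin", 40 * 1024)` followed by `read_throughput("/mnt/tank/bench.bin")` would test with a 40GiB file; the path `/mnt/tank` is a hypothetical dataset mount point.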

The power consumption is also quite good: about 60W when working, 50W idle (HDD not sleeping) and 20-25W for the mainboard and CPU.


As I said, I'm using it first to experiment and get used to FreeNAS.
So I started to think about some tests to see how FreeNAS would react and to get an idea of its robustness.

On this forum I read some stories about how easy it seems to lose a pool (and therefore your data), so it got me worrying.
I know that in my setup I don't have a UPS (yet), which is a requirement for FreeNAS, and I was wondering about FreeNAS's behaviour when a power failure occurs.
So I did some tests focusing on that issue.

I started with some “basic” tests like:
  • Test 1: System powered off, take out one disk and switch the system on.
  • Test 2: System powered off, take out two disks (RAIDZ2) and switch the system on.
  • Test 3: System powered off, take out two disks, switch the system on (system degraded), delete some data (in this case a 100GB file), switch the system off, reconnect the missing drives and restart the system.
  • Test 4: System powered off, take out one disk, switch the system back on (system degraded), switch off, format the disk and put it back, restart the system.

Those tests were meant for me to see how FreeNAS reacts when a disk fails and what to do. They were quite instructive.
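Test 2 works because RAIDZ2 keeps two disks' worth of parity. Below is a back-of-the-envelope Python sketch of what a 6x1TB RAIDZ2 layout gives, assuming equal-size disks and ignoring ZFS's own overhead (metadata, RAIDZ padding, reserved space); that overhead is plausibly why the volume shows about 3.3TB rather than the ~3.64TiB computed here.

```python
TB = 10**12   # drive vendors use decimal terabytes
TIB = 2**40   # ZFS/FreeNAS report sizes in binary units


def raidz_layout(n_disks: int, parity: int, disk_tb: float) -> dict:
    """Rough RAIDZ capacity and fault tolerance, ignoring ZFS overhead."""
    if parity not in (1, 2, 3):
        raise ValueError("RAIDZ supports 1, 2 or 3 parity disks")
    if n_disks <= parity:
        raise ValueError("need more disks than parity disks")
    data_disks = n_disks - parity
    return {
        "data_disks": data_disks,
        "max_failed_disks": parity,
        "usable_tib_before_overhead": data_disks * disk_tb * TB / TIB,
    }


def pool_readable(parity: int, failed_disks: int) -> bool:
    """A RAIDZ vdev stays readable while at most `parity` disks are missing."""
    return failed_disks <= parity


layout = raidz_layout(6, 2, 1.0)
print(layout["data_disks"], round(layout["usable_tib_before_overhead"], 2))  # 4 3.64
# Test 2: two disks removed from RAIDZ2 -> still readable; a third would not be
print(pool_readable(2, 2), pool_readable(2, 3))  # True False
```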

Then I started to perform more tests related to power failure:
  • Test 5: System idle, no access to the volume, pull the plug.
  • Test 6: Read files (i.e. copy from the NAS to a computer), pull the plug.
  • Test 7: Write a big file (i.e. copy from a computer to the NAS), pull the plug.
  • Test 8: Write small files, pull the plug.
  • Test 9: Write files and shut the system down through the interface (I didn’t expect any problem here; I just wanted to see the behaviour).
  • Test 10: During a scrub, pull the plug.
  • Test 11: During a resilvering, pull the plug.

Each time, I ran a scrub before and after to make sure everything was fine.
I performed each test more than once, except tests 5 and 9, which I ran only once.
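The before/after scrub check can be scripted. This is a minimal Python sketch, assuming a pool named `tank` (hypothetical) whose scrub was started with `zpool scrub tank` and has finished. It only looks at the pool's `state:` and `errors:` lines in the `zpool status` output, which is a simplification: per-device READ/WRITE/CKSUM counters are ignored.

```python
import subprocess
from typing import Optional, Tuple


def parse_zpool_status(text: str) -> Tuple[Optional[str], Optional[str]]:
    """Extract the pool 'state:' and 'errors:' fields from `zpool status` output."""
    state = errors = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("state:"):
            state = stripped.split(":", 1)[1].strip()
        elif stripped.startswith("errors:"):
            errors = stripped.split(":", 1)[1].strip()
    return state, errors


def pool_is_clean(text: str) -> bool:
    """True if the pool is ONLINE and ZFS reports no known data errors."""
    state, errors = parse_zpool_status(text)
    return state == "ONLINE" and errors == "No known data errors"


def zpool_status(pool: str) -> str:
    """Run `zpool status <pool>` on the NAS and return its text output."""
    result = subprocess.run(["zpool", "status", pool],
                            capture_output=True, text=True, check=True)
    return result.stdout
```

On the NAS, `pool_is_clean(zpool_status("tank"))` would be called once before pulling the plug and once after the post-reboot scrub completes; a degraded pool (tests 1 to 4) shows up as `state: DEGRADED` and makes it return `False`.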

Each time, the system started up without any problem and the scrubs reported no errors. I was somewhat surprised! I have a 3.3TB volume and I filled about 10% of it with dummy data for the tests; I didn’t want the scrubs to take ages… ;-)
I was even impressed (during tests 10 and 11) that it resumed the scrub (or resilver) where it had left off!

One “basic” test I still want to do is to restart the system from a new USB stick (of course, without a backup of the current configuration).
Another “basic” test I performed was to move the USB stick and the hard disks to another board.


What I don't really know is how relevant these tests are.
I only ran them a couple of times, which might not be sufficient (to be sufficient I'd have to run them tens or hundreds of times, which is not easy).
What I tried to get out of these tests is a feel for FreeNAS's behaviour in case of a power failure, and an assessment of the risk of losing a complete pool.

From the results I got, I would say: well, it seems that FreeNAS handles it pretty well, even in delicate situations (such as during a resilver).
But maybe I got lucky with the tests I performed...??

I'd be glad to have your opinion on that.

Some assumptions:
I'm not saying that I'm not considering a UPS (I'm just weighing the probability of a power failure against the probability of losing the pool). In fact, I'm planning to get one.
And of course I'm not saying either that I'm not doing backups! ;-)
And I do not know FreeNAS very well, I’m still reading the manual (and this forum which is great).
 

cyberjock
Inactive Account | Joined: Mar 25, 2012 | Messages: 19,526

Yeah, so as a 2-3x a day reader of the forums for 2 years, the whole 'people losing pools' thing has bothered me at times. Here's what you really have to keep in mind with this:

1. If you build a FreeNAS box and do exactly what we recommend here (*actual* server-grade parts, ECC RAM, *recommended* SATA/SAS controllers, no idiotic settings (for example, dedup with 3GB of RAM), and the like), the chances of pool loss are basically zero.
2. If you choose to build a FreeNAS box and do a bunch of stuff we don't recommend... well, that's your roll of the dice.

To expand on #1, I've read basically every thread that exists on this forum since about April 2012. And I have yet to see someone lose a pool who did everything we recommend. Every single time, without fail, the server owner has made at least 1 serious mistake (and usually they make a whole bunch of them). This is one reason why I harp so much on doing it right in the forums. I get the guard dog/a**hole title for it, but I don't care. If it gets you to not do stupid things you'll be upset about later, then call me one. ;) We had one user last week that was lucky and I got a developer involved. He appeared to have not done anything that I could find that would lead to his pool becoming unmountable, with one exception: he used non-ECC RAM. Now, he tested it at my request and I have a snapshot to validate he did the test. But there's still the chance of bitflips and whatnot that could make this problem impossible to solve with non-ECC RAM.

Also, to expand on #1, I've talked to a senior developer about just this. His answer is that they beat the living hell out of their boxes and basically do everything you aren't supposed to do with your pool... pull the plug with it in, pull the plug with it resilvering, they don't care. It's a test box, so who cares if it fails. They haven't had any failures with ZFS yet.

And lastly, to expand on #1 again, ZFS is designed to roll back incomplete transactions on bootup. So we shouldn't be having the problem we are having, which means there's a certain amount of "mysterious errors for no reason" occurring.
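In practice, that "rollback" is largely implicit: ZFS is copy-on-write, so new data goes to free space and only becomes part of the pool when a new root pointer (the uberblock) is written at the end of a transaction group. On import, ZFS uses the most recent valid uberblock, so a transaction group cut off by a power failure is simply never referenced. The toy Python model below illustrates only that idea; it is a deliberate simplification, not ZFS code (no checksums, uberblock ring, or on-disk layout).

```python
class ToyCowPool:
    """Toy copy-on-write store: only what the root pointer references counts."""

    def __init__(self) -> None:
        self.blocks: dict = {}   # block id -> whole tree; never modified in place
        self.next_id = 0
        self.root = self._write_block({})  # committed state starts empty

    def _write_block(self, tree: dict) -> int:
        block_id = self.next_id
        self.next_id += 1
        self.blocks[block_id] = tree
        return block_id

    def commit(self, updates: dict, power_cut_before_root_update: bool = False) -> None:
        # 1. Copy-on-write: build the new tree in fresh blocks, old tree untouched.
        new_tree = dict(self.blocks[self.root])
        new_tree.update(updates)
        new_root = self._write_block(new_tree)
        if power_cut_before_root_update:
            return  # new blocks exist, but nothing references them
        # 2. Atomically switch the root pointer (the uberblock analogue).
        self.root = new_root

    def read(self) -> dict:
        return dict(self.blocks[self.root])


pool = ToyCowPool()
pool.commit({"a.txt": "v1"})
pool.commit({"a.txt": "v2", "b.txt": "new"}, power_cut_before_root_update=True)
print(pool.read())  # {'a.txt': 'v1'}
```

The interrupted commit leaves orphaned blocks but an intact, older tree, which matches the behaviour seen in the plug-pulling tests above.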

Bottom line: do your build right, don't try to circumvent and work around our warnings to get what you want, and things will be okay. Choose to do those things we tell you not to do, and you might want to seriously consider doing offsite backups. There's a reason why I included my hardware list in my noobie presentation. There is also a reason why I offer consulting services. Many people want peace of mind that what they build will work. They also would rather someone else do a parts list so they don't have to deal with all of this stuff. Shopping for FreeBSD is a totally different ballgame than shopping for a Windows machine.
 

Starpulkka
Contributor | Joined: Apr 9, 2013 | Messages: 179

Hi Pitfrr

It's good to test your system so you become more and more familiar with what you have. It's a good thing to practice HDD failures and how to replace a "bad HDD" with a good one, so you can do it right when the real time comes.

I did a Google search for "zfs power failure" (that was a mistake), and after a week of reading thousands of sites where some people had lost a 150TB pool, or pools of over xxx TB, to a power failure, I asked myself what I'm doing wrong by not losing data.

I do everything "wrong" according to the FreeNAS stickies: I use AMD, 4GB of non-ECC memory, and no USB cable on the UPS, and I have had 5 power losses and have not lost data yet (knocks on the wooden table). I'm guessing I got first-timer luck. I really trust ZFS, and I hope the day never comes when I don't. The last time I tried XFS, it failed my HDDs within 6 hours (I forgot one tiny command).
 

Pitfrr
Wizard | Joined: Feb 10, 2014 | Messages: 1,523

Thanks for the feedback.

You seem to be quite lucky! Well, at least you know you're doing everything "wrong" according to the stickies.
I hope I won't discover one day that I did something completely wrong without knowing it. :tongue: Well, reading the manual (and this forum) should prevent that, but it could still happen.

@cyberjock: I definitely try to follow #1. ;-) Actually, when I started to look for hardware for FreeNAS, I was about to go for non-ECC RAM and a consumer-grade motherboard, but after reading this forum I changed my mind, and that's a good thing!
Nevertheless, I have the opportunity to experiment with my current FreeNAS setup, and I'm using it as a learning phase as well, getting familiar with FreeNAS and so on.
 

trionic
Explorer | Joined: May 1, 2014 | Messages: 98

Nice post and useful information. I was developing a rising paranoia regarding ZFS and power failures. It seems to me that the risk of ZFS pool loss is very low, but the consequences would be catastrophic: total data loss. That's why I am now adding a UPS to my shopping list!

However, cyberjock's thread does give me some confidence that selecting the correct components will as he says reduce the risk to basically zero.

Still gonna get that UPS and off-site backup though :) Wonder how long it'll take to upload 25TB to CrashPlan? :D
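A rough answer to the CrashPlan question is the data size divided by the upstream bandwidth. This is a minimal Python sketch; the uplink speeds are hypothetical examples, and real-world throughput to an online backup service is usually below the line rate (the `efficiency` factor is an assumption you would have to estimate yourself).

```python
def upload_days(terabytes: float, uplink_mbit_s: float, efficiency: float = 1.0) -> float:
    """Days to upload `terabytes` (decimal TB) over an uplink of `uplink_mbit_s` Mbit/s."""
    bits = terabytes * 10**12 * 8
    seconds = bits / (uplink_mbit_s * 10**6 * efficiency)
    return seconds / 86_400


for mbit in (1, 10, 100):  # hypothetical upstream speeds
    print(f"25TB at {mbit} Mbit/s: {upload_days(25, mbit):.0f} days")
# 25TB at 1 Mbit/s: 2315 days
# 25TB at 10 Mbit/s: 231 days
# 25TB at 100 Mbit/s: 23 days
```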
 

Pitfrr
Wizard | Joined: Feb 10, 2014 | Messages: 1,523

To be on the safe side regarding power failures, I'm actually thinking about a "home made" UPS solution using a PicoPSU and PicoUPS.
I haven't had time to try it out yet (I'm still messing around with FreeNAS before putting it into production), but I hope I'll be able to look into it soon.
 

cyberjock
Inactive Account | Joined: Mar 25, 2012 | Messages: 19,526

And how exactly do you plan to put that PicoUPS into production when it has no connection to the UPS services on your server? Hint: This is your hint that the PicoUPS won't work for this application...
 

Pitfrr
Wizard | Joined: Feb 10, 2014 | Messages: 1,523

Well, I had something you might call "dirty" in mind... :smile: As I said, I haven't tried it out yet, but I plan to, to see if it's a viable solution.
My idea is to ping the ADSL modem every minute or so and, if it stops responding, shut down the system.
Based on the results I get, I'll see if I go for a UPS or continue with the Picoxxx solution (the motivation here is merely to have fun building it).
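The polling idea could be sketched in Python as below. This is a minimal sketch under several assumptions, not a tested power-management tool: the modem address `192.168.1.1`, the 60-second interval and the 3-miss threshold are hypothetical; the NAS runs on the PicoUPS while the modem does not; and `shutdown -p now` is the FreeBSD command to halt and power off. Requiring several consecutive missed pings is a design choice to avoid shutting down when the modem merely reboots or drops a single packet.

```python
import subprocess
import time
from typing import Callable, Optional

MODEM_IP = "192.168.1.1"     # hypothetical: the ADSL modem's LAN address
CHECK_INTERVAL_S = 60        # "every minute or so"
MISSES_BEFORE_SHUTDOWN = 3   # hypothetical threshold of consecutive failed pings


def ping_once(host: str, timeout_s: float = 10.0) -> bool:
    """Send one ICMP echo request with the system ping; True if it answered."""
    try:
        result = subprocess.run(["ping", "-c", "1", host],
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL,
                                timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def power_seems_lost(host: str = MODEM_IP,
                     ping: Callable[[str], bool] = ping_once,
                     sleep: Callable[[float], None] = time.sleep,
                     max_checks: Optional[int] = None) -> bool:
    """Poll host; return True after MISSES_BEFORE_SHUTDOWN consecutive misses.

    max_checks bounds the loop (None = poll forever); returns False if reached.
    """
    misses = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if ping(host):
            misses = 0          # modem answered: reset the counter
        else:
            misses += 1
            if misses >= MISSES_BEFORE_SHUTDOWN:
                return True
        sleep(CHECK_INTERVAL_S)
    return False


def shutdown_nas() -> None:
    """Cleanly halt and power off a FreeBSD-based system (needs root)."""
    subprocess.run(["shutdown", "-p", "now"], check=False)
```

On the NAS this would run as root, e.g. `if power_seems_lost(): shutdown_nas()`. Passing fake `ping` and `sleep` functions lets you check the logic without pulling any plugs.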

I know there is a somewhat "cleaner" solution (using /etc/network/interfaces and adding a post-down command), but it doesn't work on FreeNAS. I haven't found a workaround for FreeBSD systems. And since I'm not very familiar with Unix systems, for now I'll go with polling the ADSL modem...
 

warri
Guru | Joined: Jun 6, 2011 | Messages: 1,193

Well, that's a creative solution at least! ;)

Thanks for the original tests. I wasn't that thorough when I first set up my box (as you can also see from my hardware specs...).
 