darkwarrior · Patron · Joined: Mar 29, 2015 · Messages: 336
Hi,
I'm with you on that one.
Much better IO performance and easier pool expansion

You guys do what you feel you need to do. If you are running a database or a bunch of VMs, there might be a use case where you can get higher performance from mirrors, and I don't want you to think that I am arguing, but here is some food for thought.
There is also the idea that mirrors rebuild faster. That would only be true if you were limited by read speed. I just don't see that in my pool; the limit is write speed. The speed of the rebuild is dependent on the amount of data that must be written to the replacement disk. With 700GB of data to transfer onto the drive, it doesn't matter if that drive is in a RAID-Z2 or if it is in a mirror: 700GB is still 700GB, and it takes time to transfer that much data onto the drive.
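To put rough numbers on that, here is a back-of-the-envelope sketch. The 150 MB/s sustained write speed is an assumed typical figure for a 7200 RPM SATA drive, not something measured on this pool; adjust it for your hardware.

```shell
#!/bin/sh
# Lower bound on resilver time: data to rewrite divided by sustained
# write speed. The drive topology (mirror vs RAID-Z2) does not change
# this floor, because the bottleneck is writing to the one new disk.
DATA_GB=700        # data to transfer onto the replacement drive
WRITE_MBS=150      # assumed sustained write speed, MB/s
SECS=$((DATA_GB * 1024 / WRITE_MBS))
echo "at least $((SECS / 60)) minutes, mirror or RAID-Z2 alike"
```

With those assumed numbers the floor works out to a bit over an hour; real resilvers take longer because the writes are not purely sequential.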
[... snip...]
also, IPMI is gorgeous.

I actually upgraded from a socket 1366 Xeon server to a socket 1155 Xeon just to get the IPMI. There were some other improvements, but the key feature was the IPMI.
and
I had a disk fail on Friday (9 December 2016) and my NAS was helpful enough to e-mail me at 10:01 AM (while I was at work 60 miles away) as follows:

Code:
Device: /dev/da11 [SAT], 8 Currently unreadable (pending) sectors
Device: /dev/da11 [SAT], 8 Offline uncorrectable sectors
Device: /dev/da11 [SAT], Self-Test Log error count increased from 0 to 1
Now, I can't leave work to go check on it or deal with it in any way until I get home, so, if my pool were made of mirrors, this would worry me. Because I am running RAID-z2, I have a second level of redundancy and I know that I don't have to worry because I can just replace that disk when I get home... It's all better now and I never broke a sweat.
Disks can, and do, fail without warning. If you are running a mirror, you can only survive one failure. Unless you have been especially vigilant in your purchasing of drives, it is likely that the disks in your mirror are the same age and even from the same batch, which means they stand a decent chance of failing at about the same time. The concern is that when one disk of the mirror fails, the other disk could fail before you can replace the first, or while the rebuild is in progress.
# zpool create test mirror c0t0d0 c0t1d0 spare c0t2d0
# zpool status test
  pool: test
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        test        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c0t0d0  ONLINE       0     0     0
            c0t1d0  ONLINE       0     0     0
        spares
          c0t2d0    AVAIL

errors: No known data errors
# zpool add <pool> spare <disk>
You can add spares/log/cache drives through the volume manager in the GUI.
This actually brings up a really good point, that I haven't had a chance to poke at yet with FreeNAS. Way back in the day (2006-ish for ZFS purposes) I was an actual Sun employee. We configured our ZFS pools in Solaris 10 with hot-spares like this (the zpool create example above):

You can add spares/log/cache drives through the volume manager in the GUI. When you open the volume manager tab, you click the "Manual Setup" button.

I have a 4x4TB pool. If I buy a 4TB USB 3.0 drive and keep it parked on the USB HBA... What, if anything, will happen if I run:

I would not try to add a USB drive to your pool. USB is not a reliable drive interface. It is not an HBA. What I did do, one time before, was attempt to attach a USB device to import data, and it crashed the entire system. I wouldn't want to trust a USB device as part of my pool because of the possibility of it crashing the entire server and corrupting my storage pool.
I would rather buy another HBA to add more drives. I don't have that problem though because I have a 48 bay chassis with 12 empty bays.
As a former Storage QA engineer, believe me... If you have 36 spinning platters and no hot spares, you're on borrowed time!

I recognize that you have a background in this, but I do also. Those of us who offer assistance on the forum try to find out what the user needs before we make suggestions, so that our suggestions fit the user's actual scenario and are not simply based on our personal preferences. I understand that you like to have a hot spare, but I would rather not (especially at home), and I will share the reasoning behind that if you are interested; just ask.

In a business environment, hot spares can save the day, and I have seen that happen personally, but I don't see the need at my house even though I have 36 drives in my system. I have a full, online backup of everything in my storage pool in a backup pool on the same server, and I run a backup server also. It would be inconvenient for sure if the main storage pool failed, but I would need three drives to fully fail in one vdev (not just bad sectors) in the main pool before I would lose it, and even then I would still have all my data in the backup pool, which is online and shared from the same server, plus another copy on the backup server that is synced hourly. The backup server is also shared to the network, so if the whole primary server goes down for some reason, the data can still be accessed from the backup server.

This is how I build for home: a primary server with 36 drives and a backup server with another 8 drives. I just don't think I need to go to the additional step of having hot spares. I think the odds of having three pools fail simultaneously are astronomical.
My little $90 NAS has but 4 SATA ports, and for my needs, 10GbE is probably a higher priority than a chassis upgrade.

Not sure what your point is here, but you can't utilize the network speed of 10Gb Ethernet with only four drives. The drives will not be fast enough to feed the network connection.
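The back-of-the-envelope numbers behind that claim, assuming roughly 150 MB/s sequential throughput per spinning drive (an assumed figure; measure your own drives):

```shell
#!/bin/sh
# 10GbE line rate is 10 Gbit/s = 1250 MB/s, ignoring protocol overhead.
# Four drives at an assumed ~150 MB/s sequential each fall well short,
# and random I/O would be far worse still.
DRIVES=4
PER_DRIVE_MBS=150   # assumed per-drive sequential throughput
POOL_MBS=$((DRIVES * PER_DRIVE_MBS))
LINK_MBS=$((10 * 1000 / 8))
echo "pool ~${POOL_MBS} MB/s vs 10GbE link ${LINK_MBS} MB/s"
```

Under those assumptions the four drives top out around half the link rate, and that is the best (purely sequential) case.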
If attaching a USB drive crashes the kernel, I want to reproduce that and catch it in the kernel debugger. Grab the stack trace and submit a bug... Or better yet, submit a bug with a patch!

Please do, I would love to have reliable USB connectivity. To be completely honest, there have been two new versions of BSD since I tried it, so the problem could very well be fixed. Still, I have a hard time trusting that it will work reliably.
But I remain committed to the use of hot spares. It may not make sense for a small NAS like mine, and it may not make sense to use a poor attachment mechanism. But hot spares are a very good thing!

I manage a system at work that was built with disk shelves. It has 16 drives in the head and 16 more in each of 4 shelves. It was set up two or three years before I started working here, and the original configuration was for each set of 16 drives to be a 15-drive RAID-Z2 vdev with the 16th drive in the shelf being a hot spare, so 5 hot spares in an 80-drive system. When I started working here, it had been ignored for a long time, because I filled a position that had been vacant for over a year and nobody had been minding the servers. This server had already used all the hot spares because of drive failures, and if one more drive had failed in the second vdev, the whole pool would have been lost. I recognize that hot spares can be the thing that saves a system; however, if a system is monitored, it should never get to that point.
I check my server health every day and I have email alerts set up, so I will be notified of a drive fault immediately, and I replace drives at the first reallocated sector. I am absolutely paranoid about any fault causing data to be corrupted or destroyed. I have burned-in and tested cold spares ready to replace a drive and, at worst, it would be 18 hours between the fault and the drive replacement.
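For reference, FreeNAS drives these notifications through smartd. A hand-rolled smartd.conf entry for the same policy would look roughly like this; the device name, mail address, and test schedule are placeholders, not the FreeNAS defaults:

```
# Monitor all attributes (-a), mail on trouble (-m), send a startup test
# mail (-M test), and schedule a short self-test daily at 02:00 plus a
# long self-test Saturdays at 03:00 (-s).
/dev/da11 -a -m admin@example.com -M test -s (S/../.././02|L/../../6/03)
```

On FreeNAS itself you would configure the equivalent through the S.M.A.R.T. service settings and per-disk options in the GUI rather than editing the file by hand.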
I recognize that you have a background in this, but I do also.
(snip)
I think the odds of having three pools fail simultaneously are astronomical.
Not sure what your point is here, but you can't utilize the network speed of 10Gb Ethernet with only four drives. The drives will not be fast enough to feed the network connection.
I recognize that hot spares can be the thing that saves a system, however, if a system is monitored it should never get to that point.
(snip)
at worst, it would be 18 hours between the fault and the drive replacement.
Having all three pools in the same building is in and of itself a risk, and that's what your off-site backup is for.

For my purposes, my personal home use, I am not willing to spend the money for an offsite backup, and my location is not likely to suffer a disaster that would destroy my systems. Even though I was without power for a month and had to have a new roof after Hurricane Katrina, I was back in operation as soon as power was restored. In my home, I am not paying the bill for a backup generator any more than I am paying a monthly charge to back my data up offsite. I am not made of money / it isn't worth that much to me. There would need to be a business case for it, and my personal data doesn't make me any money. It is more like spending money on a boat: the more I spend on it, the more I spend on it...
The intermediate step of quad-1GbE is sitting here on my desk, I finally found the box the cards were hiding in, but I'm not sure it's worth the cabling & configuration hassle when 10GbE cards are so cheap.

I went there, and no, it isn't worth the hassle.
Yes, I'm out of SATA ports, but I have a couple free PCIe slots...

A SAS card like this is the easy answer to add a few drives:
Worn & failed drives get hot, and tend to cook adjacent drives.

That is what took three of the drives in the second vdev of that 80-drive system. The system had last logged the drive temperature at over 100 C, and the two drives adjacent to it failed after that. I agree with the idea of hot spares when the circumstances call for it, and I will likely build hot spares into the pool of the new 60-drive server that we are getting at work later this year. It isn't that I don't see the value of hot spares; I just don't see the value of them in my home system.
What are the statistical odds of two drives failing in 18 hours?

I have had two drives start accumulating bad sectors within minutes of each other (two in one day), but most of the drive failures that cause me to replace drives, both at home and at work, are bad sectors and not catastrophic failures. I have had catastrophic failures, but they are not as common as they once were.
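For what it's worth, the naive version of that math looks like this. The 5% annual failure rate is an assumed figure, and the calculation assumes failures are independent, which is exactly what the same-batch and shared-heat arguments in this thread dispute:

```shell
#!/bin/sh
# Naive odds that one specific surviving drive also fails during an
# 18-hour replacement window, given an assumed independent 5% annual
# failure rate. Real-world failures are correlated (same batch, same
# enclosure, same heat), so treat this as an optimistic lower bound.
awk 'BEGIN {
  afr = 0.05            # assumed annual failure rate
  window = 18 / 8760    # 18 hours as a fraction of a year
  printf "about 1 in %d\n", 1 / (afr * window)
}'
```

With many drives in one chassis, multiply up accordingly; the more disks share a vdev, the less comfortable that number gets.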
Air conditioning can fail.

We have a problem in my server room right now where one of the two cooling systems is not strong enough to keep it cool. We found out the hard way when the newer (stronger) system had a failure and the other system couldn't do the job. I am going to push management to get an additional cooling system. It is hard to get them to see the need for, and approve spending on, something that is only needed in the case that something else fails.
Is there any good guidance out there about setting up a good alerting strategy? Does FreeNAS just do that out of the box if I've configured email alerts? Specifically, what did you have to do, if anything, to receive alerts on reallocated sectors?

Take a look at these scripts:
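As an independent sketch (not the scripts referenced in the thread), something along these lines can watch for the first reallocated sector; the device list is an assumption, and wiring the output into mail(1) or the FreeNAS alert system is left to taste:

```shell
#!/bin/sh
# Flag any drive whose Reallocated_Sector_Ct raw value is nonzero.
# smartctl -A prints attributes in ten columns; field 2 is the
# attribute name and field 10 is the raw value.
for dev in /dev/da0 /dev/da1; do   # assumed device list; adjust
  smartctl -A "$dev" | awk -v d="$dev" \
    '$2 == "Reallocated_Sector_Ct" && $10 > 0 {
       printf "%s: %d reallocated sectors\n", d, $10 }'
done
```

Run it from cron (or as a FreeNAS cron job) and pipe any output to mail, so a clean run stays silent and only a newly reallocated sector generates a message.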
A SAS card like this is the easy answer to add a few drives
It is hard to get them to see the need for and approve spending for something that is only needed in the case that something else fails.