Slideshow explaining VDev, zpool, ZIL and L2ARC for noobs!

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
From what I've read, XFS pretty much needs ECC RAM too. Remember, you are building a server, and many applications make assumptions that you need to uphold. Just google "ECC XFS" and you'll probably find this information for yourself.
 

Skram0

Cadet
Joined
Oct 13, 2012
Messages
5
Hey, thanks for the great write-up. Priceless knowledge all in one place.
I'm thinking there's a typo on the page labeled "Performance of zpools...". In the RAIDZ3 section the calculation is described as 2n + 4. Shouldn't that be 2n + 3?

Thanks again for this golden nugget of information.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Yeah, I noticed that typo yesterday too. It should be 2n + 3. I haven't gone back to fix it yet because I'm currently dealing with some drama, but rest assured, when I get back to updating the presentation, that fix is definitely on the list. :)
 

tmacka88

Patron
Joined
Jul 5, 2011
Messages
268
Nice read. Just after some clarification, as I'm now a little concerned after reading it. I created multiple volumes:
Volume 1 = RAIDZ1 with 3 x 2TB
Volume 2 = 1 x 1TB
Volume 3 = 1 x 1TB

Are all of these volumes in one zpool, or does each volume have its own zpool? From what you were saying, if I have a HDD failure in either Volume 2 or 3, I will lose all 3 volumes. Having set this system up as a noob, I thought I would only lose the data on Volume 2 or 3 if a drive in one of them failed, not lose everything. That's why I used multiple volumes and put the important data on the RAIDZ1 volume. I know I should have used RAIDZ2, but I didn't.

So what I am asking is: will Volume 1 still work and keep its redundancy (one disk's worth) if Volume 2 and/or 3 fail?

Thanks
 

liukuohao

Dabbler
Joined
Jun 27, 2013
Messages
39
Nice read. Just after some clarification, as I'm now a little concerned after reading it. I created multiple volumes:
Volume 1 = RAIDZ1 with 3 x 2TB
Volume 2 = 1 x 1TB
Volume 3 = 1 x 1TB

Are all of these volumes in one zpool, or does each volume have its own zpool? From what you were saying, if I have a HDD failure in either Volume 2 or 3, I will lose all 3 volumes. Having set this system up as a noob, I thought I would only lose the data on Volume 2 or 3 if a drive in one of them failed, not lose everything. That's why I used multiple volumes and put the important data on the RAIDZ1 volume. I know I should have used RAIDZ2, but I didn't.

So what I am asking is: will Volume 1 still work and keep its redundancy (one disk's worth) if Volume 2 and/or 3 fail?

Thanks

As you read the slides a couple of times (I've forgotten which page), you'll get the idea that a vdev (virtual device, in ZFS language) can be a USB thumb drive or a hard disk, such as the single 1TB disks in your Volume 2 and 3. Once vdevs are added to the same storage pool (zpool: just a bunch of disks pooled for data storage), if any one of them bites the dust (it doesn't matter which disk from which volume, 1, 2, or 3), there is no way to recover the pool's data, even though Volume 1 is under RAIDZ1.

Unless you keep Volume 2 and 3 out of that pool: you could put the two 1TB units under a software RAID 1 (mirroring), or rsync between them.

I hope my advice is correct? If not, maybe Cyberjock can correct me ;)
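Roughly, in ZFS CLI terms, the difference looks like this (just a sketch; the pool and disk names like tank and ada0 are made up for illustration):

Code:
# One pool holding all three "volumes" as vdevs. If ANY vdev dies,
# including a lone single-disk one, the whole pool is lost:
zpool create tank raidz1 ada0 ada1 ada2   # Volume 1: RAIDZ1 of 3 disks
zpool add -f tank ada3                    # Volume 2: single 1TB disk, no redundancy
zpool add -f tank ada4                    # Volume 3: single 1TB disk, no redundancy

# Three separate pools. Losing ada3 only kills tank2; tank keeps
# its RAIDZ1 redundancy untouched:
zpool create tank raidz1 ada0 ada1 ada2
zpool create tank2 ada3
zpool create tank3 ada4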
 

tmacka88

Patron
Joined
Jul 5, 2011
Messages
268
I hope my advice is correct? If not, maybe Cyberjock can correct me ;)


Thanks liukuohao,

This is what I suspected, but common sense told me otherwise. I would have thought each volume would be protected from the others. Is there a reason why FreeNAS doesn't work like this?

cheers

Time to fix my storage setup, I think.
 

liukuohao

Dabbler
Joined
Jun 27, 2013
Messages
39
Thanks liukuohao,

This is what I suspected, but common sense told me otherwise. I would have thought each volume would be protected from the others. Is there a reason why FreeNAS doesn't work like this?

cheers

Time to fix my storage setup, I think.

I guess that is just the way the ZFS file system has worked from day one! At the beginning, I also thought I would get full protection simply by being under ZFS.
If you're keen to learn more, get hold of the OpenSolaris Bible PDF, download it, and read the section about ZFS.

Is there a reason why FreeNAS doesn't work like this?

What do you mean? Can you elaborate?
 

tmacka88

Patron
Joined
Jul 5, 2011
Messages
268
I was just wondering why ZFS or FreeNAS doesn't have this redundancy between volumes. It would be great if each volume were independent of the others; that way, with my setup above, if Volume 2 and/or 3 failed I would not lose Volume 1.

This would be ideal, but I'm sure there is a reason it's not done like this.

cheers
 

liukuohao

Dabbler
Joined
Jun 27, 2013
Messages
39
I was just wondering why ZFS or FreeNAS doesn't have this redundancy between volumes. It would be great if each volume were independent of the others; that way, with my setup above, if Volume 2 and/or 3 failed I would not lose Volume 1.

This would be ideal, but I'm sure there is a reason it's not done like this.

cheers

Yes, it would be ideal! But that is just a limitation of the ZFS file system; I guess we all have to accept it.
Perhaps Cyberjock can explain the reasoning behind not having this feature.

To put it this way: if you want ZFS, then within a pool you can only play around with RAIDZ1, RAIDZ2, or mirroring for data protection, and later, when you want to grow the zpool, you stripe in more vdevs (see the sketch below). ZFS is all about data redundancy and protection, but it will accept a hybrid somewhere in between: a single disk with no redundancy, provided the user knows the risk implications.

Err... I think everything will become much clearer if you don't mind reading the OpenSolaris Bible (the ZFS section); the book explains how ZFS works in a way any noob can read and understand.

The FreeNAS GUI today is constrained by the ZFS CLI: if the ZFS command-line interface doesn't allow a feature, you cannot have it in the FreeNAS GUI.
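As a rough sketch of what growing a pool looks like from the CLI (disk names invented; note that once a vdev has been added it cannot be removed again):

Code:
# Stripe another vdev (here a mirrored pair) into an existing pool.
# Capacity grows, but the pool now depends on this vdev staying healthy too:
zpool add tank mirror ada5 ada6
zpool status tank   # confirm the new vdev shows up online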
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
That's not how ZFS is designed; no file system has what you are asking for. But if you were to set up mirrors, you could achieve the same objective.

You can also use ZFS snapshots and replication to achieve the same goal.
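A minimal sketch of both approaches (pool, dataset, and disk names are placeholders):

Code:
# Two independent mirrored pools: a dead disk, or even a dead pool,
# on one side never touches the other side:
zpool create tank mirror ada0 ada1
zpool create tank2 mirror ada2 ada3

# Snapshot + replication: keep a second copy of a dataset on the other pool:
zfs snapshot tank/data@nightly
zfs send tank/data@nightly | zfs receive tank2/data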
 
Joined
Apr 21, 2012
Messages
25
Here it is, September 10, about 7 weeks after I first read @cyberjock's slideshow. While I didn't take the advice verbatim, I doubled the size of my NAS taking its reasoning into account.

To rehash, so one doesn't have to go back to the original posting: I'd bought an HP ProLiant N40L, put 5 2TB disks in it as well as 8GB of RAM, applied the BIOS firmware mod, and ran FreeNAS 8.2.0 on a ZFS RAIDZ1 (one parity drive). The files on the NAS (half of which, in the neighborhood of 700MB apiece, are written once and read many times) serve a home-office setup of about 9 media devices (including computers with media software installed), of which about three are active at any one time. I purchased one 2TB spare, which is still in its packaging, for a quick repair should I get an indication of a failure. In addition, I've been backing up all my files offline, twice over, to 500GB internal hard drives using a USB-to-SATA connection tethered to my Linux desktop. At the time I started, 500GB drives were at the "sweet spot" of $/GB. Right now 1TB drives are a better buy, but considering the time it takes to read/write 500GB, I'll probably continue to buy 500GB drives for backups. Needless to say, I've also put together a database to keep track of my assets.

The 10TB of hard drives yielded about 7.8TB of storage. At the time I started looking at expansion, FreeNAS was reporting about 4.8TB in use; applying the 80% rule, I saw that I should begin planning for expansion.

At the time I first built my NAS, I did a lot of reading about the 2TB hole and Linux RAID, the issues with Linux licensing of ZFS, and the FUSE-based ZFS module, comparing them with the pluses and minuses of ZFS. I eventually chose ZFS and FreeNAS and am still glad I did, even though one can now compile a ZFS kernel module for Linux (you just won't find it, yet, as a prebuilt binary because of the licensing issue). About the 2TB "hole": at the time of my reading, ZFS was touted as not having that problem, but this was just as 3TB disks were being introduced by Seagate and there was growing interest in handling drives larger than 2TB: GPT vs. MBR issues, etc. What I did take note of was that working with drives larger than 2TB is an issue really worth paying attention to. I was alerted by the manufacturers' quoted unrecoverable read error rate of 1 in 10^14 bits. The technology of hard disk drives up to 2TB had been around long enough by then for that number to have some real statistics behind it (a universal data set, we statisticians call it).

Keeping all this in mind, I decided to add another vdev (raidz1-1) to my existing pool rather than rebuilding my RAIDZ with 4TB disks. Clearly, the best practice with drives larger than 2TB would be to build a RAIDZ2 (two parity drives), better yet a RAIDZ3 (three parity drives); however, even with fiddling around inside the ProLiant box, the most drives I could fit would be 5 (which I already had), and if I were to rebuild to a RAIDZ2 or RAIDZ3, I'd have to come up with drives I could spool the contents of my RAID off to while I built the new one. All that copying had the potential to take me offline for at least a week (as I have to eat, sleep, and go to work), and I'd end up with hardware I'd have to figure out how to "recycle" later.

So... taking the "best" from @cyberjock's recommendations, I found a Sans Digital 5-bay RAID tower being sold as an "open box" (it looked more like a last-one-on-the-shelf item than an open box) for about $150. I bought 5 more 2TB drives from the same manufacturer that built the original drives, so I'd have similar hardware throughout the RAID. I planned to plug this into the one remaining SATA port on my N40L (the eSATA port). I did NOT need to install the Rocket port multiplier that came with the Sans Digital tower (it has bad reviews), as that port was covered by the BIOS mod. I upgraded the 8GB of Kingston ECC unbuffered RAM aboard to 16GB of the same, rebooted, and found the 16GB of RAM was recognized, but I'd lost my original zpool. Only 6 disks were recognized, and I couldn't access the existing ZFS pool. Panic!! I spent about 5 hours reading forums and trying a bunch of ZFS commands, both in the GUI and via the command line. I tried importing the volume, only to be told it couldn't be imported because the volume already existed; so, following a forum suggestion, I detached the existing volume, rebooted, and imported it again, all to find myself back at square one.

I was called to dinner (where I had a chance to get "out of the box"). After dinner, I shut down the N40L, pulled the motherboard out (which, if you read ProLiant forums, you learn is not thought of as an easy or fun thing to do), and reseated the SATA cable that runs from the motherboard to the backplane holding the 4-bay drive array. I'd tried this earlier without pulling out the motherboard, but it must not have worked. Anyway, after I started the machine up and looked at the number of disks, there were 10. This time I imported the volume, then simply extended it using the GUI. Extremely easy.

I now have a 14.2TB ZFS pool with 9.2TB still available (or 14.2TB*0.8 - 4.9TB = 6.5TB under the 80% rule). Keeping with @cyberjock's original assertion, the next time I expand (if I'm still alive enough to be doing so), I'll be looking at big 4U enclosures and whatever best practices in controllers, disk formats, and file systems are current, even for a home system. The big improvement I saw was a much faster load-up of the file system directory from the remote machines accessing the server. File transfers, however, seem a little slower, which I attribute to the new vdev sitting behind a port multiplier, but they're still more than adequate for my streaming reads.

I recently saw a 4-bay NAS using 4TB disks at one of the discount houses I buy from, priced at about $3,500. I've got only about $1,700 invested in this setup. I probably use a little more electricity than the 4TB-drive setup would, but to get RAIDZ2 or RAIDZ3 I'd have to buy 2 or 3 more 4TB drives and an enclosure to put them in, and I'd still back up my files twice offline.
 

jyavenard

Patron
Joined
Oct 16, 2013
Messages
361
Thank you for a great presentation.

I've been using FreeBSD (from FreeBSD 3 up to FreeBSD 8) and built a RAIDZ1 array with great success. Now I'm looking for a home solution, and as I'm getting older, tinkering with the command line isn't my fancy any longer... hence FreeNAS.

I really like the FreeNAS Mini hardware, but 4 disks only is a bit on the low side, and the pro 2U solutions are just outside my budget. So I'm starting to think I'll have to build it myself...

Then came your presentation. Great work.

I've noticed one thing that seems incorrect: you state "heat is your hard drive's worst nightmare" and link to the Google study. But Google found that heat, contrary to popular belief, was *NOT* a factor. Quote: "Contrary to previously reported results, we found very little correlation between failure rates and either elevated temperature or activity levels."

A question regarding your choice of motherboard, the Supermicro X9SCM-F-O: it only has 6 SATA ports, and I read somewhere on this forum that your current RAID array is an 18-drive solution. What do you use for the extra SATA ports?

What do you think of the low-power E3 processors, like the E3-1220L? Do you think those would still provide acceptable performance?
 

tmacka88

Patron
Joined
Jul 5, 2011
Messages
268
Have a look at the IBM M1015 server RAID card; each card gives you another 8 SATA ports. All you have to do is cross-flash the card to IT mode. Do a bit of searching on the forum and you will find more info on the card and on cross-flashing.
 

strandte

Cadet
Joined
Nov 28, 2013
Messages
1
Thanks for the nice slideshow, which has been a primer for me on FreeNAS. I'm always interested in understanding the reason why. On page 34 of your slideshow I think I found an error. You write:
RAIDZ3 should have the total number of drives equal to 2n + 4. (ie 5, 7, 11, etc drives for the VDev)

I think 2n + 4 would give 6, 8, 12, etc.

Could you explain the theory behind these numbers, or link to a page on Wikipedia or similar that does? I will be Googling this, but I think it would be a good addition to that page.

Thanks again!
 

gpsguy

Active Member
Joined
Jan 22, 2012
Messages
4,472
RAIDZ3 should have the total number of drives equal to 2n + 4. (ie 5, 7, 11, etc drives for the VDev)

It should be 2n + 3. cyberjock is aware of the typo and it's on his to-do list.
 

jyavenard

Patron
Joined
Oct 16, 2013
Messages
361
Could you explain the theory behind these numbers, or link to a page on Wikipedia or similar that does? I will be Googling this, but I think it would be a good addition to that page.


My understanding is as follows:

ZFS spreads each block across all the disks in the vdev.
The maximum variable stripe size is 128kB / (number_of_drives - number_of_parity_drives) per data disk.

Disks have either 512-byte or 4kB sectors, so the aim is a per-disk stripe size that is a multiple of the disks' sector size.
For RAIDZ1 the minimum number of disks is 3 (2 data + 1 parity); for RAIDZ2 it's 4 (2 data + 2 parity); for RAIDZ3, 5 (2 data + 3 parity).
Say we have 3 disks configured in RAIDZ1:
that gives 128 / (3 - 1) = 64kB per data disk.
That's a good number, because 64kB is a multiple of either a 512-byte or a 4096-byte sector. You'll get the best performance out of that setup.
Now take 4 disks in RAIDZ1:
128 / (4 - 1) = 42.67kB per data disk. That's no good: some stripes will be written across two sectors. That causes extra delay, because under some circumstances you need to read 2 sectors where one read would have been enough, and everything after can end up shifted, making it even worse.
For RAIDZ2:
6 disks: 128 / (6 - 2) = 32kB: good
7 disks: 128 / (7 - 2) = 25.6kB: not good
8 disks: 128 / (8 - 2) = 21.33kB: not good
9 disks: 128 / (9 - 2) = 18.29kB: not good
10 disks: 128 / (10 - 2) = 16kB: good
Hence why you read that for RAIDZ2 a good number of drives is 4, 6, or 10;
for RAIDZ1: 3, 5, or 9 drives (though having only one parity disk for 8 data disks is a bad idea; speed-wise, however, it's great);
for RAIDZ3: 5, 7, or 11;
and so on. In other words, the "2n" in the slides is a power of two: 2^n data disks plus the parity disks.
(Note: in the above, kB really means KiB, as 1 KiB = 1024 bytes.)
Hope that helps...
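If you want to play with the numbers yourself, here's a small sh/awk sketch of the same arithmetic; the "good" test simply checks whether the number of data disks is a power of two, so each disk's share divides evenly into 512-byte or 4kB sectors:

Code:
#!/bin/sh
# Per-data-disk share of a 128 KiB record for various RAIDZ widths.
for parity in 1 2 3; do
  for total in 3 4 5 6 7 8 9 10 11; do
    [ "$total" -le $((parity + 1)) ] && continue   # need at least 2 data disks
    awk -v t="$total" -v p="$parity" 'BEGIN {
      d = t - p; s = 128 / d
      pow2 = 1; x = d
      while (x > 1) { if (x % 2) pow2 = 0; x = int(x / 2) }
      printf "RAIDZ%d, %2d disks: %6.2f KiB per data disk -> %s\n",
             p, t, s, (pow2 ? "good" : "not good")
    }'
  done
done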
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
The math is sound, with one exception. The default stripe size in FreeNAS is 128KB, but it's actually variable, "up to" that size, in powers of 2. The smallest is 4KB if I remember correctly (I've been awake 10 minutes), unless you have forced the 4k sector size. There is no way to pin the stripe to any given size; it's simply "up to", so every power-of-2 multiple must be "nice" mathematically or it gets messy quickly.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Hence the mention of "the maximum variable stripe size"

Yes, but you have to include the calculations for 64kB, 32kB, 16kB, etc. for any given disk configuration. That's all I was trying to say. I wasn't saying your math was wrong, just that you have to include every power-of-2 size down to 4kB (I think that's the smallest). So it gets complex quickly, and the name of the game is to be "good" with as many combinations as possible.

For example, 10 disks in a RAIDZ2 at a 16kB record is bad: 16/8 = 2kB per disk, which looks normal but is too small for a 4k sector. And if your ashift is incorrect for your disks (or your partitions aren't properly aligned; they always should be if you use the FreeNAS GUI), that makes it messier. (This example is precisely why very wide vdevs can be very bad.) Unfortunately, even with the 128kB (variable) stripe size, you can pretty much expect some writes at every multiple up to the full 128kB.

What I'd really like to see is a minimum stripe size option. Some disk space would be "lost", but performance wouldn't suffer unnecessarily.

This whole stripe-size issue is one reason (of quite a few) why people who want to run iSCSI or NFS shares for ESXi, and need very high I/O, always get told to go with multiple mirrored vdevs. You get stripes that always align, and you get multiple vdevs, which helps ZFS optimize its volume-managing potential.

The only reason I responded is that you only gave the 128kB example, which doesn't tell the whole story by a long shot and makes it appear you don't fully understand what the "variable" actually does in the deepest layers. I was just trying to explain that there's more to it than the examples you provided.
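To put that in numbers, walk the record size down by powers of two for a 10-disk RAIDZ2 (8 data disks); same sh/awk style as the sketch above, purely illustrative:

Code:
#!/bin/sh
# Variable record sizes from 128 KiB down to 4 KiB across 8 data disks.
# Below a 32 KiB record, each disk's share drops under one 4 KiB sector:
for rec in 128 64 32 16 8 4; do
  awk -v r="$rec" 'BEGIN {
    s = r / 8
    printf "%3d KiB record / 8 data disks = %5.2f KiB per disk -> %s\n",
           r, s, (s >= 4 ? "ok" : "below 4 KiB sector")
  }'
done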
 

nooby99

Cadet
Joined
Dec 23, 2013
Messages
1
Slideshow explaining VDev, zpool, ZIL and L2ARC and other newbie mistakes!

I've put together a PowerPoint presentation (and PDF) that gives some useful info for newcomers to FreeNAS. I decided to create this slideshow because in the 5 months I've been on this forum I've seen a lot of people confused about vdevs, zpools, ZILs, L2ARCs, etc. Hopefully we can put a lot of the confusion to rest once and for all.

We get a large number of duplicate threads with the same questions asked every other day. Personally, I got them so often that I decided to stop answering them and concluded a better use of my time would be to create this presentation. I literally read every thread and every post on the forum, so if I don't answer, either the answer is already in the thread, or it's in this presentation or the FreeNAS manual. Answering every third thread with "consult the manual" gets a little old after a while, and I have better uses for my time.

This presentation also explains a number of things in more detail for new users. It covers many common newbie errors and can save you some heartache. If you are brand new to FreeNAS, I recommend reading the manual cover to cover. There are a lot of recommendations throughout the manual, and they are typically there because many people have fallen into the same traps.

I've saved this as a PowerPoint file because the slideshow has some animations, and I'm not sure what other formats would support them. If you would like this in another format that supports animations, please let me know and I'll see what I can do. Currently I provide a PowerPoint presentation and a PDF; the PDF has no animations, so the PowerPoint is preferred.

I'll try to keep it updated as necessary. If this presentation helps you, please "like" it so others can see that it really is worth the time to read!

Updated as of: August 2, 2013 FreeNAS 9.1.0

Powerpoint: https://dl.dropboxusercontent.com/u/57989017/FreeNAS Guide 9.1.0.pptx.zip (This is the preferred format because it includes animations)

PDF via Google Docs: https://docs.google.com/file/d/0BzHapVfrocfwQkF0eU8wRnloU2M/edit
PDF: https://dl.dropboxusercontent.com/u/57989017/FreeNAS Guide 9.1.0.pdf.zip

 