create zpool from multiple mirrors


zimmerru

Dabbler
Joined
Mar 21, 2013
Messages
17
I am working on setting up a server in our lab at work with FreeNAS and seem to be stumbling over how to do what I want to do. I have looked over the manual and checked out noobsauce80's slide show here:
http://forums.freenas.org/showthread...2ARC-for-noobs!

So hard drives go into vdevs - Got it
vdevs go into zpools - Got it

Here is my setup:
28 x 146 GB 10K RPM drives in Dell PowerVault 220S enclosures
PowerEdge 2850 running FreeNAS 8.3.1 RC1 on a 4GB Flash drive

The FreeNAS box will be storage for a separate VMware ESXi host, and for performance I want to create 14 mirrors and build 2 zpools from them, with 7 mirrors each. In the GUI, creating a volume seems to essentially create a zpool, so I don't see how I can create the 14 mirrored vdevs and then form the 2 large zpools from them.

Perhaps I need to do it via the shell? Am I missing something?
 

titan_rw

Guru
Joined
Sep 1, 2012
Messages
586
If I understand what you want to do, you can absolutely do this from the GUI.

So you want 2 pools, each pool consisting of 7 vdevs, each vdev consisting of a mirror of 2 drives?

The way I look at it (and I may not be correct, but it works for me) is that the volume manager in the GUI creates vdevs. If you type in a new volume name, it'll create a new zpool and assign your vdev to that pool. If you select a zpool from 'volume to extend', then it adds (stripes) your vdev into an already existing zpool.

Go to the volume manager, type in the first zpool name, let's say "pool1". Select drives A and B, select mirror, and add the volume. pool1 now consists of one mirrored vdev.

Go back to the volume manager, select extend pool1, select drives C and D, select mirror, and add the volume. This stripes another mirror vdev into the existing pool. You now have a pool with two mirrored vdevs.

Repeat for the other vdevs.

Create the second pool the same way: type a new name, for example pool2, select two drives, and mirror. Then extend this pool with additional mirrored vdevs.

This can also be done from the command line. I'm unsure of the pros and cons of either method.
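For the shell route, here's a rough sketch of what I think the equivalent zpool commands would look like. The da0 through da7 device names are just placeholders, and as far as I know a pool built by hand like this won't be partitioned the way the FreeNAS GUI does it, so treat it as illustration rather than the recommended path:

Code:
# start pool1 with three mirrored vdevs in one go
zpool create pool1 mirror da0 da1 mirror da2 da3 mirror da4 da5
# stripe an additional mirror vdev into the existing pool
zpool add pool1 mirror da6 da7
# each 'mirror-N' entry in the output is one vdev
zpool status pool1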

Also be aware that backups are going to be crucial on both of these pools. If a drive fails, then any read error or bad data returned by the other drive in that vdev is going to be uncorrectable, i.e. corrupt data. And if both drives in the same vdev fail, you lose the whole pool. Look at it another way: with one bad drive, you're putting your faith in a single drive operating perfectly to protect the data of the 12 other drives in the pool. Good backups and regular scrubs will help, I imagine.
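For the regular scrubs, the commands are simple enough (FreeNAS can also schedule them from the GUI); pool1 is just the example name from above:

Code:
zpool scrub pool1       # read and verify every block in the pool
zpool status -v pool1   # shows scrub progress and any CKSUM errors found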

I hope all that I've said is correct. I'm still fairly new to ZFS and FreeNAS, so please feel free to shoot me down if need be.
 

zimmerru

Dabbler
Joined
Mar 21, 2013
Messages
17

*lightbulb* Ahh, that totally makes sense! And it makes me question the design/layout at the same time. I'm trying to get the most space out of the hardware I have, but I don't want to take too much of a risk. I do have 4 spare drives in the 2850 itself; I was planning on setting 2 as spares per zpool, but now I'm not quite sure how to do that either without dedicating them as spares for a specific vdev. Would having spares like that reduce the risk to the pool or not? Can a drive be set as a spare for any vdev in a pool?
 

titan_rw

Guru
Joined
Sep 1, 2012
Messages
586
I'm pretty sure spares are associated with the zpool, and not with individual vdevs. However, I'm not too familiar with how ZFS uses spares. I'm not sure if they're automatically used to resilver a pool in the event of a failed drive.

Either way, spares for your example zpool setup aren't really going to mitigate the risk. If a drive fails, and a spare is automatically taken to replace it, you're still relying on the 'other' drive in the mirror to operate flawlessly during resilver.

Maybe someone who knows more on how spares are implemented can chime in.
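In case it helps, this is roughly how I'd expect pool-wide spares to look from the command line. The da28 and da3 device names are placeholders, and I'm honestly not sure whether FreeNAS will pull the spare in automatically on a failure or whether you'd have to do the replace by hand:

Code:
zpool add pool1 spare da28       # attach da28 as a hot spare for the whole pool
zpool status pool1               # the spare shows up under its own 'spares' heading
# if nothing grabs it automatically after a failure, swap it in manually:
zpool replace pool1 da3 da28     # replace the failed da3 with the spare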

Also, you mention the storage is for ESXi. How will it be presented? iSCSI? I haven't done much with iSCSI, but I understand there are some caveats to using iSCSI on top of ZFS.

For data resiliency and space efficiency, you might use one of the raidz options instead. With 14 drives per pool, you could use 2 vdevs of 7 drives each in raidz2. Not perfectly optimal, because the 128 KB default record size doesn't divide evenly across the (7 - 2) data disks, but I'm not sure how much that really matters. This would give you the capacity of 10 drives in each pool. If a drive fails in one of the 7-disk vdevs, you still have one drive of redundancy left to correct any read errors that happen between when the drive fails and when the resilver completes. If 2 out of the 7 drives fail, you're once again relying on the remaining 5 to operate perfectly, but with a single drive failure ZFS can still self-heal any bad reads. Note, this is per vdev: you can have 2 bad drives in the entire zpool, and as long as they're in different vdevs, you still have protection against read errors.
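Something like this is what I have in mind for that layout; the device names are placeholders again:

Code:
# one pool, two 7-drive raidz2 vdevs = usable capacity of 10 drives
zpool create pool1 \
    raidz2 da0 da1 da2 da3 da4 da5 da6 \
    raidz2 da7 da8 da9 da10 da11 da12 da13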

I have a zpool that's for storing movies / TV shows, etc. It's 10 drives, all 3 TB regular SATA, in a single raidz3 vdev. Also not optimal, as 128 / (10 - 3) is not a whole number. I don't do a lot of random IO on that zpool, so I'm probably unlikely to see any detriment from it. For sequential access it smokes pretty good; it scrubs at a little over 1 gigabyte per second. Since I have 3 drives of redundancy, I can lose any two drives and still have ZFS correct any remaining drive that decides to return bad data. I can lose any 3 drives, but then I'm relying on the remaining 7 drives to operate perfectly until the pool is healed.
 

zimmerru

Dabbler
Joined
Mar 21, 2013
Messages
17

I was thinking of iSCSI, yes. I went with mirrors over RAIDZ because of read and write performance, which has a huge impact on loading and running the VMs. I am open to any option really, but I have read several posts about not using RAIDZ for ESXi VM storage due to performance issues. That said, the host only has 32 GB of RAM, so at most it would be running 15-30 VMs; I'm not sure if that lessens the need for performance, though.

As for mirrors and corrupt data, it has always been my experience that if one drive in the mirror has corrupt data on it, they both do. Mirroring simply protects against drive failures and not so much against data corruption, which is where other types of RAID stand out. This is of course from a time before ZFS, so perhaps with ZFS that isn't the case anymore.
 

titan_rw

Guru
Joined
Sep 1, 2012
Messages
586
ZFS protects you from data corruption too. As I understand it (and like I said, I have a basic understanding only), ZFS checksums any data it writes. When you read data back, it checksums what was read and compares. If the checksums match, everything is good. If they don't, one of the drives has returned 'bad' data; I've also heard this referred to as silent data corruption, because the drive returned data without giving the OS any indication that it was bad. ZFS notices this, and if it has redundancy it can use, it rebuilds that particular block of data, checksums it again to verify it got it right, and then returns the good data to you. As I understand it, any time ZFS fixes data like this, it increments the "CKSUM" column in the output of "zpool status".

You can try it yourself. Create any zpool with redundancy and write data to it. Then 'corrupt' part of the data on one of the devices (dd random data to locations on the disk, for example). Read the data back, and it should be returned correctly. Check "zpool status" and you should see CKSUM errors on the device you intentionally corrupted. A subsequent "zpool scrub" will fix any remaining data corruption, as ZFS goes through and verifies everything is good.
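If you'd rather not poke at real disks, here's roughly how I'd run that experiment with small file-backed vdevs instead; the paths, sizes and mountpoint are just what I'd pick, nothing special:

Code:
truncate -s 1G /tmp/d1 /tmp/d2                    # two 1 GB files standing in for disks
zpool create -m /mnt/testpool testpool mirror /tmp/d1 /tmp/d2
dd if=/dev/random of=/mnt/testpool/junk bs=1M count=100             # write some data to the pool
dd if=/dev/random of=/tmp/d1 bs=1M count=20 seek=200 conv=notrunc   # trash part of one side, well past the labels
zpool scrub testpool                              # force ZFS to read and verify everything
zpool status -v testpool                          # CKSUM errors appear on /tmp/d1, the data is still fine
zpool destroy testpool                            # clean up the experiment
rm /tmp/d1 /tmp/d2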

As a real-world example, twice in the 6 months I've had my 10-drive raidz3 pool online, I've gotten checksum errors during scrubs. There were only 2 or 3, and on only one drive each time. There's nothing wrong with the drives; they just happened to happily return bad data during that scrub. ZFS detected it, fixed it with redundancy from another drive, and re-wrote the offending part of the disk.

I'd say it's pretty cool, but that's an understatement.
 