Mirror of stripe of Raidz3

macxs

Dabbler
Joined
Nov 7, 2013
Messages
21
Hi,

I have 88 discs and I want to configure them as follows:
11 discs (8+3) per raidz3 vdev, giving 8 raidz3 vdevs
4 raidz3 vdevs striped together for capacity, giving 2 stripes
Up to this point it is possible, and the pool looks like this:

tank ONLINE 0 0 0
raidz3-0 ONLINE 0 0 0
gptid/fec5265c-2f9d-11ed-9b1f-002590ca45ba ONLINE 0 0 0
gptid/ff283a78-2f9d-11ed-9b1f-002590ca45ba ONLINE 0 0 0
gptid/ffad2450-2f9d-11ed-9b1f-002590ca45ba ONLINE 0 0 0
....
raidz3-1 ONLINE 0 0 0
gptid/ffd080cf-2f9d-11ed-9b1f-002590ca45ba ONLINE 0 0 0
gptid/ffcdd56f-2f9d-11ed-9b1f-002590ca45ba ONLINE 0 0 0
....
raidz3-2
....

Now my question: is it possible to mirror those two stripes? In some other posts on this board I read that nesting pools is a very bad idea and this is my opinion, too.
So a) adding some pools is not an option and b) the zpool create command must handle a three-tier-combination-creation which I could not achieve nor google this ;-).
Another possible way would be to mirror separate discs, combine the mirrors into raidz3 vdevs, and stripe those.
But when I want to
zpool create tank raidz3 mirror disc1 disc2 mirror d3 d4 mirror d5 d6 mirror d7 d8 ... raidz3 mirror d23 d24...
it tells me that raidz3 requires at least 4 devices.

Is it possible in a supported way?

It's not so important, but so you understand why I want it that way: these 88 HDDs are located in two different JBODs which are connected over ATTO bridges and FC to two servers in two separate locations. I know replication and already use it with some other storage systems, but in this case mirroring would be of much more benefit because both servers see all HDDs, and if one server fails or must be updated, the other server can import the pool and continue serving shares.


Thank you!
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
is it possible to mirror those two stripes?
No. When a pool contains multiple vdevs, data is striped across all vdevs.
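To illustrate the point: every top-level vdev named in a single zpool create call ends up in one stripe, so the 8 x raidz3 layout from the first post is simply one long command. Here is a minimal sketch that only builds and prints that command without running it. The device names da0 to da87 are placeholders and do not come from this thread.

```shell
#!/bin/sh
# Build (but do not execute) a "zpool create" command with several raidz3
# vdevs. ZFS stripes across all of them; there is no way to say "mirror
# these two groups of vdevs" in this syntax.
# Device names daN are hypothetical placeholders.
build_pool_cmd() {
    vdevs=$1   # number of raidz3 vdevs
    width=$2   # discs per vdev
    cmd="zpool create tank"
    disk=0
    v=0
    while [ "$v" -lt "$vdevs" ]; do
        cmd="$cmd raidz3"
        i=0
        while [ "$i" -lt "$width" ]; do
            cmd="$cmd da$disk"
            disk=$((disk + 1))
            i=$((i + 1))
        done
        v=$((v + 1))
    done
    echo "$cmd"
}

# 8 vdevs of 11 discs each = 88 discs, all in one striped pool.
build_pool_cmd 8 11
```

On TrueNAS the pool would normally be created through the web UI rather than by hand; the printed command is only meant to show the shape of the layout.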
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Perhaps what you're really asking is "can I replicate one pool to the other?"... yes you can.

You could make the replication interval quite short, so it would be almost as good as a mirror (but make sure you leave enough time for each replication to complete before the next snapshot is due).

If both servers can see all disks, you would always mount the "primary" copy anyway, so any lag in replication wouldn't matter.
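As a rough sketch of what one replication cycle boils down to underneath: take a recursive snapshot, then send only the changes since the previous snapshot to the other pool. The pool names (tank, backup), the host name (siteb) and the snapshot names are assumptions made up for this example, and the script only prints the commands. On TrueNAS you would set this up as a periodic snapshot task plus a replication task in the UI instead.

```shell
#!/bin/sh
# Dry run: print the commands for one incremental replication cycle.
# SRC, DST and REMOTE are hypothetical names.
SRC=tank
DST=backup
REMOTE=siteb

replication_cmds() {
    prev=$1   # newest snapshot that already exists on both sides
    new=$2    # snapshot to create and send in this cycle
    echo "zfs snapshot -r $SRC@$new"
    echo "zfs send -R -i $SRC@$prev $SRC@$new | ssh $REMOTE zfs receive -F $DST"
}

replication_cmds auto-0100 auto-0115
```

The shorter the interval between cycles, the less data each incremental send carries, which is why a short interval only works if each send finishes before the next one starts.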
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
It's not so important, but so you understand why I want it that way: these 88 HDDs are located in two different JBODs which are connected over ATTO bridges and FC to two servers in two separate locations. I know replication and already use it with some other storage systems, but in this case mirroring would be of much more benefit because both servers see all HDDs, and if one server fails or must be updated, the other server can import the pool and continue serving shares.

This sounds like an excellent use-case for TrueNAS SCALE and the clustered/highly-available SMB infrastructure; however, it's still early on in the release process.

If you're interested in making this work under CORE in a supported manner, this would require two separate servers, separate pools, and replication configured between them. Things like "automatic failover" and similar highly-available controller scenarios with a shared storage backend are supported under the Enterprise edition.

Replication also has some advantages in that it's able to protect you from things like accidental deletion or ransomware via ZFS snapshots - an "immediate mirror" scenario doesn't help this, because any changes would be made in both places at once.
 

macxs

Dabbler
Joined
Nov 7, 2013
Messages
21
Thank you for your answers.

Perhaps what you're really asking is "can I replicate one pool to the other?"... yes you can.
Yes, I considered replication, as I stated in my original post, but...

If both servers can see all disks, you would always mount the "primary" copy anyway, so any lag in replication wouldn't matter.
Yes, you can mount/import the primary copy, but if I'm not wrong one cannot use the replication jobs because sender and receiver change. Also, replication would be done via Ethernet interfaces, whereas mirroring traffic would go via FC. OK, we have 10GE interfaces, but I would consider FC the more beautiful way :smile:

Replication also has some advantages in that it's able to protect you from things like accidental deletion or ransomware via ZFS snapshots - an "immediate mirror" scenario doesn't help this, because any changes would be made in both places at once.
Erm, if snapshots protect against ransomware then it doesn't matter whether they are mirrored or replicated :smile: The data in the snapshot remains the same. (Besides that, there are known cases where ransomware groups gained access to the storage backend and deleted snapshots after encrypting data - but if that happens, only write-once-read-only off-site backups can preserve the data.)

This sounds like an excellent use-case for TrueNAS SCALE and the clustered/highly-available SMB infrastructure; however, it's still early on in the release process.
In fact the system was originally a Nexenta store which promised HA. After I posted my question I remembered how it was set up: each HDD in one JBOD was mirrored with another HDD in the remote JBOD. After that, half of the mirrored discs were put into a raidz3 (I guess) which was primarily held by the first server, and the second half was hosted primarily by the second one to distribute the load. But I don't remember exactly how the pool was constructed.
Unfortunately Nexenta didn't manage to get HA working; the iSCSI shares in particular prevented that (we will not use iSCSI in this system), and they had very poor support.


So my next question: can I put several mirrors in a raidz[1-3]? Otherwise I will have to stripe them, which will not protect against the unlikely case of failing HDDs combined with problems with the other JBOD or its connection. I tried with
zpool create tank raidz3 mirror d1 d2 mirror d3 d4 ....
and unfortunately this does not work.
 

macxs

Dabbler
Joined
Nov 7, 2013
Messages
21
If both servers can see all disks, you would always mount the "primary" copy anyway, so any lag in replication wouldn't matter.
...and in case of a failure I would have to make the copy writeable; then the two copies are out of sync, and after the primary site is repaired I'd have to resync the full volume back to the primary site (maybe it's possible to link the snapshots and sync only the changes, but I'm not that much of a ZFS pro).
So, a mirror is much less complicated in my opinion :smile:
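For what it's worth, syncing only the changes back is possible in principle as long as both sides still hold a common snapshot: the repaired primary is rolled back to that snapshot and receives everything newer. A minimal sketch of the idea follows, with made-up snapshot names (s1 to s5), pool names (tank, backup) and host name (sitea). It matches snapshots by name only, which is a simplification; a real tool would compare snapshot GUIDs. It prints the send command instead of running it.

```shell
#!/bin/sh
# newest_common PRIMARY_LIST SECONDARY_LIST
# Each argument is a space-separated list of snapshot names, oldest first.
# Prints the newest name that appears in both lists (empty if none).
newest_common() {
    result=""
    for a in $1; do
        for b in $2; do
            if [ "$a" = "$b" ]; then
                result=$a
            fi
        done
    done
    echo "$result"
}

# Hypothetical state after a failover: the secondary kept taking snapshots
# (s4, s5) while the primary was down.
primary="s1 s2 s3"
secondary="s1 s2 s3 s4 s5"

base=$(newest_common "$primary" "$secondary")

# -I sends every snapshot between base and s5; receive -F rolls the
# primary back to base first, discarding anything written there later.
echo "zfs send -R -I backup@$base backup@s5 | ssh sitea zfs receive -F tank"
```

If no common snapshot survives on both sides, the function prints nothing and a full resync is the only option, so snapshot retention on both sites matters.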
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
...

So my next question: can I put several mirrors in a raidz[1-3]? Otherwise I will have to stripe them, which will not protect against the unlikely case of failing HDDs combined with problems with the other JBOD or its connection. I tried with
zpool create tank raidz3 mirror d1 d2 mirror d3 d4 ....
and unfortunately this does not work.
No.

ZFS does not support nesting of RAID levels. Further, any attempt to mirror storage drives at a lower level and expose those LUNs to ZFS for use in a RAID-Zx is a very bad idea. Many attempts at using ZFS on top of hardware RAID have resulted in data loss.

ZFS is not the be-all and end-all of storage. There are lots of problems ZFS does not solve, including clustered storage.


That said, I routinely use Enterprise level SAN, (which has its own internal RAID), on Solaris 11 with ZFS. But, in my case, I use the LUNs as stripes, not adding ZFS Mirroring or RAID-Zx on top.

The one time I saw a RAID-Z1 on SAN, (Solaris 10), performance sucked. And it sucked to the point of having to migrate the datasets to a new, clean, striped ZFS pool.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
In fact the system was originally a Nexenta store which promised HA. After I posted my question I remembered how it was set up: each HDD in one JBOD was mirrored with another HDD in the remote JBOD. After that, half of the mirrored discs were put into a raidz3 (I guess) which was primarily held by the first server, and the second half was hosted primarily by the second one to distribute the load. But I don't remember exactly how the pool was constructed.
Unfortunately Nexenta didn't manage to get HA working; the iSCSI shares in particular prevented that (we will not use iSCSI in this system), and they had very poor support.

The only time I've seen a "nested vdev" in ZFS in a non-debug scenario is when you're doing a drive replacement:

Code:
pool
    raidz2
        drive1
        drive2
        mirror-replacing
            drive3
            newdrive3
        drive4
        drive5
        drive6


In a world with perfectly spherical cows, ZFS could potentially handle each "drive" in a RAIDZ3 being composed of those "virtual mirrors" - but scaling this to 44 "virtual mirror" devices spanning two geographic locations and relying on this for 24x7 production is well into Here Be Dragons territory to set up and support, in my opinion.

Provided that your RTO is satisfied by manual failover and activating the shares on the "SiteB" server, and your RPO is long enough that each snapshot replication can complete within it, replication is a much simpler means to an end.

For a true HA scenario here, it is possible to set up a ZFS system to handle this; but the amount of modification to the "appliance" configuration of TrueNAS would likely cause unexpected results, and of course break on update.

If each of the SiteA and SiteB consume their own local JBODs, replication is configured for data redundancy, and the configurations are synced between them, you could use pool import/export to handle manual failover/failback, with expected performance loss when you have SiteB mounting SiteA's JBOD and vice versa. (How fast is the FC link between the "SiteA" server, and the "SiteB" JBOD?)
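A minimal sketch of what that manual failover looks like at the ZFS level, assuming a pool called tank (a placeholder name). It only prints the commands. The important rule is that a pool must never be imported on both servers at the same time, so the forced import in the crash case is only safe once you are sure the other server is really down. On TrueNAS itself the export and import would normally be done through the UI so the middleware stays in sync with the pool state.

```shell
#!/bin/sh
# Dry run: print the commands for moving one pool between two servers
# that can both see the same discs. POOL is a hypothetical name.
POOL=tank

failover_cmds() {
    mode=$1   # "planned" for maintenance, anything else for a crashed host
    if [ "$mode" = "planned" ]; then
        # Clean hand-over: release the pool first, then pick it up.
        echo "on old host: zpool export $POOL"
        echo "on new host: zpool import $POOL"
    else
        # The old host never exported the pool, so the import must be forced.
        echo "on new host: zpool import -f $POOL"
    fi
}

failover_cmds planned
failover_cmds crash
```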
 

macxs

Dabbler
Joined
Nov 7, 2013
Messages
21
Thank you all for your input!

I now remember the configuration on the Nexenta system: there were two stripes, each of 11 four-way mirrors. This makes sense in conjunction with the ZFS functionality you explained.

So I will either create a stripe of mirrors or replicate stripes of raidz3. I will think about it over the weekend and raise a beer to each of you :smile:
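For a rough feel of the trade-off between those two options, here is a back-of-the-envelope calculation in discs' worth of usable space. It assumes 88 equal discs, two-way mirrors with one disc per JBOD for the first option, and one pool per JBOD built from 11-disc raidz3 vdevs (8 data + 3 parity) for the second. It ignores ZFS metadata, padding and free-space headroom, so treat the numbers as an upper bound only.

```shell
#!/bin/sh
# Usable capacity, counted in discs, for the two layouts under discussion.
TOTAL=88

# Option A: one pool, a stripe of two-way mirrors spanning both JBODs.
mirror_usable() {
    echo $(( $1 / 2 ))
}

# Option B: two pools (one per JBOD) kept in sync by replication,
# each made of raidz3 vdevs with 11 discs (8 data + 3 parity).
raidz3_usable() {
    per_site=$(( $1 / 2 ))
    vdevs=$(( per_site / 11 ))
    echo $(( vdevs * 8 ))
}

echo "stripe of mirrors:       $(mirror_usable $TOTAL) discs usable"
echo "replicated raidz3 pools: $(raidz3_usable $TOTAL) discs usable"
```

Under these assumptions the mirrored layout offers more usable space, while the replicated raidz3 layout tolerates three failed discs per vdev on each site and keeps an independent second copy with its own snapshots.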

Again, thanks and have a nice weekend!

Bye Marco
 