Expanding from (RAID-Z2) to (Stripe of RAID-Z2)

Status
Not open for further replies.

ub3r

Dabbler
Joined
Dec 8, 2012
Messages
15
Greetings,

Current Drives: 10 x 3 TB
Proposed Drives: 20 x 3 TB
Current Utilization: About 15 TB

Original Hardware Config: LSI 9211-8I HBA, 8x drives plugged into it via two SFF-8087 breakout cables, then the remaining 2 drives plugged directly into the motherboard.
New Hardware Config: Purchased a second LSI 9211-8i, plus two HP 24-port SAS expanders. Ten drives are plugged into each SAS expander via SFF-8087, and each expander is dual-linked (two SFF-8087 to SFF-8087 cables) to its own dedicated LSI 9211-8i. I booted everything and all drives were detected without any configuration on my end, which is awesome.
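For anyone curious, I sanity-checked the new topology from the FreeNAS shell before touching the pool. Something like the following (output obviously varies by system) confirms that all 20 data drives show up behind the two HBA/expander chains:

    # List every disk device the FreeBSD CAM layer sees; with the expanders in place,
    # all of the pool drives should appear here as da(4) devices.
    camcontrol devlist

    # Map the gptid labels FreeNAS uses back to the physical da devices.
    glabel status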

Kicker: I have 3 x 6 TB drives striped together, so I can easily move all my data off my existing 15 TB /share and onto that stripe temporarily (yes, it's worth the risk to me) if I need to completely empty /share to reconfigure it as a stripe of Z2s. However, if I can avoid doing this (for time/risk's sake), I would obviously rather avoid it. :)

Question: I purchased 10 more 3 TB drives (20 total now) and wish to create an additional RAID-Z2 vdev and stripe it with my first one, doubling my capacity and improving performance somewhat. What is the correct procedure to do this? I've been reading ZFS guides and it seems pretty straightforward from the CLI (rough sketch below), but I'm heeding the warnings about not doing things on the CLI behind the FreeNAS WebGUI's back if the task can be accomplished within the GUI.
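For reference, the CLI version I keep seeing in guides looks roughly like this. This is only a sketch of what I understand, not something I've run; the pool name and device names are placeholders (FreeNAS normally references disks by gptid), and from what I've read the supported route is the GUI's pool/volume extend function rather than running zpool add by hand:

    # Sketch only: add a second 10-disk RAID-Z2 vdev to an existing pool named "tank".
    # Device names are placeholders. -n does a dry run and prints the resulting
    # layout without changing anything.
    zpool add -n tank raidz2 da10 da11 da12 da13 da14 da15 da16 da17 da18 da19

    # If the dry-run layout looks right, the same command without -n performs the add.
    # From what I've read, a top-level raidz vdev cannot be removed again once added.

If that's roughly right, is the GUI doing essentially the same thing under the hood?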

If anyone could point me to the right article (google-fu has failed me) I would greatly appreciate it.

Thanks and have a wonderful day!
 

ub3r

Dabbler
Joined
Dec 8, 2012
Messages
15
Thanks for the reply Allan Jude!

Won't I need to do something to get the existing data to be striped though? Or, will it automagically go through and stripe existing data?
 

RueGorE

Dabbler
Joined
Dec 10, 2018
Messages
18
Expanding your pool will always result in a stripe of your current vdev and the new vdev consisting of your new disks. The data will therefore become striped across all disks in both vdevs.
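If you want to watch it happen, something like this (pool name is just an example) shows allocation broken out per top-level vdev, so you can see where writes are landing:

    # Show capacity, allocation and free space per top-level vdev
    # for a pool named "tank" (substitute your pool name).
    zpool list -v tank

    # Watch per-vdev I/O in five-second intervals while a copy is running.
    zpool iostat -v tank 5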
 

ub3r

Dabbler
Joined
Dec 8, 2012
Messages
15
It doesn't look like existing data becomes striped, though. I think I might need to copy everything, or use zfs send/receive, to get it to rebalance.
 

ub3r

Dabbler
Joined
Dec 8, 2012
Messages
15
Edit for clarity:

[Network]
Quanta 10Gbps SFP+ Switch
Optical Transceivers: Finisar FTLX8571D3BCL 10Gbps Multimode
Physical Media: 5m (16ft) LC UPC to LC UPC Duplex 2.0mm PVC (OFNR) OM3 Multimode Fiber Optic Patch Cables

[Windows 10]
CPU: 5.2GHz i7
RAM: 32 GiB Non-ECC
Test Drive: Samsung 970 Evo NVMe SSD, extremely fast
NIC: LACP'd Intel x520-DA2 10Gb (20Gb LACP)

[FreeNAS 11.2]
CPU: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz (32 cores)
RAM: 128 GiB ECC
Pool: 10x3TB RAID Z2 Striped with another 10x3TB RAID Z2
NIC: LACP'd Intel x520-DA2 10Gb (20Gb LACP)

[File A]
Description: Existing file that was written to the original 10x3TB RAID-Z2 before the pool was expanded.

[File B]
Description: The same file, copied back to FreeNAS after the stripe of two RAID-Z2 vdevs was created.


File Copy Test 1
Source: FreeNAS
Destination: Windows 10
File: File A
Bottleneck: FreeNAS Read. Windows 10 Write on NVMe drive is smoking fast.
Speed: 200-300MB/s
Notes: FreeNAS Read speed is consistent with testing prior to the pool expansion with the additional vdev that gave us striping.

File Copy Test 2
Source: Windows 10
Destination: FreeNAS
File: File A renamed to File B (allows us to copy it back to FreeNAS without overwriting the original, so we can test both)
Bottleneck: FreeNAS Write. Windows 10 Read on the NVMe drive is, again, smoking fast.
Speed: 600-700MB/s
Notes: New files written to the array are correctly striping across both RAID-Z2 vdevs.

File Copy Test 3
Source: FreeNAS
Destination: Windows 10
File: File B (it's technically the same file as A; however, it was written to the array AFTER we expanded it with the additional vdev and created the stripe)
Bottleneck: FreeNAS Read. Windows 10 Write on NVMe drive is, you guessed it, smoking fast.
Speed: 500-600MB/s
Notes: I'm now reading twice as fast as I was previously with a file that was created AFTER the striping was available.

It really looks like the pool doesn't go back and rebalance existing data after you expand it; you apparently need to do that manually afterwards. Are there any guides for doing this correctly? I believe it's a zfs send/receive and potentially renaming things (something like the sketch below), but I'd like other opinions too.
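This is the kind of thing I have in mind, purely as a sketch and not something I've tested; the pool and dataset names are placeholders, and it needs enough free space for a full second copy of the dataset while both exist:

    # Sketch: rewrite a dataset on pool "tank" so its blocks get reallocated
    # across both vdevs. Names are placeholders; stop anything writing to the
    # dataset first.
    zfs snapshot -r tank/share@rebalance
    zfs send -R tank/share@rebalance | zfs receive tank/share_new

    # Once the copy is verified, swap the datasets and clean up.
    zfs rename tank/share tank/share_old
    zfs rename tank/share_new tank/share
    zfs destroy -r tank/share_old

Does that look sane, or is there a cleaner way to do it from the GUI?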
 
Last edited:

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
It really looks like the pool doesn't go back and rebalance existing data after you expand it; you apparently need to do that manually afterwards. Are there any guides for doing this correctly?
You're absolutely right: existing data is not automatically spread across the new stripe. There's not even a guarantee that new data won't be written only to the new VDEV until the system determines the two are about as full as each other. You did observe striped writes, but I have seen reports of people seeing things get 50% slower after adding a third VDEV, with all new writes apparently going to the new, empty VDEV. Perhaps older versions were involved there and things have since improved, but I haven't seen documentation clarifying it.

The best option is to move all of the data off (zfs send/recv is fine for that... you can find plenty of references on how to use it out there, so I won't re-write a full guide here, but the rough shape is below) and put it back. This does require having another pool, in the same or another server, with sufficient available capacity.
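Roughly, as a sketch only; "tank", "backup" and the dataset names are placeholders, and you want to verify the copy before destroying anything:

    # Sketch: replicate the dataset to a second pool ("backup"), then bring it back.
    # All pool/dataset names are placeholders.
    zfs snapshot -r tank/share@migrate
    zfs send -R tank/share@migrate | zfs receive backup/share

    # After verifying the copy on "backup", remove the original and replicate it
    # back, so the returning writes are spread across both RAIDZ2 vdevs.
    zfs destroy -r tank/share
    zfs snapshot -r backup/share@return
    zfs send -R backup/share@return | zfs receive tank/share

If the second pool lives in another server, the send can be piped over ssh instead.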

Be aware that even though you are data-heavy on the original VDEV, if the new VDEV breaks, you lose the ability to access even your old data on the original VDEV.

Given that, the best course is to empty the pool and copy the data back to it, so at least you're sure to benefit from the lack of fragmentation and from having as much of your data striped as possible (since the downside of the risk is there anyway, you should at least get the upside of faster IO).
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
The best option is to move all of the data off (zfs send/recv is fine for that... you can find plenty of references on how to use it out there, so I won't re-write a full guide here) and put it back.
If you are adamant that the existing data must be striped across the two vdevs, yes, this is the best (and really only) option. It's highly questionable, though, whether this is a valid objective.
 

Evi Vanoost

Explorer
Joined
Aug 4, 2016
Messages
91
No, there's no built-in way to rebalance your data. You could read and rewrite everything (e.g. tarball each directory, delete the originals, and unpack everything again, roughly as sketched below), but your vdevs will never be perfectly balanced. You could stream everything off, destroy the pool, and recreate it, but I'm not sure that's worth the risk.
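Something along these lines, per directory; treat it as a sketch only (paths are examples) and make sure you have a verified backup first:

    # Sketch: rewrite one directory's data so its blocks get reallocated across
    # both vdevs. /mnt/tank/share/somedir is an example path.
    cd /mnt/tank/share
    tar -cf somedir.tar somedir
    rm -rf somedir
    tar -xf somedir.tar
    rm somedir.tar

Note that the tarball itself has to land somewhere (on the pool or elsewhere) while this runs, and any ZFS snapshots will keep holding the old blocks until those snapshots are destroyed.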

The question is why would you want to balance them out? If it's a performance issue, more spindles and SSD caches will help a lot more. Over time, if your pool is used randomly enough and things that age out are deleted, it will get more balanced by itself.
 