doesnotcompute
Dabbler
Joined: Jul 28, 2014
Messages: 18
Hi All,
I could use some help understanding what's going on. I am not sure if:
1) the metrics (Reporting > Disk) have a GUI bug that is not showing disk reads
2) the common knowledge of "striping = less CPU than raidZ2" is not so true
3) the common knowledge of "striping = a lot more performance than raidZ2" is not so true
So, the kit:
* FreeNAS-9.2.1.7-RELEASE-x64
* head unit: Dell R710 (2U rack)
* CPU: dual socket, each with a 4-core L5520 @ 2.27GHz
* 48GB ECC
* PERC was removed
* M105 reflashed to IT mode to act as a 9200i, with A/B SAS 8087 links to the Dell backplane (BIOS kept so it can serve as a boot HBA)
* 8x 2.5" drive chassis
* 1x 2.5" 160GB Intel S3500 for OS/boot
* 1x 2.5" 160GB Intel S3500 for SLOG (not always used, see below)
* 6x 2.5" 500GB Crucial M4 for L2ARC (not always used, see below)
* onboard quad GigE Broadcom
* DRAC Enterprise
* dual power supplies
* Chelsio dual-port 10Gbps SFP+ SR NIC
* LSI 9200e with dual 8088 ports out the back, reflashed to IT mode, no boot BIOS
* dual SAS cables to a 4U Supermicro chassis, the 45-drive model (24 bays in front, 21 in back), SAS2 internals, dual power supplies
* 25x 3TB Hitachi SAS drives
My goal is to build an NFS filer that can receive backups at a target throughput of 10Gbps. The Chelsio may be connected active/passive to 2 different 10G switches (no LAG/LACP possible, as they are not in a stack). I'm still working on the VLANning, but I got the new Supermicro JBOD racked this weekend and wanted to play around with volumes and configs. Here is what I found:
A) the difference in CPU load, on reads or writes, between a 25-disk stripe (one huge raid0 @ 66.88TB) and four 6-drive raidZ2 vdevs (i.e., 4 data + 2 parity drives each, 24 drives total @ 42.8TB) was negligible
B) the read and write performance of the 25-disk stripe vs. the four 6-drive raidZ2 vdevs was surprisingly close
So I'm not sure if the size of the "who would build a stripe that big" stripe made the comparison unrealistic, or what. And I'm leaning toward the first item, that the GUI must not be reporting reads: I did a dd with a 5TB file size at the end and saw no disk hits during reads (on this or any test), only writes, and since 5TB can't fit in a 45GB ARC, it must be a GUI bug...
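As a sanity check on those graphs, here is roughly how the per-disk reads could be cross-checked from the shell with stock FreeBSD tools (just a sketch; the da device names are examples and would need to match the actual Hitachis):

# watch extended per-device stats for a few of the data disks, 1-second interval
iostat -x -w 1 da0 da1 da2
# or watch every da device at once via GEOM stats
gstat -f 'da[0-9]+$'

If these show read traffic while the Reporting > Disk graph stays flat, that would point at the GUI rather than the pool.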
Anyway, here are some results. Between each set of tests I destroyed the volume and recreated it with the drive config noted below, and every test used a sparse 10TB folder, rebuilt each time:
First Set of Tests - no SSDs used other than the boot SSD, all 25 Hitachis in one big stripe:
* no extra slog
* no l2arc ssds
* 25x 3TB Hitachi
* 25 drive stripe
***write speed:***
[root@freenas] /mnt/burnin-r0/testset# dd if=/dev/zero of=temp.dat bs=1024k count=500k
512000+0 records in
512000+0 records out
536870912000 bytes transferred in 223.346941 secs (2403753146 bytes/sec)
[root@freenas] /mnt/burnin-r0/testset#
2,403,753,146 / 1024 (KB) / 1024 (MB) / 1024 (GB) * 8 (Gbps) = 17.9 Gbps
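(For reference, that conversion can be reproduced at the shell; this is just the same arithmetic as above, fed to bc:)

# dd's bytes/sec figure converted to binary gigabits per second
echo "scale=6; 2403753146 / 1024 / 1024 / 1024 * 8" | bc
# ~17.9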
***read speed:***
[root@freenas] /mnt/burnin-r0/testset# dd if=temp.dat of=/dev/null bs=1048576
512000+0 records in
512000+0 records out
536870912000 bytes transferred in 102.841927 secs (5220350565 bytes/sec)
[root@freenas] /mnt/burnin-r0/testset#
5,220,350,565 / 1024 (KB) / 1024 (MB) / 1024 (GB) * 8 (Gbps) = 38.894 Gbps
Notes: CPU at or just below 30% during the write, below 10% during the read
OK, so it's fast. But it's a huge stripe, so it should be fast. Now the same test with the volume rebuilt using the S3500 for SLOG and the 6x M4s for L2ARC, but still a 25-drive stripe (the zpool-level equivalent of attaching those devices is sketched after the config list):
* 1x S3500 SSD SLOG
* 6x M4 SSD L2ARC
* 25x 3TB Hitachi
* 25 drive stripe
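For reference, attaching those devices at the zpool level looks something like the sketch below (the pool name matches the mount point used in the tests; the daXX device numbers are placeholders):

# add one log (SLOG) device and six cache (L2ARC) devices to the existing pool
zpool add burnin2-r0 log da26
zpool add burnin2-r0 cache da27 da28 da29 da30 da31 da32
# confirm the "logs" and "cache" sections now show up
zpool status burnin2-r0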
***write speed:***
[root@freenas] /mnt/burnin2-r0/testset# dd if=/dev/zero of=temp.dat bs=1024k count=500k
512000+0 records in
512000+0 records out
536870912000 bytes transferred in 235.378452 secs (2280883859 bytes/sec)
2,280,883,859 = 16.99Gbps
^^ cpu @ 35%
***read speed:***
[root@freenas] /mnt/burnin2-r0/testset# dd if=temp.dat of=/dev/null bs=1048576
512000+0 records in
512000+0 records out
536870912000 bytes transferred in 103.162683 secs (5204119322 bytes/sec)
[root@freenas] /mnt/burnin2-r0/testset#
5,204,119,322 = 38.77Gbps
^^ < 10% cpu
OK, so slightly less CPU and a touch more speed without the SSD SLOG and L2ARC. Is the dd workload so basic that it doesn't exercise the SLOG (i.e., does the SLOG only come into play for sync writes?), and was my file maybe too small to benefit from the L2ARC?
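If the SLOG really only comes into play for sync writes, one way to force the issue would be something like this sketch (assuming /mnt/burnin2-r0/testset is its own dataset; if it's just a directory, the property would go on the pool's root dataset instead):

# force all writes on the test dataset to be synchronous so the ZIL/SLOG is exercised,
# rerun a (smaller) dd write, then put the property back afterwards
zfs set sync=always burnin2-r0/testset
dd if=/dev/zero of=/mnt/burnin2-r0/testset/temp-sync.dat bs=1024k count=100k
zfs inherit sync burnin2-r0/testset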
Now, on to the raidZ2 layout using 24 (not 25) drives: 4 vdevs of 6 drives each (4x6=24), each vdev having the equivalent of 4 drives for data and 2 for parity (the layout is sketched after the config list below). Again, the first pass is without any SSD for SLOG or L2ARC:
* 24x 3TB Hitachi
* 6 drive RaidZ2 x 4
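For reference, that layout works out to something like the following at the zpool level (sketch only; the pool name matches the mount point in the output below, and the daXX device numbers are placeholders):

# four raidz2 vdevs of six drives each = 24 drives total
zpool create rZ2testvol \
    raidz2 da0 da1 da2 da3 da4 da5 \
    raidz2 da6 da7 da8 da9 da10 da11 \
    raidz2 da12 da13 da14 da15 da16 da17 \
    raidz2 da18 da19 da20 da21 da22 da23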
***write speed:***
[root@freenas] /mnt/rZ2testvol/testdata# dd if=/dev/zero of=temp.dat bs=1024k count=500k
512000+0 records in
512000+0 records out
536870912000 bytes transferred in 234.426093 secs (2290149981 bytes/sec)
[root@freenas] /mnt/rZ2testvol/testdata#
2,290,149,981 = 17.06Gbps
^^ CPU closer to 37%
***read speed:***
[root@freenas] /mnt/rZ2testvol/testdata# dd if=temp.dat of=/dev/null bs=1048576
512000+0 records in
512000+0 records out
536870912000 bytes transferred in 102.800031 secs (5222478117 bytes/sec)
[root@freenas] /mnt/rZ2testvol/testdata#
5,222,478,117 = 38.91Gbps
Hmm, I thought it was going to be much worse.
And now the 4x 6-drive raidZ2, but with the SSDs for SLOG and L2ARC:
* 1x S3500 SSD SLOG
* 6x M4 SSD L2ARC
* 24x 3TB Hitachi
* 6 drive RaidZ2 x 4
***write speed:***
[root@freenas] /mnt/rZ2testvol2/testdata# dd if=/dev/zero of=temp.dat bs=1024k count=500k
512000+0 records in
512000+0 records out
536870912000 bytes transferred in 231.970939 secs (2314388667 bytes/sec)
[root@freenas] /mnt/rZ2testvol2/testdata#
2,314,388,667 = 17.24Gbps
***read speed:***
[root@freenas] /mnt/rZ2testvol2/testdata# dd if=temp.dat of=/dev/null bs=1048576
512000+0 records in
512000+0 records out
536870912000 bytes transferred in 102.828751 secs (5221019475 bytes/sec)
[root@freenas] /mnt/rZ2testvol2/testdata#
5,221,019,475 = 38.90Gbps
Here the performance was equivalent (not lower, unlike the stripe) when adding the SSD SLOG and L2ARC.
Interestingly, the most significant delta I observed between the 25-drive stripe and the four 6-drive raidZ2 vdevs was in the individual-drive "Disk I/O (da_)" graphs:
* about 125KB/s per drive when the 25-disk stripe/raid0 was used (regardless of SSDs or not)
* about 350KB/s per drive when the 4x 6-drive raidZ2 vdevs were used (regardless of SSDs or not)
So the individual Hitachis seem to be doing a lot more work when the vdevs are raidZ2 vs. striped. Does raidZ2, like raid5 or raid6, need to read before it writes as well?
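One way to pin that down would be to watch the per-vdev/per-disk bandwidth breakdown while the same write test runs on each layout, something like this sketch (pool name as above):

# per-vdev and per-disk read/write bandwidth, refreshed every second;
# comparing the per-disk write column between the stripe pool and the raidZ2 pool
# should show how much of the extra traffic is parity
zpool iostat -v rZ2testvol 1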
Attached are pics showing the SLOG's utilization during a write test like the ones above but with a 5.3TB file, and a shot of the CPU graph side by side over time: raid0/stripe on the left, raidZ2 on the right, both during writes (read activity in the middle).
Is dd single-threaded, and am I core-bound, so unable to show the limits of raidZ2 vs. stripe with 25/24 disks? Why is raidZ2 so close? (Not really complaining, just surprised, to be honest.)
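To rule out a single dd process being the bottleneck, one option would be running several writers in parallel and summing their reported rates, roughly like this sketch (file names are just examples; 4 x 125k keeps the total at the same 500k count as the single-stream test):

# launch four dd writers against the same dataset, then wait for all of them
for i in 1 2 3 4; do
    dd if=/dev/zero of=temp-$i.dat bs=1024k count=125k &
done
wait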