Write Performance Comparison between Core and SCALE

soleous

Dabbler
Joined
Apr 14, 2021
Messages
30
Full disclosure: I am virtualizing under ESXi with passthrough of LSI2008 cards in IT mode. I have 5 new Ultrastar He12 disks, which I'm migrating to via another LSI2008 card.

I have been testing a couple of things, including SCALE, and I noticed slow sequential write performance compared with Core, where I would normally expect memory to be the bottleneck. The results below use the same 5 disks in RAIDZ2, same-spec VMs with the same RAM (32 GB), a clean build, and no load.
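(For anyone wanting to reproduce the RAM-disk runs below: a tmpfs mount along these lines gives an equivalent target for the dd commands; the size here is just an example.)
Code:
mkdir -p /mnt/ramdisk
mount -t tmpfs -o size=24G tmpfs /mnt/ramdisk
cd /mnt/ramdisk   # run the dd test from inside the mount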

SCALE RAMdisk write performance:
Code:
dd if=/dev/zero of=TestingSpeed bs=1G count=20 && rm TestingSpeed
20+0 records in
20+0 records out
21474836480 bytes (21 GB, 20 GiB) copied, 10.0064 s, 2.1 GB/s


SCALE ZFS write performance:
Code:
dd if=/dev/zero of=TestingSpeed bs=1G count=20 && rm TestingSpeed
20+0 records in
20+0 records out
21474836480 bytes (21 GB, 20 GiB) copied, 14.1847 s, 1.5 GB/s


Debian VM RAMdisk for Comparison:
Code:
dd if=/dev/zero of=TestingSpeed bs=1G count=20 && rm TestingSpeed
20+0 records in
20+0 records out
21474836480 bytes (21 GB, 20 GiB) copied, 10.2462 s, 2.1 GB/s


Core RAMdisk write performance:
Code:
dd if=/dev/zero of=TestingSpeed bs=1G count=20 && rm TestingSpeed
20+0 records in
20+0 records out
21474836480 bytes transferred in 7.279682 secs (2949969040 bytes/sec)


Core ZFS write performance:
Code:
dd if=/dev/zero of=TestingSpeed bs=1G count=20 && rm TestingSpeed
20+0 records in
20+0 records out
21474836480 bytes transferred in 6.475183 secs (3316483142 bytes/sec)

I normally expect around 2.6 GB/s (8.255009 secs (2601431003 bytes/sec), such as on my existing pool); however, I am unsure why SCALE's ZFS is dropping to 1.5 GB/s. I get the impression this comes down to Debian/OpenZFS versus FreeBSD, but the Core performance is significantly better. Any ideas?
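(For reference, the dataset-side settings that most affect a /dev/zero dd run can be checked like this; the pool/dataset name is a placeholder for mine.)
Code:
# with compression enabled, an all-zero stream mostly exercises the compression/ARC path
zfs get compression,sync,recordsize tank/test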

Thanks,
 

Kris Moore

SVP of Engineering
Administrator
Moderator
iXsystems
Joined
Nov 12, 2015
Messages
1,471
Hard to diagnose without a lot more telemetry. So far we (iX) have done little to no optimization on the performance side for SCALE. Expecting that to get started around BETA/RC timeframe, so I'd expect us to be making adjustments and tweaks as we gather that initial testing data.
 

soleous

Dabbler
Joined
Apr 14, 2021
Messages
30
Thanks for the quick response. TBH I'm waiting on Beta/RC to migrate over to SCALE. Out of interest, what type of telemetry would be needed at this level?

I would really like to try this on bare metal to rule out virtualization, but that won't be possible for a while. Is anyone else experiencing these kinds of speeds?
 

Kris Moore

SVP of Engineering
Administrator
Moderator
iXsystems
Joined
Nov 12, 2015
Messages
1,471
Well, one thing that we'd normally ask for is a debug, just so we can validate all the settings first, the driver-level setup, etc. Next, stats from even something as simple as 'htop', taken during your test runs, would be helpful, since they may point to issues. I'd suggest opening a ticket on jira.ixsystems.com where we can review and discuss the data, and we can get our perf folks engaged as well.
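As a rough illustration (the pool name and file paths are placeholders, and the sample counts are examples), capturing stats alongside a test run could look something like this:
Code:
# terminal 1: sample pool I/O once per second during the run
zpool iostat -v tank 1 > /tmp/zpool_iostat.log
# terminal 2: htop is fine interactively; top in batch mode is easier to save
top -b -d 1 -n 30 > /tmp/top.log
# terminal 3: the write test itself
dd if=/dev/zero of=/mnt/tank/test/TestingSpeed bs=1G count=20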
 

appliance

Explorer
Joined
Nov 6, 2019
Messages
96
For me SCALE is slower than CORE on bare metal in every metric:
encryption=off [R/W]: SCALE 700 / 1126 MB/s vs. CORE 780 / 1566 MB/s
encryption=on [R/W]: SCALE 500 / 800 MB/s vs. CORE 780 / 1116 MB/s
* worse raw ZFS speeds; neither achieves normal NVMe read speeds
* big penalty for native encryption
* slower Samba, very slow writes in both: SCALE [R/W]: 500 / 200, CORE [R/W]: 520 / 350
* slow RAIDZ on HDD: read speed equals a single drive's speed unless given a boost via a prefetch tunable
* slower iperf (see the sketch at the end of this post)
SCALE has a better UI (finally you can click on a table row instead of the tiny arrow!!!) and less of a "fundamentalist attitude", but tons of bugs. CORE is more mature but can't connect my external drives, which is a showstopper. Hard to choose either at this point.
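(For reference, the iperf comparison is the usual client/server run; the address below is a placeholder.)
Code:
# on the NAS under test:
iperf3 -s
# on the client, 4 parallel streams for 30 seconds:
iperf3 -c 192.168.1.10 -P 4 -t 30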
 

Kris Moore

SVP of Engineering
Administrator
Moderator
iXsystems
Joined
Nov 12, 2015
Messages
1,471
@appliance - Thanks for the feedback. We're knee-deep in our performance work on SCALE; we've started the deeper inspections and expect to make tweaks and tunings in the coming months that will bring performance up to or past CORE levels.
 

appliance

Explorer
Joined
Nov 6, 2019
Messages
96
[Attachment: 1652296802032.png (benchmark results table)]

Here's what I've learned from triple execution of the metrics above with proper cache flushing:
* ZFS itself kills NVMe performance even with "good settings". Direct reads are far from the NVMe maximum. The same box produced higher values in the past: FreeNAS 11, OMV5 and DSM all read and wrote >1000 MB/s EVEN on (ZFS) encrypted pools. I don't know what happened, because algorithms get faster over time, and indeed the openssl / cryptsetup benchmarks are now faster while ZFS got slower (though not to the pathetic ZFS 0.8.3 levels). Maybe the Linux maintainers are still fighting ZFS?
* Samba cuts another chunk. With hundreds of tweaks, RSS and multichannel (see the sketch after this list), I didn't achieve any improvement. For a single user, speeds won't saturate the NIC except on DSM and CORE, and Samba is stuck using just one thread each time it's launched for a specific user.
* Native encryption cuts another chunk, a bit more than expected (-20%, -30%, -40%); it will multithread but won't saturate the CPU fully. I've noticed DSM made the machine way noisier, but all the base benchmarks are the same and CPU scaling works. Even with some special hacks, like disabling BIOS limits, I couldn't make the machine as aggressive and noisy as under DSM. Interesting. The only difference I know of is that DSM was running in legacy BIOS mode instead of UEFI.
* When the ARC is hit (e.g. after a repeated copy), it saturates the NIC fully in a single thread and no disk I/O happens.
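(For completeness, the multichannel / RSS tweaks I mean are along these lines; the interface address and speed are placeholders, and on TrueNAS they would normally go into the SMB service's auxiliary parameters rather than a hand-edited smb.conf.)
Code:
# illustrative smb.conf fragment only, not a recommended config
server multi channel support = yes
interfaces = "192.168.1.10;speed=10000000000,capability=RSS"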

Also: SCALE is the least performant NAS I've tested historically, while CORE is strong, beating another Linux-based distro, Unraid, and on par with the old-kernel DSM.
* The sad thing is that Synology DSM, for some reason, is on top despite file-based encryption plus BTRFS. All the others use ZFS. I have some old notes with DSM and OMV; the table displays only the latest crop: same pools, same files, tested in one day.
* SCALE autotune is ineffective: it refers to BSD settings that don't exist on Linux. When added manually, the only value that "worked" was prefetch (see the sketch below), which normalized the sad 4xHDD RAIDZ1 read value from 1x disk to 2x disk speed (200 MB/s -> 400 MB/s). Damn, shouldn't it be 3x disk? 4xHDD at single-disk speed reminds me of this test, see the glitch at 4x 4TB -> r=183 MB/s for SOME reason.
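(For anyone wanting to try the same: on SCALE the prefetch knobs are Linux ZFS module parameters under /sys/module/zfs/parameters, not the BSD sysctls that autotune writes. A sketch only, using zfetch_max_distance as an example parameter with an example value:)
Code:
# 0 = prefetch enabled (the default); confirm it hasn't been disabled
cat /sys/module/zfs/parameters/zfs_prefetch_disable
# example value only: raise the per-stream prefetch distance (bytes); resets at reboot unless made persistent
echo 67108864 > /sys/module/zfs/parameters/zfetch_max_distance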

Note: I'm capped by a 6.5 Gbit NIC on PCIe 3.0 x1 (= 800 MB/s), which impacts all numbers except the raw metrics. CrystalDiskMark 8 might not always work, e.g. the bogus SEQ1M values, but it does show that in multi mode the NIC gets saturated. CrystalDiskMark 7 works even worse. The machine is an HP Microserver Gen10 at ~4000 PassMarks, with AES enabled.

Resolution: SCALE should get to the CORE level of performance. I'm very interested in the single-user SMB read value of "760" that CORE provides, which distinguishes NVMe speed from SSD speed.
 

Kris Moore

SVP of Engineering
Administrator
Moderator
iXsystems
Joined
Nov 12, 2015
Messages
1,471
This is a great breakdown; have you considered a career with us in performance? :)

But in all seriousness, I appreciate this. The first couple of releases of SCALE have been focused only on features / stability. Our perf team is now doing these same kinds of analyses and is already identifying bottlenecks that will be addressed in upcoming SCALE releases. We've already identified some whose fixes should bring SCALE up to CORE levels of performance, possibly even significantly beyond.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
One interesting thing about the OpenZFS project is that they use automated testing to help catch regressions. Basically, any new feature requires a test module to exercise the expected behavior of that feature. When a new pull request is made, it has to run through these tests.

Can such performance tests be set up and automated for the same configurations of CORE & SCALE?

In addition, other functionality tests for the command-line API should be straightforward to implement.
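As a rough sketch of what a test that runs unchanged on both platforms could look like (not an existing iX suite; the dataset path and fio parameters are placeholders, and it assumes fio is available on both):
Code:
#!/bin/sh
# same script on CORE and SCALE; compare the JSON results between runs/releases
DIR=/mnt/tank/perf-test
for run in 1 2 3; do
    fio --name=seqwrite --directory="$DIR" --rw=write --bs=1M --size=20G \
        --ioengine=posixaio --end_fsync=1 \
        --output-format=json --output="seqwrite_run${run}.json"
done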
 