To measure the performance differences between controllers, I've finally had a chance to dig deeper into `fio`. The results so far make me question whether I'm running these tests even half-decently, so I thought it reasonable to ask here before investing more time.
I started by running several test cases (inspired by JRS's articles on Ars Technica and a few other sites I found useful) on an SSD in the ZFS pool, but the numbers were so surprisingly low that I wanted to see how a decent single NVMe drive would behave.
For this test, I have a Sabertooth X99 with an i7-5820K and a Samsung 980 PRO 1TB in the M.2 slot, which is a PCIe Gen3 x4 connection. The disk is in steady state with roughly 240 GB of free space. The OS is Windows 10, latest stable build.
I'm posting the test cases with the full command lines, as I'd like to know whether I'm making a mistake somewhere or whether the numbers are really supposed to be this low. I also ran a manual TRIM after each test (using the "Optimise Drives" app built into Windows).
Code:
1. [Single 4KiB random write process] fio.exe --name=random-write --ioengine=windowsaio --rw=randwrite --bs=4k --numjobs=1 --size=4g --iodepth=1 --runtime=60 --time_based --end_fsync=1
2. [16 parallel 64KiB random write processes] fio --name=random-write --ioengine=windowsaio --rw=randwrite --bs=64k --size=256m --numjobs=16 --iodepth=16 --runtime=60 --time_based --end_fsync=1
3. [Single 1MiB random write process] fio --name=random-write --ioengine=windowsaio --rw=randwrite --bs=1m --size=16g --numjobs=1 --iodepth=1 --runtime=60 --time_based --end_fsync=1
4. [Random Read/Write Operation Test - 75/25] fio --randrepeat=1 --ioengine=windowsaio --direct=1 --gtod_reduce=1 --name=fiotest --filename=testfio --bs=4k --iodepth=64 --size=8G --readwrite=randrw --rwmixread=75
5. [Random Read/Write Operation Test - 50/50] fio --randrepeat=1 --ioengine=windowsaio --direct=1 --gtod_reduce=1 --name=fiotest --filename=testfio --bs=4k --iodepth=64 --size=8G --readwrite=randrw --rwmixread=50
6. [Random Read/Write Operation Test - 25/75] fio --randrepeat=1 --ioengine=windowsaio --direct=1 --gtod_reduce=1 --name=fiotest --filename=testfio --bs=4k --iodepth=64 --size=8G --readwrite=randrw --rwmixread=25
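For a quick side-by-side of the six runs, here is a minimal Python sketch that tabulates the nominal number of outstanding I/Os (`numjobs` x `iodepth`) and whether `--direct=1` was passed. The values are copied from the command lines above. The `RUNS` table and the `nominal_queue_depth` helper are illustrative names only, and runs 4-6 are assumed to use fio's default of `numjobs=1` because they don't set it. Whether the requested depth is actually reached depends on the I/O engine and on buffered versus direct I/O, so treat these as upper bounds.

```python
# Parameters copied from the six fio command lines above.
# Assumption: runs 4-6 use fio's default numjobs=1 (not set explicitly).
RUNS = [
    {"run": 1, "bs": "4k",  "numjobs": 1,  "iodepth": 1,  "direct": False},
    {"run": 2, "bs": "64k", "numjobs": 16, "iodepth": 16, "direct": False},
    {"run": 3, "bs": "1m",  "numjobs": 1,  "iodepth": 1,  "direct": False},
    {"run": 4, "bs": "4k",  "numjobs": 1,  "iodepth": 64, "direct": True},
    {"run": 5, "bs": "4k",  "numjobs": 1,  "iodepth": 64, "direct": True},
    {"run": 6, "bs": "4k",  "numjobs": 1,  "iodepth": 64, "direct": True},
]


def nominal_queue_depth(run):
    """Upper bound on outstanding I/Os: jobs times per-job iodepth."""
    return run["numjobs"] * run["iodepth"]


for r in RUNS:
    print(
        f"run {r['run']}: bs={r['bs']:>3} "
        f"outstanding={nominal_queue_depth(r):>3} direct={r['direct']}"
    )
```

Laid out this way, runs 1 and 3 request a single outstanding I/O, and only runs 4-6 bypass the OS cache.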
And here are the matching results for each run (condensed output).
Code:
1. write: IOPS=43.9k, BW=171MiB/s (180MB/s)(12.1GiB/72082msec); 0 zone resets
WRITE: bw=171MiB/s (180MB/s), 171MiB/s-171MiB/s (180MB/s-180MB/s), io=12.1GiB (13.0GB), run=72082-72082msec
2. IOPS per job: min 3.8k, max 9k, avg ~6k
WRITE: bw=6443MiB/s (6756MB/s), 239MiB/s-558MiB/s (250MB/s-586MB/s), io=387GiB (416GB), run=61128-61569msec
3. write: IOPS=285, BW=286MiB/s (300MB/s)(16.8GiB/60303msec); 0 zone resets
WRITE: bw=286MiB/s (300MB/s), 286MiB/s-286MiB/s (300MB/s-300MB/s), io=16.8GiB (18.1GB), run=60303-60303msec
4. read: IOPS=30.2k, BW=118MiB/s (124MB/s)(6141MiB/52134msec)
write: IOPS=10.1k, BW=39.3MiB/s (41.2MB/s)(2051MiB/52134msec); 0 zone resets
READ: bw=118MiB/s (124MB/s), 118MiB/s-118MiB/s (124MB/s-124MB/s), io=6141MiB (6440MB), run=52134-52134msec
WRITE: bw=39.3MiB/s (41.2MB/s), 39.3MiB/s-39.3MiB/s (41.2MB/s-41.2MB/s), io=2051MiB (2150MB), run=52134-52134msec
5. read: IOPS=22.7k, BW=88.5MiB/s (92.8MB/s)(4098MiB/46285msec)
write: IOPS=22.6k, BW=88.4MiB/s (92.7MB/s)(4094MiB/46285msec); 0 zone resets
READ: bw=88.5MiB/s (92.8MB/s), 88.5MiB/s-88.5MiB/s (92.8MB/s-92.8MB/s), io=4098MiB (4298MB), run=46285-46285msec
WRITE: bw=88.4MiB/s (92.7MB/s), 88.4MiB/s-88.4MiB/s (92.7MB/s-92.7MB/s), io=4094MiB (4292MB), run=46285-46285msec
6. read: IOPS=9892, BW=38.6MiB/s (40.5MB/s)(2050MiB/53059msec)
write: IOPS=29.6k, BW=116MiB/s (121MB/s)(6142MiB/53059msec); 0 zone resets
READ: bw=38.6MiB/s (40.5MB/s), 38.6MiB/s-38.6MiB/s (40.5MB/s-40.5MB/s), io=2050MiB (2150MB), run=53059-53059msec
WRITE: bw=116MiB/s (121MB/s), 116MiB/s-116MiB/s (121MB/s-121MB/s), io=6142MiB (6440MB), run=53059-53059msec
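As a sanity check on the condensed output, bandwidth should equal IOPS times block size. The Python sketch below uses only figures quoted above, plus the theoretical PCIe Gen3 x4 payload rate (8 GT/s per lane, 128b/130b encoding, before protocol overhead). The helper names are illustrative.

```python
KIB = 1024
MIB = 1024 * KIB


def bw_mib_s(iops, block_bytes):
    """Bandwidth in MiB/s implied by an IOPS figure and a block size."""
    return iops * block_bytes / MIB


def iops_from_bw(bw_mib, block_bytes):
    """IOPS implied by a bandwidth in MiB/s and a block size."""
    return bw_mib * MIB / block_bytes


# Run 1: 43.9k IOPS at 4 KiB (fio reported 171 MiB/s).
run1_bw = bw_mib_s(43_900, 4 * KIB)

# Run 3: 285 IOPS at 1 MiB (fio reported 286 MiB/s).
run3_bw = bw_mib_s(285, MIB)

# Run 2: aggregate 6443 MiB/s at 64 KiB across all 16 jobs.
run2_iops = iops_from_bw(6443, 64 * KIB)

# Theoretical PCIe Gen3 x4 ceiling in decimal MB/s:
# 8 GT/s per lane * 4 lanes * 128/130 encoding / 8 bits per byte.
pcie_gen3_x4_mb_s = 8e9 * 4 * (128 / 130) / 8 / 1e6

print(f"run 1 implied bandwidth: {run1_bw:.1f} MiB/s")
print(f"run 3 implied bandwidth: {run3_bw:.1f} MiB/s")
print(f"run 2 implied aggregate IOPS: {run2_iops:.0f}")
print(f"PCIe Gen3 x4 ceiling: {pcie_gen3_x4_mb_s:.0f} MB/s vs 6756 MB/s reported in run 2")
```

Runs 1 and 3 are internally consistent. Run 2's aggregate of 6756 MB/s is above what a Gen3 x4 link can carry, so one possible reading is that part of that run never reached the drive (runs 1-3 do not pass `--direct=1`). That is an inference, not something the fio output states.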
The reason I suspect a misconfiguration is that the IOPS are so far below the stated specification that either Samsung is criminally inflating its spec or my test cases are completely flawed. The 980 PRO is rated for "up to" 1M IOPS for both random reads and random writes. The highest figure I've seen is around 43k IOPS, which is not even 5% of the stated maximum, so surely I'm doing something wrong?
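To put the gap in numbers, a small Python sketch that expresses the total IOPS of each single-job run as a percentage of the 1M IOPS rating. The totals are the read and write IOPS from the results above added together, so for the mixed runs this is only a rough comparison against a rating quoted for pure reads or pure writes. `RATED_IOPS` and `pct_of_rating` are illustrative names.

```python
RATED_IOPS = 1_000_000  # the "up to" figure quoted for the 980 PRO

# Total IOPS per run (read + write where both are reported).
measured = {
    "run 1 (4k randwrite, QD1)": 43_900,
    "run 3 (1m randwrite, QD1)": 285,
    "run 4 (4k 75/25, QD64)": 30_200 + 10_100,
    "run 5 (4k 50/50, QD64)": 22_700 + 22_600,
    "run 6 (4k 25/75, QD64)": 9_892 + 29_600,
}


def pct_of_rating(iops, rated=RATED_IOPS):
    """Measured IOPS as a percentage of the rated figure."""
    return iops / rated * 100


for name, iops in measured.items():
    print(f"{name}: {iops} IOPS = {pct_of_rating(iops):.2f}% of rating")
```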
On the other hand, if these tests are representative of the disk's actual capabilities (and the numbers Samsung states are just a brazen marketing lie), then I question the point of even considering anything better than an LSI 2008 in an all-flash pool of consumer SSDs. If the numbers are this meagre with a single NVMe drive that appears to be a state-of-the-art consumer SSD, what hope is there of ever reaching the 300k IOPS that the LSI 2008 is supposedly capable of delivering with an array of SATA SSDs?
Is what I am seeing perhaps a limitation of the onboard HBA I am using on the X99?
Lastly, if my tests are flawed and the stated IOPS numbers can be reached, how do I do that?