Hi,
I'm trying to run some benchmarks.
I'm starting with some synthetic ones, run locally on the pool.
Platform summary
- Motherboard SuperMicro X10DRH-CT
- 2x Xeon E5 v3 @ 2.9 GHz (32 threads total)
- 64 GB RAM (DDR4 ECC 2133)
- Chelsio T580-CR network card (40Gb/s)
- FreeNAS-11.2-U5
Tested pool summary & layout
- 4x HGST SAS3 SSDs, 800 GB each
- Pool and dataset with compression off (mainly because I will use /dev/zero, and compression would defeat all the benchmarks; see the command sketch after the pool layout below)
Code:
  pool: volSSD
 state: ONLINE
  scan: none requested
config:

        NAME                                          STATE     READ WRITE CKSUM
        volSSD                                        ONLINE       0     0     0
          gptid/b58f204e-a65b-11e9-a6ec-ac1f6b05d3c2  ONLINE       0     0     0
          gptid/b6b2b2cc-a65b-11e9-a6ec-ac1f6b05d3c2  ONLINE       0     0     0
          gptid/b7ee18b6-a65b-11e9-a6ec-ac1f6b05d3c2  ONLINE       0     0     0
          gptid/b923c7d1-a65b-11e9-a6ec-ac1f6b05d3c2  ONLINE       0     0     0
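For reference, compression was turned off with the standard dataset property (pool name as above):
Code:
zfs set compression=off volSSD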
(This "no-protection" layout are here only for doing few benchs)
I could make more sophisticated layouts with more SSD in a strip-mirror fashion.
But i'm not at the stage to benchmark the pool itself
PS: i have 12 of them connected to a LSI 9300-8e (in a JBOD chassis). Theses links may be a bottleneck in high activity but not when benchmarking the ARC like the following.
First Bench: I/O using only CPU + RAM
First of all, I want to get an idea of how the internal I/O stack performs when only RAM is involved.
It's probably not really a benchmark of the RAM, but whatever it measures, it's the best throughput the host can serve!
Code:
dd bs=128k if=/dev/zero of=/dev/null count=409600
409600+0 records in
409600+0 records out
53687091200 bytes transferred in 2.070018 secs (25935567925 bytes/sec)
That gives around 24.15 GB/s.
So I can confirm this benchmark doesn't really measure the DDR4, since DDR4-2133 peaks around 17 GB/s per memory channel; a /dev/zero to /dev/null pipe mostly exercises the kernel I/O path rather than RAM.
But let's move on to the next step and just take this as confirmation that the I/O stack itself is fast enough!
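As a possible follow-up (an untested sketch, using FreeBSD's md(4) memory disks), I could also read from a memory-backed disk so that more of the storage stack is involved than with a plain /dev/zero pipe:
Code:
# create an 8 GB swap-backed memory disk (prints the unit, e.g. md0)
mdconfig -a -t swap -s 8g
# read it back through the block layer; 65536 x 128 KiB = 8 GiB
dd bs=128k if=/dev/md0 of=/dev/null count=65536
# detach the memory disk afterwards
mdconfig -d -u 0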
Second Bench: I/O with a read process served entirely from the ARC
For the second benchmark, I want to check the performance behaviour of the ARC cache.
I'm starting by measuring the overall throughput of a single read process.
First of all, I generate a 6 GB file and read it once to make sure the file is entirely held in the ARC.
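Concretely, the preparation looks like this (a sketch: the mountpoint /mnt/volSSD is an assumption, and 49152 x 128 KiB = 6 GiB):
Code:
# create the 6 GB test file (compression is off, so /dev/zero data is not collapsed)
dd bs=128k if=/dev/zero of=/mnt/volSSD/test6G count=49152
# read it once so the whole file lands in the ARC
dd bs=128k if=/mnt/volSSD/test6G of=/dev/null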
In a first terminal, I run a zpool iostat (e.g. zpool iostat volSSD 1) to watch pool activity and confirm that I'm not reading any data from the pool:
Code:
              capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
volSSD      6.00G  2.88T      0      0      0      0
volSSD      6.00G  2.88T      0      0      0      0
...
volSSD      6.00G  2.88T      0      0      0      0
In a second terminal, I launch this read command:
Code:
> dd bs=128k if=test6G of=/dev/null
49152+0 records in
49152+0 records out
6442450944 bytes transferred in 4.314335 secs (1493266321 bytes/sec)
That gives around 1.39 GB/s.
I expected a much better result. To be honest, nowadays some disks can reach this throughput directly (NVMe drives, ...).
So I dug in a bit with these tunables (applied as sketched after the list):
- Disabling atime ( zfs set atime=off volSSD )
- Disabling compressed arc ( vfs.zfs.compressed_arc_enabled=0 )
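For reference, here is roughly how I applied them from the shell (a sketch; on FreeNAS the sysctl is normally added via System -> Tunables, and depending on the version it may only take effect as a boot-time tunable):
Code:
# stop updating access times on every read
zfs set atime=off volSSD
# disable the compressed ARC (FreeBSD sysctl)
sysctl vfs.zfs.compressed_arc_enabled=0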
In fact, it helps a bit: I reach an average of 1.65 GB/s (with some peaks at 2 GB/s).
(But repeated runs give variable results, so it's hard to say precisely how much these tunables help.)
To me these results look pretty bad, but I would like to know what result you would expect. Can these be considered normal?
If yes, that means that even from the ARC you can't expect to fill a 40 Gb/s network (40 Gb/s is about 5 GB/s, well above the 1.65 GB/s measured here), at least not at QD1.
I haven't yet run a concurrent version of this test to see whether the ARC scales well when there are more concurrent requests.
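When I get to it, it will probably look something like this (untested sketch in sh syntax, reusing the warmed-up test file from above):
Code:
# launch 4 concurrent readers of the ARC-cached file, then wait for all of them
for i in 1 2 3 4; do
  dd bs=128k if=/mnt/volSSD/test6G of=/dev/null &
done
wait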
I'm at the beginning of the road of optimizing and benchmarking my FreeNAS box with 40 Gb/s in mind.
All advice is welcome!!
Best regards,
Sébastien.