Slow drive pool performance 18x12TB vdev1 18x18TB vdev2

Bobka

Cadet
Joined
Jan 6, 2023
Messages
9
I am seeing slow read and write performance on my TrueNAS server. Below is my hardware setup:

SUPERMICRO 6049P-ECR36H (chassis CSE-847BE1C4-R1K23LPB) with X11DPH-T board
64GB of RAM
CPU: 2x Intel Xeon Silver 4110 SR3GH, 8-core 2.1GHz
Dual 10GBase-T LAN ports (Intel X722 + Intel X557 PHY)
AOC-S3008L-L8E SAS controller
Local drive: 256GB NVMe
Vdev1: 18x 12TB WD enterprise drives in RAIDZ2 (storage capacity 175TB)
Vdev2: 18x 18TB Seagate Exos X18 drives in RAIDZ2 (261TB)

Current performance:
Vdev1: R: 170 MB/s W: 307MB/s
Vdev2: R: 160 MB/s W: 280 MB/s

Testing:
I have tested the performance by moving large files between the vdev1 and vdev2 pools. I also tested by moving large files between the server and my workstation, which is connected through a 10G switch. The results were similar.

Resource consumption:
My CPU usage never goes beyond 20% in any of the tests, and I always have more than 25GB of RAM available.

What I am looking for:
I need a good balance between drive pool capacity, performance, and reliability. I went with RAIDZ2 because 2 drives out of 18 can fail before I lose data; I feel this is a good level of redundancy for my system. I started with the vdev1 pool, and instead of expanding it I created a second pool, vdev2, because I didn't want to mix drive sizes. I tested a single drive by writing large files to it and was getting about 200MB/s write speeds. Since I have a RAID6-style array of 18 drives, my expectation was that I would get triple that speed, but that is not the case. Also, the write speed being almost double the read speed doesn't make much sense to me.

Questions:
1. How do I properly bench test the system?
2. Is Z2 for 18 drives a correct setup? Is there a better setup with a similar level of redundancy but better performance?
3. What can I do to increase the read and write performance? IO is not so critical because this is a storage server with at most 3 users moving data back and forth.
4. I believe I have a lot of compute resources (CPU, RAM) that are not being fully utilized; is there anything that can be done to get higher resource utilization with boosted R/W pool performance?

I am new to TrueNAS and I would be very grateful for any help on troubleshooting and proper setup.
 

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
2. Is Z2 for 18 drives a correct setup? Is there a better setup with a similar level of redundancy but better performance?
nope. 2x 9 drive raidz2, or at least raidz3, though your speed will still mostly suck
Vdev1: R: 170 MB/s W: 307MB/s
Vdev2: R: 160 MB/s W: 280 MB/s
looks pretty awesome for such a suboptimal raidz2 topology.

such a large vdev is basically only usable for a backup pool.


raidz2 performance is mostly the performance of a single drive, so your 18 drives will perform at roughly the speed of the slowest drive, aka ~150MB/s. you get some benefit with single streams, but random IO will be horrible.


you should read the documentation. the best performance is mirrors, particularly for read-heavy ops (well, technically stripe, but that's silly); the best storage efficiency is raidz. everything is a trade-off.
 

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
My CPU usage never goes beyond 20% in any of the tests, and I always have more than 25GB of RAM available.
nothing in your stated config will push this system at all. you would need something like all SSDs and VMs to even start doing so.
3. What can I do to increase the read and write performance? IO is not so critical because this is a storage server with at most 3 users moving data back and forth.
more vdevs. ie 2x raidz2. or mirrors
 
Last edited:

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
1. How do I properly bench test the system?
fio or jgreco's solnet array (in the resource section).
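For example, something along these lines with fio, run locally on the server so the network is out of the picture (just a sketch; the dataset path is a placeholder, and use a size bigger than your RAM so the ARC doesn't inflate the read numbers):

    # sequential write test against a dataset on the pool (path is a placeholder)
    fio --name=seqwrite --directory=/mnt/tank/fio-test --rw=write --bs=1M --size=100G --ioengine=posixaio --group_reporting
    # sequential read test (fio lays out its own test file first, then reads it)
    fio --name=seqread --directory=/mnt/tank/fio-test --rw=read --bs=1M --size=100G --ioengine=posixaio --group_reporting

Remember to delete the test files afterwards.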

2. Is Z2 for 18 drives a correct setup? Is there a better setup with a similar level of redundancy but better performance?
Users are discouraged from using vdevs larger than 12 disks. @artlessknave's suggestion is great: 2x9 in RAIDZ2 looks solid. I would also consider 3x6 in RAIDZ2.
Also, the size of your files impacts speed more than one might think (lots of smaller files means more IOPS needed).

4. I believe I have a lot of compute resources (CPU, RAM) that are not being fully utilized; is there anything that can be done to get higher resource utilization with boosted R/W pool performance?
SMB is single-threaded; afaik NFS should be multithreaded. That can potentially increase your resource utilization.
You can check your ARC with arc_summary, but it will grow with time.

Please read the following resource:
 
Last edited:

Bobka

Cadet
Joined
Jan 6, 2023
Messages
9
artlessknave, thank you for the meaningful insights and for taking the time to respond. I am learning a lot. I did some math based on what I have learned from this article (https://www.truenas.com/blog/zfs-pool-performance-2/)

Current setup: 1x 18-wide Z2
  • Read IOPS: read IOPS of a single drive (250)
  • Write IOPS: write IOPS of a single drive (250)
  • Streaming read speed: (18 - 2) = 16 drives x 150MB/s = 2400MB/s
  • Streaming write speed: (18 - 2) = 16 drives x 150MB/s = 2400MB/s
  • Storage space efficiency: 88%
  • Fault tolerance: 2 disks per vdev
Proposed setup A: 2x 9-wide Z2
  • Read IOPS: read IOPS of a single drive x 2 = 500
  • Write IOPS: write IOPS of a single drive x 2 = 500
  • Streaming read speed: (9 - 2) x 2 = 14 drives x 150MB/s = 2100MB/s
  • Streaming write speed: (9 - 2) x 2 = 14 drives x 150MB/s = 2100MB/s
  • Storage space efficiency: 77%
  • Fault tolerance: 2 disks per vdev
Proposed setup B: 1x 18-wide Z3
  • Read IOPS: read IOPS of a single drive (250)
  • Write IOPS: write IOPS of a single drive (250)
  • Streaming read speed: (18 - 3) = 15 drives x 150MB/s = 2250MB/s
  • Streaming write speed: (18 - 3) = 15 drives x 150MB/s = 2250MB/s
  • Storage space efficiency: 83%
  • Fault tolerance: 3 disks per vdev
What is wrong with my math?

1. I get about 10x less than my theoretical calculations under the current setup.
2. Your suggestion to split the vdev into 2x striped Z2 arrays would, based on my math, double the IOPS at the expense of capacity, while the theoretical R/W (excluding IOPS) would actually decrease.
3. Your suggestion to switch to RAIDZ3 would, based on my math, increase fault tolerance at the expense of capacity, and R/W would also decrease.

I completely agree with you on the IOPS, which equal those of a single drive in my current setup, but the streaming speed should be at least double what I am seeing.
Also, what performance do I gain with a 1x 18-wide Z3? I just need high streaming speeds to move large files around with at most 3 users.
 

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
the problem with any raidz vdev that's really wide is that rebuilding starts to take forever, and your chance of the whole vdev failing is multiplied by every drive. it's the same reason raidz1 is not recommended: the risk of enough drives failing while trying to rebuild goes way up. this is why I suggested 2x raidz2, or at least raidz3, to reduce your risk to a manageable level.

you are also using 12TB+ drives, which significantly increases this risk, as each resilver is going to be hammering the drives for up to 12TB of reads on every single disk, calculating the parity and data, and then writing it to the new disk. hell, the chances of the new disk failing are pretty good (unless you burn it in first).
3x 6 raidz2 would be better, but 9 drives is -ok-. this is why I said an 18-wide raidz2 would be -ok- for a backup server.

of course, if you don't care about the data, go nuts, but we in the forums hate lost data; it's our nemesis. the advice here is going to tend towards reduced risk over everything else, or at least risk reduced enough to not be at the dangerous edge. I saw a user this week who nuked 1/2 their pool and was asking how to "move the data" off of it. the answer was: you don't, you nuked it, it's gone (unless you are willing to pay hundreds of thousands to maybe restore some of it via professionals).

as to speeds, math is not my strong point; I can understand the concepts but calculating out the numbers to prove them is a struggle.
to my knowledge, streaming speeds near theoretical would basically require writing a hundred TB as one file; every time it changes files, the "stream" resets. 300MB/s write is pretty good from what I understand, when using spinners. it's a limitation of the tech: you are sacrificing speed for capacity. your small, random, and concurrent IO will suck in comparison.

you will see more speed for anything in ARC (read cache), since that will be served directly from RAM. you don't really have enough RAM to bother with L2ARC, but adding more RAM will increase your ARC.
you could see improvements by adding metadata SSDs; they will store the checksums and such instead of putting that on the spinners. this can help. remember, however, that this makes them a permanent part of the pool, and they need to be reliable and redundant, or they will be what kills the pool if they die.
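if you want to see what that actually looks like, it's just another vdev type added to the pool; a rough sketch (pool and device names are placeholders, and note the mirror, since losing the special vdev means losing the pool):

    # add a mirrored special (metadata) vdev to the pool "tank"
    zpool add tank special mirror /dev/ada4 /dev/ada5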

in reality, mirrors will probably end up outperforming the raidz2 stream speeds anyway, and perform better in every other way as well.
 
Last edited:

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
artlessknave, thank you for the meaningful insights and for taking the time to respond. I am learning a lot. I did some math based on what I have learned from this article (https://www.truenas.com/blog/zfs-pool-performance-2/)

Current setup: 1x 18-wide Z2
  • Read IOPS: read IOPS of a single drive (250)
  • Write IOPS: write IOPS of a single drive (250)
  • Streaming read speed: (18 - 2) = 16 drives x 150MB/s = 2400MB/s
  • Streaming write speed: (18 - 2) = 16 drives x 150MB/s = 2400MB/s
  • Storage space efficiency: 88%
  • Fault tolerance: 2 disks per vdev
Proposed setup A: 2x 9-wide Z2
  • Read IOPS: read IOPS of a single drive x 2 = 500
  • Write IOPS: write IOPS of a single drive x 2 = 500
  • Streaming read speed: (9 - 2) x 2 = 14 drives x 150MB/s = 2100MB/s
  • Streaming write speed: (9 - 2) x 2 = 14 drives x 150MB/s = 2100MB/s
  • Storage space efficiency: 77%
  • Fault tolerance: 2 disks per vdev
Proposed setup B: 1x 18-wide Z3
  • Read IOPS: read IOPS of a single drive (250)
  • Write IOPS: write IOPS of a single drive (250)
  • Streaming read speed: (18 - 3) = 15 drives x 150MB/s = 2250MB/s
  • Streaming write speed: (18 - 3) = 15 drives x 150MB/s = 2250MB/s
  • Storage space efficiency: 83%
  • Fault tolerance: 3 disks per vdev
What is wrong with my math?
Your math is right, but math doesn't translate to real use because...

1. I get about 10x less than my theoretical calculations under the current setup.
...IOPS matter more than you think, and you have too many drives in a single vdev.
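If you want to see what the pool is actually doing while you copy, watch the per-vdev numbers live; something like this (pool name is a placeholder):

    # print per-vdev bandwidth and IOPS every 5 seconds while a transfer is running
    zpool iostat -v tank 5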

Also, what performance do I gain with a 1x 18-wide Z3? I just need high streaming speeds to move large files around with at most 3 users.
Nothing, because vdevs larger than 12 struggle. I'm convinced that with a single 12-wide RAIDZ2 vdev you would see an increase in performance even though the math says the opposite: it's difficult to coordinate that many spinners!
 
Last edited:

Bobka

Cadet
Joined
Jan 6, 2023
Messages
9
artlessknave and Davvo, I am very impressed with your knowledge and willingness to help. Thank you.

Everything you mentioned makes perfect sense.

You made two suggestions:
1. Adding more RAM: How much RAM would you recommend, and what kind of impact would it have?
2. Metadata drives: I've maxed out the drive bays on the 36-bay Supermicro rig; I have 2 SSD slots and 1 M.2 NVMe slot left. I think the most I can do is add two SATA SSD drives and mirror them, but this would only give me one-drive fault tolerance, whereas my current setup has two-drive fault tolerance. Would this be something you would recommend?

I am calculating the performance cost and the overall cost per TB. Here is my breakdown of costs. Based on my calculations, for about 5% more cost I can gain a 50% performance increase with Option 2. I would love your feedback on Options 3 and 4.

CURRENT HARDWARE COST
  Supermicro server: 1 x $1,820.00 = $1,820.00
  12TB drives (vdev1): 18 x $140.00 = $2,520.00
  18TB drives (vdev2): 18 x $300.00 = $5,400.00
  Total: $9,740.00

OPTION 1: 2 VDEVS, 18-WIDE Z2 (CURRENT)
  Vdev1 Z2: 175 TB
  Vdev2 Z2: 261 TB
  Total size: 436 TB
  Fault tolerance: 4 drives
  $ for fault tolerance: $880.00 (9% of overall system cost)
  Cost per TB: $22.34

OPTION 2: 4 VDEVS, 9-WIDE Z2
  Vdev1 Z2: 76.559 TB (12TB drives)
  Vdev2 Z2: 76.559 TB (12TB drives)
  Vdev3 Z2: 114.1 TB (18TB drives)
  Vdev4 Z2: 116.9 TB (18TB drives)
  Total size: 384.118 TB
  Fault tolerance: 8 drives
  $ for fault tolerance: $1,760.00 (18% of overall system cost)
  Cost per TB: $25.36
  Pros: performance doubles
  Cons: cost increase of $1,159.02 (14%)

OPTION 3: METADATA DRIVES
  SATA 870 EVO 4TB drives: 2 x $376.00 = $752.00
  Pros: performance increases by how much?
  Cons: cost increase of $752.00 (8%); one-drive fault tolerance doesn't match the vdev pool

OPTION 4: RAM (MAX 192GB ADDITIONAL)
  Currently have: 64GB
  Additional 192GB RAM (2x16GB @ $91): $546.00
  Total RAM: 256GB
  Pros: performance increase by how much?
  Cons: cost increase of $546.00 (6%)
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
1. Adding more RAM: How much RAM would you recommend, and what kind of impact would it have?
Adding more RAM would help your read speeds while the files you want to read are in the ARC. How big are the files we are talking about? There is also the option of an L2ARC for metadata (that would work great with that free PCIe slot).
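For reference, a metadata-only L2ARC is roughly this (sketch only; pool, device, and dataset names are placeholders, and the device name differs between CORE and SCALE):

    # add an NVMe device as an L2ARC (cache) vdev to the pool "tank"
    zpool add tank cache nvd0
    # tell a dataset to keep only metadata in L2ARC
    zfs set secondarycache=metadata tank/data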

Would this be something you would recommend?
No, you want the same level of redundancy on all vdevs.

All in all I consider Option 2 the best compromise (you could also do 3 vdevs, 12-wide, in RAIDZ2). Do note that the usable space is 80% of the total space, since you want to leave about 20% free in order to not see your performance die a gruesome death; you might want to take that into consideration when doing the price-per-TB math.
Also, do note that usually it's advised to have symmetrical (both in layout and size) vdevs.
 

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
that is a shipload of drives.
how are you planning to back up this monster? because RAIDz is not a backup....
also, it should be added that you can add vdevs later. so you could start with 1 raidz2 of 9 drives, and add another full raidz2 when you actually need the space. raidz2 expansion is both complex and simple at the same time, but as long as you have a plan, it's doable.
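mechanically, adding a whole vdev later is a single command, something like this (pool and disk names are placeholders; the new vdev should match the layout of the existing one):

    # add a second 9-disk raidz2 vdev to the pool "tank"
    zpool add tank raidz2 da9 da10 da11 da12 da13 da14 da15 da16 da17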
I probably would discourage meta drives unless at least 3. also, you can't really remove meta drives, they become a permanent part of the pool.
 

Bobka

Cadet
Joined
Jan 6, 2023
Messages
9
All in all I consider Option 2 the best compromise (you could also do 3 vdevs, 12-wide, in RAIDZ2).
This may not be a good option because I have vdev1 with 18 drives of 12TB capacity and vdev2 with 18 drives of 18TB capacity, so going with 3 vdevs would force me to mix and match, which I didn't want to do. So I agree, going with 4 vdevs (2 for each set of 18 drives) in Z2 would be a good option.
Do note that the usable space is 80% of the total space, since you want to leave about 20% free in order to not see your performance die a gruesome death; you might want to take that into consideration when doing the price-per-TB math.
Also, do note that usually it's advised to have symmetrical (both in layout and size) vdevs.
I completely agree. I already filled the first vdev to 90% and performance died, so I had to offload it to the second, newly created vdev.
 

Bobka

Cadet
Joined
Jan 6, 2023
Messages
9
that is a shipload of drives.
how are you planning to back up this monster? because RAIDz is not a backup....
This is a great question. I jumped into this project, have been swamped, and have been learning and building things as I go, so I didn't have a chance to think about the roadmap and properly lay things out. My current plan is, after I am done with the project, to compress the data and perhaps consolidate it onto vdev1, then redo vdev2 from its current setup following your suggestions for Option 2 above. As far as the backup, I have not done research on this part; I am not sure how to back up 200TB of data. Any suggestions would be greatly appreciated.
also, it should be added that you can add vdevs later. so you could start with 1 raidz2 of 9 drives, and add another full raidz2 when you actually need the space. raidz2 expansion is both complex and simple at the same time, but as long as you have a plan, it's doable.
This is a great point. I love it. At the moment I have already dumped data on my current vdev2 setup, and moving it off would slow down the project, so I just have to live with what I have for now and reconfigure things after my project is done.
I probably would discourage meta drives unless at least 3. also, you can't really remove meta drives, they become a permanent part of the pool.
Makes perfect sense.
 

Bobka

Cadet
Joined
Jan 6, 2023
Messages
9
Davvo, regarding your question: Adding more RAM would help your read speeds while the files you want to read are in the ARC. How big are the files we are talking about? There is also the option of an L2ARC for metadata (that would work great with that free PCIe slot).

The file size varies; the majority of the files are between 20MB and 35GB. They are not frequently accessed: once written to the drive they are accessed maybe 10 times, and after that archived.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
The file size varies; the majority of the files are between 20MB and 35GB. They are not frequently accessed: once written to the drive they are accessed maybe 10 times, and after that archived.
I don't think that ARC will be of much help then, maybe L2ARC for metadata only could marginally help. Well, there is no need to rush things, you can first fix the pool layout and then reassess the performance.
 

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
can you give us some idea what "THE PROJECT" is? curious what the hell needs 200TB.

the "Best" way to backup 200TB is to have a 200TB clone server and replicate to it.....

otherwise. um. tape, I guess? it looks like LTO-12 would allow single-tape backup... but, honestly, all the infrastructure for that would probably come close to the cost of just building a new server. there are also things like DataDomain, but they are proprietary. if you found a decommissioned one, maybe...

(I am genuinely a backup admin, mostly NetBackup/DataDomain/Flex)
 