basement_tech
Cadet
Joined: Jul 29, 2020
Messages: 7
Hi All.
Long post, apologies up front. Trying to be as information-rich as possible so that some good feedback can be provided.
New to TrueNAS, and while I have used ZFS before (back in OpenIndiana days) I am still pretty much a complete newb. Or at least I know enough to be dangerous, but I recognize my limitations.
The backstory to using FreeNAS/TrueNAS is that I have many (several hundred) VMs to back up - basic tar & rsync stuff. To handle this I have 8 servers running vanilla CentOS with 8 SATA drives on LSI RAID cards; the backups are pushed to them overnight.
The IO of the backups is killing the backup servers. CPUs are fine; it's the RAID that can't keep up. We have tons of free space on the backup servers.
So, I thought (bad move - wife says I should never think) I could maybe re-purpose some spare SSDs, plus a big but slightly older dual E5-2690 v3 server with 384GB RAM, into a ZFS box. Mount it via iSCSI/NFS over a 10Gbps network to the existing backup servers, and maybe we don't have to buy several more backup servers. I've been eyeing FreeNAS/TrueNAS for a while - what a great opportunity to play.
I built a beast. Or so I thought. 22 x 1.6TB CloudSpeed SSDs, 2 x 32GB SATADOMs to install onto, dual E5-2690 v3, 384GB RAM, Supermicro X10DRU-i+ motherboard, 24-bay chassis, AOC-S3008L-8LE HBA in IT mode in an x8 slot. It has 4 x 10GBase-T ports too, but I plan to go to 10G via SFP+ as soon as parts arrive.
Memtested the beast for two days, no errors. Installed TrueNAS 11.3-U4.1.
Created a pool of 7 raidz1 vdevs (3 SSDs each) with 1 hot spare. So in my mind this is essentially a big RAID 50 and should offer decent performance (to my limited ZFS understanding, roughly the write speed of 7 SSDs, potentially more on reads, plus whatever the ARC from all that RAM adds). Feel free to point out the logic flaw here, but I don't "think" the pool config is related to the issue I am facing.
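(In case it helps, the CLI equivalent of what I built would be roughly this - I used the GUI, and the da0..da21 device names are placeholders, not the exact ones on my box:)
--
# 7 x 3-disk raidz1 vdevs striped together, plus one hot spare
zpool create pool \
  raidz1 da0 da1 da2 \
  raidz1 da3 da4 da5 \
  raidz1 da6 da7 da8 \
  raidz1 da9 da10 da11 \
  raidz1 da12 da13 da14 \
  raidz1 da15 da16 da17 \
  raidz1 da18 da19 da20 \
  spare da21
--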
Then I tested it. In my test area I only have 3 test servers (E3-1245 v3 Xeons) and a GigE network.
Ran fio with all 3 servers hooked up via iSCSI. Each server connects via a different network interface on the TrueNAS server (used 3 of the 4 onboard 10GBase-T ports to get 3 x 1Gbps Ethernet). For a few seconds it looked like a success! But then performance was all over the place. It completely dies out at times - zero traffic, zero load, zero everything for 30 seconds... then suddenly traffic comes back and we have all 3 servers pushing 90+MB/s, basically saturating the GigE ports. Thought perhaps it was the cheap Netgear switch; threw in a Dell switch and the exact same issue occurred. Tried some network tuning, a few other little things, etc.
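(If it helps anyone suggest next steps: a plain iperf3 run between a test server and the TrueNAS box is the kind of sanity check I'd use to rule out the wire itself - the hostname below is a placeholder:)
--
# on the TrueNAS box
iperf3 -s
# on a test server; truenas-host is a placeholder
iperf3 -c truenas-host -t 30
--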
Then I had a bright idea: benchmark the pool on the TrueNAS server directly, to take some variables out of the equation. After a little research I did just that, and the performance is bad no matter what I run. But the worst is this fio test:
--
fio --randrepeat=1 --ioengine=posixaio --direct=1 --gtod_reduce=1 --name=testing --filename=randreadwrite.fio --bs=4k --iodepth=64 --size=10G --rw=randrw
--
Now, maybe the direct=1 and gtod_reduce=1 flags aren't really needed; I don't know. But with or without them there was very little variation in my testing.
Run against the pool (/mnt/pool), I get the following:
--
mixed random read: 14800 iops 60.7MB/s
mixed random write: 14800 iops 60.6MB/s
--
But the biggest issue is that during the test, at times it would just... pause, or show really low throughput. Like so:
--
Jobs: 1 (f=1): [m(1)][19.7%][r=644KiB/s,w=592KiB/s][r=161,w=148 IOPS][eta 14m:03s]
--
I was running gstat as well, and the drives were always at 85-105% busy even during the periods of low IOPS. If the performance were at least consistent, maybe I wouldn't be pulling my hair out. Maybe some cache was being hit and it was flushing to disk? I don't know. Seems off. The fio tests took 15 minutes to complete.
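(For reference, I was watching the disks with something along these lines:)
--
# per-disk busy%, refreshed every second, physical providers only
gstat -p -I 1s
--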
Thought perhaps it's the pool configuration; maybe a stripe of 7 raidz1 vdevs of 3 SSDs each is not a good idea. So I removed the hot spare drive from the one big pool I had created, made a single-drive pool out of it, and tested that pool with the same fio command as above.
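(Roughly this, with the device name again a placeholder:)
--
# pull the spare out of the big pool, then build a one-disk test pool
zpool remove pool da21
zpool create testpool da21
--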
The results were not pleasing:
--
mixed random read: 1057 iops 4331kB/s
mixed random write: 1055 iops 4325kB/s
--
Exact same issue with the inconsistent performance here too. Several minutes of no progress on the fio test and then suddenly things start moving again.
At this point I am thinking it has to be the HBA. I pulled the hot spare, plugged it into an E3 server straight into the onboard SATA controller, and installed CentOS 7. Then I ran the same fio test with just a different engine (--ioengine=libaio instead of --ioengine=posixaio, because Linux vs FreeBSD).
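(Same command as before with only the engine swapped:)
--
fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=testing --filename=randreadwrite.fio --bs=4k --iodepth=64 --size=10G --rw=randrw
--
The results were much more pleasing: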
--
mixed random read: 22900 iops 93.9MB/s
mixed random write: 22900 iops 93.8MB/s
--
A single drive in an E3 Xeon running CentOS 7 outperforms the entire TrueNAS ZFS pool of 21 drives by a significant margin. Wow. Something feels wrong.
Pulled all the SSDs from the TrueNAS box except one. Installed CentOS 7 on that box and ran fio on the single remaining drive - still hooked up to the original backplane and the same AOC-S3008L-8LE HBA. Results:
--
mixed random read: 28200 iops 116MB/s
mixed random write: 28200 iops 115MB/s
--
This is the best single-drive result I've gotten from these SSDs. The fio tests run fast now: no slowdowns, no minutes of zero IO, nothing - it just ticks along like it should. I've since tested all the SSDs one at a time in CentOS 7 on this same server hardware, and they all give similar performance. Now I am sure it isn't the SSDs and it isn't the HBA... so what could it be?
No SMART errors. I found a post by jgreco with a solnet drive test, which I ran back when I had TrueNAS installed; it showed all my SSDs only able to push about 61MB/s. I'm not sure what is crippling their performance.
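(For completeness, this is roughly how I was checking the drives under TrueNAS - da0 is a placeholder for each disk in turn:)
--
# SMART health and error log for one SSD behind the HBA
smartctl -a /dev/da0
# negotiated link speed, if the HBA passes ATA IDENTIFY through
camcontrol identify da0
--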
Where do I investigate next?
Thanks!