basement_tech
Cadet
Joined: Jul 29, 2020
Messages: 7
Hi All.
Long post, apologies up front. Trying to be as information-rich as possible so that some good feedback can be provided.
New to TrueNAS, and while I have used ZFS before (back in OpenIndiana days) I am still pretty much a complete newb. Or at least I know enough to be dangerous, but I recognize my limitations.
The backstory to using FreeNAS/TrueNAS is that I have many (several hundred) VMs to back up - basic tar & rsync stuff. To handle this I have 8 servers running vanilla CentOS with 8 SATA drives on LSI RAID cards; the backups are pushed to them overnight.
The IO of the backups is killing the backup servers. CPUs are fine; it's the RAID that can't keep up. We have tons of free space on the backup servers.
So, I thought (bad move - wife says I should never think) I could maybe re-purpose some spare SSDs, plus a big but slightly older dual E5-2690 v3 server with 384GB RAM, into a ZFS box. Mount it via iSCSI/NFS over a 10Gbps network to the existing backup servers, and maybe we don't have to buy several more backup servers. I've been eyeing FreeNAS/TrueNAS for a while - what a great opportunity to play.
I built a beast. Or so I thought. 22 x 1.6TB CloudSpeed SSDs, 2 x 32GB SATADOMs to install onto, dual E5-2690 v3, 384GB RAM, Supermicro X10DRU-i+ motherboard, 24-bay chassis, AOC-S3008L-8LE HBA in IT mode in an x8 slot. It has 4 x 10GBase-T ports too, but I plan to go to 10G via SFP+ as soon as parts arrive.
Memtested the beast for two days, no errors. Installed TrueNAS 11.3-U4.1.
Created a pool of 7 raidz1 vdevs (3 SSDs each) with 1 hot spare. So in my mind this is essentially a big RAID 50 and should offer decent performance (to my limited ZFS understanding, roughly the write speed of 7 SSDs, potentially more on reads, plus whatever the ARC from all that RAM adds). Feel free to point out the logic flaw here, but I don't "think" the pool config is related to the issue I am facing.
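(In case it helps, the CLI equivalent of what I built would be roughly this - I used the GUI, and the da0..da21 device names are placeholders, not the exact ones on my box:)
--
# 7 x 3-disk raidz1 vdevs striped together, plus one hot spare
zpool create pool \
  raidz1 da0 da1 da2 \
  raidz1 da3 da4 da5 \
  raidz1 da6 da7 da8 \
  raidz1 da9 da10 da11 \
  raidz1 da12 da13 da14 \
  raidz1 da15 da16 da17 \
  raidz1 da18 da19 da20 \
  spare da21
--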
Then I tested it. In my test area I only have 3 test servers (E3-1245 v3 Xeons) and a GigE network.
Ran fio with all 3 servers hooked up via iSCSI. Each server connects via a different network interface on the TrueNAS server (used 3 of the 4 onboard 10GBase-T ports to get 3 x 1Gbps Ethernet). For a few seconds it looked like a success! But then performance was all over the place. It completely dies out at times - zero traffic, zero load, zero everything for 30 seconds... then suddenly traffic comes back and we have all 3 servers pushing 90+MB/s, basically saturating the GigE ports. Thought perhaps it was the cheap Netgear switch; threw in a Dell switch and the exact same issue occurred. Tried some network tuning, a few other little things, etc.
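(If it helps anyone suggest next steps: a plain iperf3 run between a test server and the TrueNAS box is the kind of sanity check I'd use to rule out the wire itself - the hostname below is a placeholder:)
--
# on the TrueNAS box
iperf3 -s
# on a test server; truenas-host is a placeholder
iperf3 -c truenas-host -t 30
--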
Then I had a bright idea: benchmark the pool on the TrueNAS server directly, to take some variables out of the equation. After a little research I did just that, and the performance is bad no matter what I run. But the worst is this fio test:
--
fio --randrepeat=1 --ioengine=posixaio --direct=1 --gtod_reduce=1 --name=testing --filename=randreadwrite.fio --bs=4k --iodepth=64 --size=10G --rw=randrw
--
Now, maybe the direct=1 and gtod_reduce=1 flags aren't really needed; I don't know. But with or without them there was very little variation in my testing.
Run against the pool (/mnt/pool), I get the following:
--
mixed random read: 14800 iops 60.7MB/s
mixed random write: 14800 iops 60.6MB/s
--
But the biggest issue is that during the test, at times it would just... pause, or show really low throughput. Like so:
--
Jobs: 1 (f=1): [m(1)][19.7%][r=644KiB/s,w=592KiB/s][r=161,w=148 IOPS][eta 14m:03s]
--
I was running gstat as well, and the drives were always at 85-105% busy even during the periods of low IOPS. If the performance were at least consistent, maybe I wouldn't be pulling my hair out. Maybe some cache was being hit and it was flushing to disk? I don't know. Seems off. The fio tests took 15 minutes to complete.
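(For reference, I was watching the disks with something along these lines:)
--
# per-disk busy%, refreshed every second, physical providers only
gstat -p -I 1s
--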
Thought perhaps it's the pool configuration; maybe a stripe of 7 raidz1 vdevs of 3 SSDs each is not a good idea. So I removed the hot spare drive from the one big pool I had created, made a single-drive pool out of it, and tested that pool with the same fio command as above.
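(Roughly this, with the device name again a placeholder:)
--
# pull the spare out of the big pool, then build a one-disk test pool
zpool remove pool da21
zpool create testpool da21
--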
The results were not pleasing:
--
mixed random read: 1057 iops 4331kB/s
mixed random write: 1055 iops 4325kB/s
--
Exact same issue with the inconsistent performance here too. Several minutes of no progress on the fio test and then suddenly things start moving again.
At this point I am thinking it has to be the HBA. I pulled the hot spare, plugged it into an E3 server straight into the onboard SATA controller, and installed CentOS 7. Then I ran the same fio test with just a different engine (--ioengine=libaio instead of --ioengine=posixaio, because Linux vs FreeBSD).
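(Same command as before with only the engine swapped:)
--
fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=testing --filename=randreadwrite.fio --bs=4k --iodepth=64 --size=10G --rw=randrw
--
The results were much more pleasing: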
--
mixed random read: 22900 iops 93.9MB/s
mixed random write: 22900 iops 93.8MB/s
--
A single drive in an E3 Xeon running CentOS 7 outperforms the entire TrueNAS ZFS pool of 21 drives by a significant margin. Wow. Something feels wrong.
Pulled all the SSDs from the TrueNAS box except one. Installed CentOS 7 on that box and ran fio on the single remaining drive - still hooked up to the original backplane and the same AOC-S3008L-8LE HBA. Results:
--
mixed random read: 28200 iops 116MB/s
mixed random write: 28200 iops 115MB/s
--
This is the best single-drive result I've gotten from these SSDs. The fio tests run fast now: no slowdowns, no minutes of zero IO, nothing - it just ticks along like it should. I've since tested all the SSDs one at a time in CentOS 7 on this same server hardware, and they all give similar performance. Now I am sure it isn't the SSDs and it isn't the HBA... so what could it be?
No SMART errors. I found a post by jgreco with a solnet drive test, which I ran back when I had TrueNAS installed; it showed all my SSDs only able to push about 61MB/s. I'm not sure what is crippling their performance.
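(For completeness, this is roughly how I was checking the drives under TrueNAS - da0 is a placeholder for each disk in turn:)
--
# SMART health and error log for one SSD behind the HBA
smartctl -a /dev/da0
# negotiated link speed, if the HBA passes ATA IDENTIFY through
camcontrol identify da0
--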
Where do I investigate next?
Thanks!