Poor write performance since upgrade to 9.3

Status: Not open for further replies.

Eric (OP) · Joined Dec 28, 2014 · Messages: 5
I have been experiencing abysmal write performance since upgrading to 9.3. There have been no other changes to my box for at least a year now.

I have a Xeon E3-1245V2 with 32GB RAM. I have 2 LSI 1068E-based adapters running a mixture of twelve 2TB drives (mostly WD red/green with intellipark off) in RAIDZ (4 Vdevs of 3) with a 500GB L2ARC and 8GB SLOG on separate SSDs. I am running FreeNAS-9.3-STABLE-201412240734.

I really don't know what is going on. Ever since upgrading to 9.3, moving files around my box has become very slow. For instance, moving twenty-two 2.28GB files (~48GB total) from one filesystem to another took over 8 minutes (using mv). Watching the output of zpool iostat, my writes slow down to about 18-20MB/s for a stretch, then shoot back up to what they normally were (typically ~150-350MB/s) for a bit, then back down. Here is an example:

zpool iostat storage 1

capacity operations bandwidth
pool alloc free read write read write
---------- ----- ----- ----- ----- ----- -----
storage 15.3T 6.48T 0 272 0 18.1M
storage 15.3T 6.48T 0 301 0 20.0M
storage 15.3T 6.48T 0 271 0 18.0M
storage 15.3T 6.48T 0 299 0 19.9M
storage 15.3T 6.48T 0 256 0 17.0M
storage 15.3T 6.48T 0 297 0 19.7M
storage 15.3T 6.48T 0 298 0 19.8M
storage 15.3T 6.48T 0 280 0 18.6M
storage 15.3T 6.48T 0 287 0 19.1M
storage 15.3T 6.48T 0 286 0 19.0M
storage 15.3T 6.48T 196 282 24.6M 18.7M
storage 15.3T 6.48T 0 281 0 18.7M
storage 15.3T 6.48T 0 286 0 19.0M
storage 15.3T 6.48T 0 303 0 20.2M
storage 15.3T 6.48T 0 803 0 83.2M
storage 15.3T 6.48T 612 2.28K 76.5M 259M
storage 15.3T 6.48T 991 1.25K 124M 106M
storage 15.3T 6.48T 364 2.05K 45.5M 242M
storage 15.3T 6.48T 101 3.20K 12.7M 407M
storage 15.3T 6.48T 215 2.96K 27.0M 379M
storage 15.3T 6.48T 663 2.89K 82.9M 370M
storage 15.3T 6.48T 648 2.12K 81.0M 272M
storage 15.3T 6.48T 690 2.88K 86.3M 368M
storage 15.3T 6.48T 1.02K 1.78K 131M 228M
storage 15.3T 6.48T 1.49K 740 191M 91.7M
storage 15.3T 6.48T 509 383 63.7M 48.0M
storage 15.3T 6.48T 690 1.09K 86.3M 98.4M
storage 15.3T 6.48T 595 2.38K 74.4M 304M
storage 15.3T 6.48T 1.02K 2.36K 131M 296M
storage 15.3T 6.48T 508 2.38K 63.4M 291M
storage 15.3T 6.48T 260 664 32.6M 81.3M
storage 15.3T 6.48T 1.24K 760 159M 52.1M
storage 15.3T 6.48T 1.61K 1.56K 206M 199M
storage 15.3T 6.48T 1.98K 1.00K 254M 127M
storage 15.3T 6.48T 2.49K 126 318M 14.4M
storage 15.3T 6.48T 1001 132 125M 759K
storage 15.3T 6.48T 1.40K 2.59K 179M 308M
storage 15.3T 6.48T 1.02K 2.14K 130M 274M
storage 15.3T 6.48T 1.42K 2.06K 182M 263M
storage 15.3T 6.48T 1.51K 1.23K 194M 157M
storage 15.3T 6.48T 2.50K 371 320M 30.0M
storage 15.3T 6.48T 889 238 111M 14.5M
storage 15.3T 6.48T 921 2.95K 115M 365M
storage 15.3T 6.48T 529 3.12K 66.2M 400M
storage 15.3T 6.48T 705 2.18K 88.2M 268M
storage 15.3T 6.48T 2.08K 1.02K 266M 93.5M
storage 15.3T 6.48T 1.88K 706 240M 39.2M
storage 15.3T 6.48T 2.91K 579 372M 38.5M
storage 15.3T 6.48T 2.43K 579 311M 38.5M
storage 15.3T 6.48T 2.25K 545 288M 36.2M
storage 15.3T 6.48T 1.55K 288 199M 19.2M
storage 15.3T 6.48T 1.00K 298 128M 19.8M
storage 15.3T 6.48T 1017 280 127M 18.6M
storage 15.3T 6.48T 787 289 98.4M 19.2M
storage 15.3T 6.48T 1005 290 126M 19.3M
storage 15.3T 6.48T 1021 290 128M 19.3M
storage 15.3T 6.48T 1021 290 128M 19.3M
storage 15.3T 6.48T 582 294 72.9M 19.6M
storage 15.3T 6.48T 909 291 114M 19.4M
storage 15.3T 6.48T 509 294 63.7M 19.6M
storage 15.3T 6.48T 408 282 51.0M 18.8M
storage 15.3T 6.48T 252 292 31.6M 19.4M
storage 15.3T 6.48T 335 291 42.0M 19.4M
storage 15.3T 6.48T 29 268 3.73M 17.8M
storage 15.3T 6.48T 204 283 25.6M 18.8M
storage 15.3T 6.48T 35 296 4.40M 19.7M
storage 15.3T 6.48T 169 294 21.2M 19.6M
storage 15.3T 6.48T 7 282 988K 18.8M
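Back-of-the-envelope, the mv figures above average out to about 100MB/s (and a mv between datasets on the same pool reads everything it writes, so the disks see roughly double that). A quick shell check using the numbers from the post:

```shell
# ~48 GB moved in a little over 8 minutes (figures from the post above)
total_mb=$((48 * 1024))     # total data moved, in MB
seconds=$((8 * 60))         # elapsed time, in seconds
echo "$((total_mb / seconds)) MB/s average"   # → 102 MB/s average
```

So it is the sustained 18-20MB/s stretches in the iostat output that drag an otherwise healthy burst rate down.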

Any ideas what might be going on? I have also had the following alerts since upgrading:
  • WARNING: Firmware version 15 does not match driver version 16 for /dev/mps0
  • WARNING: Firmware version 18 does not match driver version 16 for /dev/mps1
  • WARNING: Firmware version 18 does not match driver version 16 for /dev/mps2
Thanks,
Eric
 

sfcredfox · Patron · Joined Aug 26, 2014 · Messages: 340
I read some posts that say you should be using the Firmware that matches the P16 driver.
 

Ericloewe · Server Wrangler, Moderator · Joined Feb 15, 2014 · Messages: 20,194
I read some posts that say you should be using the Firmware that matches the P16 driver.
^
This. Those warnings are pretty self-explanatory: the correct firmware version for LSI SAS2 controllers is P16. P15 is bad. P18 hasn't been shown not to work, but LSI only supports matched drivers and firmware. P19 is bad and P20 is catastrophic.
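For anyone checking their own box: on FreeBSD the mps driver logs both versions at boot, so a mismatch can be spotted by comparing the two leading version numbers. A minimal sketch, using a hypothetical log line of the shape the driver prints (exact wording may differ between driver versions):

```shell
# Hypothetical boot-log line of the shape the mps driver prints;
# the alert fires when the two leading version numbers differ.
log='mps0: Firmware: 15.00.00.00, Driver: 16.00.00.00-fbsd'
fw=$(echo "$log" | sed -E 's/.*Firmware: ([0-9]+).*/\1/')
drv=$(echo "$log" | sed -E 's/.*Driver: ([0-9]+).*/\1/')
[ "$fw" = "$drv" ] || echo "mismatch: firmware P$fw vs driver P$drv"
```

On a live system the same line comes out of dmesg, and sas2flash -listall (shown later in the thread) reports the firmware phase per controller.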

But why do you have three? You said you're only using two.
 
Eric (OP) · Joined Dec 28, 2014 · Messages: 5
Thanks for the reply.
^
This. Those warnings are pretty self-explanatory: the correct firmware version for LSI SAS2 controllers is P16. P15 is bad. P18 hasn't been shown not to work, but LSI only supports matched drivers and firmware. P19 is bad and P20 is catastrophic.

But why do you have three? You said you're only using two.

I believe there's an onboard LSI controller on the current MB that is not being used. I am certain that both adapters I use were flashed to IT firmware right after I installed them. I am not familiar with any of this P15, P16, etc. stuff; only that it was IT firmware, so I will have to look into it. It has been quite some time since I put this together, so I will have to do some research, I guess.

Either way, I am hesitant to believe this is causing my write performance problem. I never had any issues with slow writes in the past. Not under any other version of FreeNAS, not under FreeBSD and not under Solaris/OpenSolaris, all with the same hardware running the same firmware. I have been using one of the adapters for at least 5 years with the same firmware currently on it. No ill effects that I have noticed until this upgrade.
 

zambanini · Patron · Joined Sep 11, 2013 · Messages: 479
Make a backup... and change the firmware, so that Eric and the other forum elves can exclude the firmware mismatch as the reason.
 

Ericloewe · Server Wrangler, Moderator · Joined Feb 15, 2014 · Messages: 20,194
Well, if cabling isn't an issue, drop one of the HBAs and save yourself some power.

I have to say I'm surprised by the fact that these haven't been updated in five years. I thought Phase 16 firmware was only a couple of years old, much less P17.
This probably isn't the cause of your problem, but it's something that needs fixing.

What often causes inconsistent performance is a bad drive, so be sure to exclude that possibility as well.

Also, how full are these pools?
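To rule out the bad-drive possibility mentioned above, the usual suspects are the SMART reallocated- and pending-sector counters. A minimal sketch of the kind of check involved, using a made-up attribute line in the shape `smartctl -A` prints (real device names and thresholds are up to you):

```shell
# Made-up sample line in the shape `smartctl -A /dev/daN` prints;
# a nonzero raw value in the last column warrants a long SMART test.
line='  5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 12'
raw=$(echo "$line" | awk '{ print $NF }')
[ "$raw" -gt 0 ] && echo "drive reports $raw reallocated sectors"
```

On a live box, `zpool status -v` (checksum/read/write error columns) and per-disk busy percentages are also worth a look; one drive consistently pinned while its siblings idle is suspect.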
 
Eric (OP) · Joined Dec 28, 2014 · Messages: 5
Well, if cabling isn't an issue, drop one of the HBAs and save yourself some power.

I have to say I'm surprised by the fact that these haven't been updated in five years. I thought Phase 16 firmware was only a couple of years old, much less P17.
This probably isn't the cause of your problem, but it's something that needs fixing.

What often causes inconsistent performance is a bad drive, so be sure to exclude that possibility as well.

Also, how full are these pools?

Further investigation shows I did replace that 5-year-old card. It was a Supermicro card for their weird backwards PCIe bus. I replaced it with an LSI 9211-8i about a year ago when I upgraded the MB/RAM. The other card is an LSI 9207-4i4e. I used the 3 SFF-8087 ports to drive my above-mentioned storage pool (12 consumer drives) and the SFF-8088 on the 9207 for my external SAS tape drive. I reckon I must be using the onboard controller as well, for the SLOG/L2ARC and root pool. It is an LSI SAS2008 controller, not a 1068E as I originally thought.

I will look into the firmware versions and change them if necessary. I obviously don't have that great a memory when it comes to this system (call it a testament to ZFS's set-it-and-forget-it nature), but I am really surprised that I apparently didn't flash IT firmware to these chips. I know better than that.

That said, the pool is at 70% capacity, scrubbed yesterday with 0 errors.
 

Ericloewe · Server Wrangler, Moderator · Joined Feb 15, 2014 · Messages: 20,194
You did flash IT, but the versions are the wrong ones. You need P16. The 9211 is a SAS2008 (perfect) and the 9207 a SAS2308 (an updated 2008, also perfect), so no problems there (they use the mps driver, so no problems would be expected; I did find the model number oddly SAS1-ish, but that's sorted out now).

Now, depending on your workload, 70% full may be beyond the point where you start noticing performance degradation (80% is the general "do not fill beyond this line" limit, 95% is "you're screwed."). A relatively full old pool with fragmentation could probably account for the slowish behavior.
 
Eric (OP) · Joined Dec 28, 2014 · Messages: 5
You did flash IT, but the versions are the wrong ones. You need P16. The 9211 is a SAS2008 (perfect) and the 9207 a SAS2308 (an updated 2008, also perfect), so no problems there (they use the mps driver, so no problems would be expected; I did find the model number oddly SAS1-ish, but that's sorted out now).

Now, depending on your workload, 70% full may be beyond the point where you start noticing performance degradation (80% is the general "do not fill beyond this line" limit, 95% is "you're screwed."). A relatively full old pool with fragmentation could probably account for the slowish behavior.

NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
storage 21.8T 15.3T 6.40T - 3% 70% 1.00x ONLINE /mnt
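(As a sanity check, the CAP column is just ALLOC divided by SIZE, using the figures above:)

```shell
# CAP = ALLOC / SIZE, from the zpool list output above (in TB)
awk 'BEGIN { printf "%.0f%% full\n", 100 * 15.3 / 21.8 }'   # → 70% full
```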

I could live with some degradation, but 20MB/s natively ain't gonna cut it. Something else is going on, I just don't know what it is or how to find it.
 

cyberjock · Inactive Account · Joined Mar 25, 2012 · Messages: 19,525
I've arrived at this party late, but I can tell you that having the wrong firmware version *can* result in horribad performance. The problem is that the driver talks to the controller a certain way, and the controller starts having timeouts and such with the commands, so you end up with horribly bad latency from the system to the controller because the controller is unhappy. I'd definitely reflash the controllers as a first step. Can't really rule out other problems until you've ruled out the storage subsystem. ;)
 
Eric (OP) · Joined Dec 28, 2014 · Messages: 5
Well, I pulled the 9207 and flashed the two SAS2008s. Everything is running off of these two controllers now.

[eric@nas] /usr/local/www/freenasUI/tools# sas2flash -listall
LSI Corporation SAS2 Flash Utility
Version 14.00.00.00 (2012.07.04)
Copyright (c) 2008-2012 LSI Corporation. All rights reserved

Adapter Selected is a LSI SAS: SAS2008(B2)

Num Ctlr FW Ver NVDATA x86-BIOS PCI Addr
----------------------------------------------------------------------------

0 SAS2008(B2) 16.00.00.00 10.00.00.06 07.31.00.00 00:02:00:00
1 SAS2008(B2) 16.00.00.00 10.00.00.06 07.31.00.00 00:03:00:00

Still having the same problems with writes bogging down. I have tried watching zpool iostat -v output to isolate the behavior to a particular drive or even vdev, but it appears to be completely random. I'll spare you THAT spam, but this sums it up:

[eric@nas] /nonexistent# zpool iostat storage 1
capacity operations bandwidth
pool alloc free read write read write
---------- ----- ----- ----- ----- ----- -----
storage 15.4T 6.33T 651 1.15K 80.7M 133M
storage 15.4T 6.33T 515 1.60K 64.0M 183M
storage 15.4T 6.33T 351 2.39K 43.7M 294M
storage 15.4T 6.33T 99 2.92K 12.5M 363M
storage 15.4T 6.33T 233 3.08K 28.9M 391M
storage 15.4T 6.33T 159 3.81K 19.9M 484M
storage 15.4T 6.33T 46 3.95K 5.75M 501M
storage 15.4T 6.33T 360 1.56K 44.6M 195M
storage 15.4T 6.33T 1.21K 337 154M 24.0M
storage 15.4T 6.33T 1.11K 2.36K 141M 298M
storage 15.4T 6.33T 1.74K 431 221M 36.7M
storage 15.4T 6.33T 1.26K 1.89K 160M 237M
storage 15.4T 6.33T 1.46K 1.36K 186M 155M
storage 15.4T 6.33T 974 1.46K 121M 182M
storage 15.4T 6.33T 1.62K 1.59K 206M 186M
storage 15.4T 6.33T 983 1.49K 122M 185M
storage 15.4T 6.33T 691 1.58K 85.6M 183M
storage 15.4T 6.32T 716 1.91K 88.8M 221M
storage 15.4T 6.32T 1.58K 875 201M 75.0M
storage 15.4T 6.32T 2.05K 877 261M 97.3M
storage 15.4T 6.32T 1.16K 974 148M 114M
storage 15.4T 6.32T 994 2.28K 123M 271M
storage 15.4T 6.32T 1.24K 1.45K 158M 181M
storage 15.4T 6.32T 1.93K 346 245M 27.6M
storage 15.4T 6.32T 955 2.26K 118M 283M
storage 15.4T 6.32T 1.56K 1.06K 198M 118M
storage 15.4T 6.32T 947 2.07K 117M 259M
storage 15.4T 6.32T 1.27K 992 162M 104M
storage 15.4T 6.32T 1.08K 1.61K 137M 201M
storage 15.4T 6.32T 2.39K 475 303M 58.0M
storage 15.4T 6.32T 2.01K 352 256M 18.6M
storage 15.4T 6.32T 2.03K 580 258M 38.5M
storage 15.4T 6.32T 1.79K 579 227M 38.5M
storage 15.4T 6.32T 2.40K 579 304M 38.5M
storage 15.4T 6.32T 2.58K 579 327M 38.5M
storage 15.4T 6.32T 2.00K 579 254M 38.5M
storage 15.4T 6.32T 773 439 96.0M 29.2M
storage 15.4T 6.32T 1.01K 269 128M 17.9M
storage 15.4T 6.32T 773 294 96.0M 19.6M
storage 15.4T 6.32T 1.01K 303 128M 20.2M
storage 15.4T 6.32T 773 302 96.0M 20.1M
storage 15.4T 6.32T 743 380 92.2M 25.3M
storage 15.4T 6.32T 967 291 120M 19.4M
storage 15.4T 6.32T 773 289 96.0M 19.2M
storage 15.4T 6.32T 773 289 96.0M 19.2M
storage 15.4T 6.32T 515 289 64.0M 19.2M
storage 15.4T 6.32T 257 284 32.0M 18.9M
storage 15.4T 6.32T 265 295 33.0M 19.6M
storage 15.4T 6.32T 249 278 31.0M 18.5M
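One way to quantify the bogging-down from a capture like the one above is to count how many one-second samples sat below some write-bandwidth floor. A rough sketch, fed a few sample lines taken from the output above (field 7 is the write-bandwidth column; K-suffixed samples, i.e. under 1 MB/s, are skipped for simplicity):

```shell
# A few sample lines captured from `zpool iostat storage 1`;
# on a real box, redirect the live iostat output to the file instead.
cat > iostat.log <<'EOF'
storage 15.4T 6.33T 651 1.15K 80.7M 133M
storage 15.4T 6.33T 0 272 0 18.1M
storage 15.4T 6.33T 0 301 0 20.0M
EOF
# Count samples whose write bandwidth (field 7) is under 25 MB/s.
awk '$7 ~ /M$/ { n++; if ($7 + 0 < 25) slow++ }
     END { printf "%d of %d samples under 25 MB/s\n", slow, n }' iostat.log
# → 2 of 3 samples under 25 MB/s
```

Run against the full capture, the ratio of slow samples gives a single number to compare before and after each change (reflash, pool fill level, suspect drive pulled).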
 