NVMe Controller Reset Under Load (As SLOG Only)

mts1472

Cadet
Joined
Jul 29, 2023
Messages
2
Hello,

Looking for some guidance on an unusual issue.

We've had a TrueNAS 13.0-U5.2 system running in production for a few months, and we decided that adding an M.2 SLOG would be a good move to improve our write performance.

We installed 2 x 500GB Crucial P5 Plus M.2 SSDs in an M.2 PCIe adapter (https://www.startech.com/en-us/hdd/pex8m2e2).
These were configured as a mirrored LOG vdev with 15 GiB of overprovisioning set.
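
For reference, the CLI equivalent of that layout would be roughly the following - the pool name and device names here are placeholders, not our actual ones:

Code:
# add both M.2 drives as a mirrored log (SLOG) vdev - names are illustrative
zpool add tank log mirror nvd0 nvd1
# confirm the new vdev shows up under "logs"
zpool status tank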

After a few days we noticed periodic 'stalls' in IO. After testing, we found that under moderate write load (two concurrent VM migrations from another NAS), the NFS share would become inaccessible to Proxmox, with the migrations appearing to continue at a crawling speed. While monitoring zpool iostat, all pool activity appears to cease. Log entries as below throughout the 'stall'.

Code:
Jul 28 16:33:18 truenas nvme1: Resetting controller due to a timeout.
Jul 28 16:33:18 truenas nvme1: RECOVERY_WAITING
Jul 28 16:33:18 truenas nvme1: resetting controller
Jul 28 16:33:18 truenas nvme1: aborting outstanding i/o
Jul 28 16:34:42 truenas nvme0: RECOVERY_START 61662237802418 vs 61661980108095
Jul 28 16:34:42 truenas nvme0: timeout with nothing complete, resetting
Jul 28 16:34:42 truenas nvme0: Resetting controller due to a timeout.
Jul 28 16:34:42 truenas nvme0: RECOVERY_WAITING
Jul 28 16:34:42 truenas nvme0: resetting controller
Jul 28 16:34:42 truenas nvme0: aborting outstanding i/o
Jul 28 16:35:13 truenas nvme0: RECOVERY_START 61795648062625 vs 61793861364921
Jul 28 16:35:13 truenas nvme0: timeout with nothing complete, resetting
Jul 28 16:35:13 truenas nvme0: Resetting controller due to a timeout.
Jul 28 16:35:13 truenas nvme0: RECOVERY_WAITING
Jul 28 16:35:13 truenas nvme0: resetting controller
Jul 28 16:35:13 truenas nvme0: aborting outstanding i/o
Jul 28 16:36:54 truenas nvme1: RECOVERY_START 62229113312952 vs 62227429693240
Jul 28 16:36:54 truenas nvme1: timeout with nothing complete, resetting
Jul 28 16:36:54 truenas nvme1: Resetting controller due to a timeout.
Jul 28 16:36:54 truenas nvme1: RECOVERY_WAITING
Jul 28 16:36:54 truenas nvme1: resetting controller
Jul 28 17:18:07 truenas nvme1: aborting outstanding i/o
Jul 28 17:18:07 truenas nvme1: RECOVERY_START 72853744986354 vs 72851855207762
Jul 28 17:18:07 truenas nvme1: timeout with nothing complete, resetting
Jul 28 17:18:07 truenas nvme1: Resetting controller due to a timeout.
Jul 28 17:18:07 truenas nvme1: RECOVERY_WAITING
Jul 28 17:18:07 truenas nvme1: resetting controller
Jul 28 17:18:07 truenas nvme1: aborting outstanding i/o
Jul 28 17:19:16 truenas nvme1: RECOVERY_START 73150535787716 vs 73148701852485
Jul 28 17:19:16 truenas nvme1: timeout with nothing complete, resetting
Jul 28 17:19:16 truenas nvme1: Resetting controller due to a timeout.
Jul 28 17:19:16 truenas nvme1: RECOVERY_WAITING
Jul 28 17:19:16 truenas nvme1: resetting controller
Jul 28 17:19:16 truenas nvme1: aborting outstanding i/o
Jul 28 17:19:47 truenas nvme1: RECOVERY_START 73282232361163 vs 73282159356784
Jul 28 17:19:47 truenas nvme1: timeout with nothing complete, resetting
Jul 28 17:19:47 truenas nvme1: Resetting controller due to a timeout.
Jul 28 17:19:47 truenas nvme1: RECOVERY_WAITING
Jul 28 17:19:47 truenas nvme1: resetting controller
Jul 28 17:20:18 truenas nvme1: aborting outstanding i/o
Jul 28 17:20:18 truenas nvme1: RECOVERY_START 73416046354348 vs 73413916058414
Jul 28 17:20:18 truenas nvme1: timeout with nothing complete, resetting
Jul 28 17:20:18 truenas nvme1: Resetting controller due to a timeout.
Jul 28 17:20:18 truenas nvme1: RECOVERY_WAITING
Jul 28 17:20:18 truenas nvme1: resetting controller
Jul 28 17:20:18 truenas nvme1: aborting outstanding i/o
Jul 28 17:21:31 truenas nvme1: RECOVERY_START 73730463714189 vs 73729282600882
Jul 28 17:21:31 truenas nvme1: timeout with nothing complete, resetting
Jul 28 17:21:31 truenas nvme1: Resetting controller due to a timeout.
Jul 28 17:21:31 truenas nvme1: RECOVERY_WAITING
Jul 28 17:21:31 truenas nvme1: resetting controller
Jul 28 17:22:29 truenas nvme1: aborting outstanding i/o
Jul 28 17:22:29 truenas nvme1: RECOVERY_START 73978347730675 vs 73976402117990
Jul 28 17:22:29 truenas nvme1: timeout with nothing complete, resetting
Jul 28 17:22:29 truenas nvme1: Resetting controller due to a timeout.
Jul 28 17:22:29 truenas nvme1: RECOVERY_WAITING
Jul 28 17:22:29 truenas nvme1: resetting controller
Jul 28 17:23:00 truenas nvme1: aborting outstanding i/o
Jul 28 17:23:00 truenas nvme1: RECOVERY_START 74111757999608 vs 74109975590062
Jul 28 17:23:00 truenas nvme1: timeout with nothing complete, resetting
Jul 28 17:23:00 truenas nvme1: Resetting controller due to a timeout.
Jul 28 17:23:00 truenas nvme1: RECOVERY_WAITING
Jul 28 17:23:00 truenas nvme1: resetting controller
Jul 28 17:23:55 truenas nvme1: aborting outstanding i/o
Jul 28 17:23:55 truenas nvme1: RECOVERY_START 74347886689994 vs 74347830863198
Jul 28 17:23:55 truenas nvme1: timeout with nothing complete, resetting
Jul 28 17:23:55 truenas nvme1: Resetting controller due to a timeout.
Jul 28 17:23:55 truenas nvme1: RECOVERY_WAITING
Jul 28 17:23:55 truenas nvme1: resetting controller
Jul 28 17:27:34 truenas nvme1: aborting outstanding i/o
Jul 28 17:27:34 truenas nvme1: RECOVERY_START 75288282605573 vs 75286667700742
Jul 28 17:27:34 truenas nvme1: timeout with nothing complete, resetting
Jul 28 17:27:34 truenas nvme1: Resetting controller due to a timeout.
Jul 28 17:27:34 truenas nvme1: RECOVERY_WAITING
Jul 28 17:27:34 truenas nvme1: resetting controller
Jul 28 17:28:05 truenas nvme1: aborting outstanding i/o
Jul 28 17:28:05 truenas nvme1[4139]: Last message 'aborting outstanding' repeated 1 times, suppressed by syslog-ng on truenas.local
Jul 28 17:28:05 truenas nvme1: RECOVERY_START 75421692862375 vs 75419897576202
Jul 28 17:28:05 truenas nvme1: timeout with nothing complete, resetting
Jul 28 17:28:05 truenas nvme1: Resetting controller due to a timeout.
Jul 28 17:28:05 truenas nvme1: RECOVERY_WAITING
Jul 28 17:28:05 truenas nvme1: resetting controller
Jul 28 17:28:36 truenas nvme1: aborting outstanding i/o
Jul 28 17:28:36 truenas nvme1[4139]: Last message 'aborting outstanding' repeated 1 times, suppressed by syslog-ng on truenas.local
Jul 28 17:28:36 truenas nvme1: RECOVERY_START 75555451022231 vs 75553445279732
Jul 28 17:28:36 truenas nvme1: timeout with nothing complete, resetting
Jul 28 17:28:36 truenas nvme1: Resetting controller due to a timeout.
Jul 28 17:28:36 truenas nvme1: RECOVERY_WAITING
Jul 28 17:28:36 truenas nvme1: resetting controller
Jul 28 17:28:36 truenas nvme1: aborting outstanding i/o
Jul 28 17:29:36 truenas nvme1[4139]: Last message 'aborting outstanding' repeated 1 times, suppressed by syslog-ng on truenas.local
Jul 28 17:29:51 truenas nvme1: RECOVERY_START 75877367388117 vs 75877182710912
Jul 28 17:29:51 truenas nvme1: timeout with nothing complete, resetting
Jul 28 17:29:51 truenas nvme1: Resetting controller due to a timeout.
Jul 28 17:29:51 truenas nvme1: RECOVERY_WAITING
Jul 28 17:29:51 truenas nvme1: resetting controller
Jul 28 17:29:51 truenas nvme1: aborting outstanding i/o
Jul 28 17:30:22 truenas nvme1: RECOVERY_START 76009454802683 vs 76008995250277
Jul 28 17:30:22 truenas nvme1: timeout with nothing complete, resetting
Jul 28 17:30:22 truenas nvme1: Resetting controller due to a timeout.
Jul 28 17:30:22 truenas nvme1: RECOVERY_WAITING
Jul 28 17:30:22 truenas nvme1: resetting controller
Jul 28 17:30:53 truenas nvme1: aborting outstanding i/o
Jul 28 17:30:53 truenas nvme1: RECOVERY_START 76142912315219 vs 76141138498917
Jul 28 17:30:53 truenas nvme1: timeout with nothing complete, resetting
Jul 28 17:30:53 truenas nvme1: Resetting controller due to a timeout.
Jul 28 17:30:53 truenas nvme1: RECOVERY_WAITING
Jul 28 17:30:53 truenas nvme1: resetting controller
Jul 28 17:31:24 truenas nvme1: aborting outstanding i/o
Jul 28 17:31:24 truenas nvme1: RECOVERY_START 76276322576058 vs 76274535879361
Jul 28 17:31:24 truenas nvme1: timeout with nothing complete, resetting
Jul 28 17:31:24 truenas nvme1: Resetting controller due to a timeout.
Jul 28 17:31:24 truenas nvme1: RECOVERY_WAITING
Jul 28 17:31:24 truenas nvme1: resetting controller
Jul 28 17:31:55 truenas nvme1: aborting outstanding i/o
Jul 28 17:31:55 truenas nvme1: RECOVERY_START 76409865985873 vs 76407950438907
Jul 28 17:31:55 truenas nvme1: timeout with nothing complete, resetting
Jul 28 17:31:55 truenas nvme1: Resetting controller due to a timeout.
Jul 28 17:31:55 truenas nvme1: RECOVERY_WAITING
Jul 28 17:31:55 truenas nvme1: resetting controller
Jul 28 17:31:55 truenas nvme1: aborting outstanding i/o
Jul 28 17:31:55 truenas nvme1: WRITE sqid:13 cid:118 nsid:1 lba:4025072 len:256
Jul 28 17:31:55 truenas nvme1: ABORTED - BY REQUEST (00/07) sqid:13 cid:118 cdw0:0
Jul 28 17:33:10 truenas nvme1: RECOVERY_START 76730858942848 vs 76729445936387
Jul 28 17:33:10 truenas nvme1: timeout with nothing complete, resetting
Jul 28 17:33:10 truenas nvme1: Resetting controller due to a timeout.
Jul 28 17:33:10 truenas nvme1: RECOVERY_WAITING
Jul 28 17:33:10 truenas nvme1: resetting controller
Jul 28 17:33:41 truenas nvme1: aborting outstanding i/o
Jul 28 17:33:41 truenas nvme1: RECOVERY_START 76864269199013 vs 76862491091818
Jul 28 17:33:41 truenas nvme1: timeout with nothing complete, resetting
Jul 28 17:33:41 truenas nvme1: Resetting controller due to a timeout.
Jul 28 17:33:41 truenas nvme1: RECOVERY_WAITING
Jul 28 17:33:41 truenas nvme1: resetting controller
Jul 28 17:34:12 truenas nvme1: aborting outstanding i/o
Jul 28 17:34:12 truenas nvme1: RECOVERY_START 76997752479287 vs 76995897063047
Jul 28 17:34:12 truenas nvme1: timeout with nothing complete, resetting
Jul 28 17:34:12 truenas nvme1: Resetting controller due to a timeout.
Jul 28 17:34:12 truenas nvme1: RECOVERY_WAITING
Jul 28 17:34:12 truenas nvme1: resetting controller
Jul 28 17:34:12 truenas nvme1: aborting outstanding i/o
Jul 28 17:35:45 truenas nvme1: RECOVERY_START 77395827207347 vs 77394951049849
Jul 28 17:35:45 truenas nvme1: timeout with nothing complete, resetting
Jul 28 17:35:45 truenas nvme1: Resetting controller due to a timeout.
Jul 28 17:35:45 truenas nvme1: RECOVERY_WAITING
Jul 28 17:35:45 truenas nvme1: resetting controller
Jul 28 17:36:31 truenas nvme1: aborting outstanding i/o
Jul 28 17:36:31 truenas nvme1: RECOVERY_START 77593348439420 vs 77591213848072
Jul 28 17:36:31 truenas nvme1: timeout with nothing complete, resetting
Jul 28 17:36:31 truenas nvme1: Resetting controller due to a timeout.
Jul 28 17:36:31 truenas nvme1: RECOVERY_WAITING
Jul 28 17:36:31 truenas nvme1: resetting controller
Jul 28 17:46:31 truenas nvme1: aborting outstanding i/o


Once we stop the migration, the resets continue for another few minutes, after which the pool resumes as if nothing happened.
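
For context, this is roughly what we were watching while the stall was happening (pool name is a placeholder):

Code:
# per-vdev throughput/ops every second - during a stall the log vdev and the data vdevs all go quiet
zpool iostat -v tank 1
# the latency columns help show where requests are piling up
zpool iostat -vl tank 1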

Suspecting faulty hardware, we removed the LOG vdev and created a new pool with the two SSDs in a mirror.
When placed under load, the SSDs perform flawlessly without errors or resets.

Code:
root@truenas[/mnt/NVMETest]# fio --name=test --size=100g --rw=write --ioengine=posixaio --direct=1 --iodepth 32 --bs=4k
test: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=32
fio-3.28
Starting 1 process
Jobs: 1 (f=1): [W(1)][100.0%][w=209MiB/s][w=53.4k IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=6247: Fri Jul 28 14:46:33 2023
  write: IOPS=74.1k, BW=290MiB/s (304MB/s)(100GiB/353706msec); 0 zone resets
    slat (nsec): min=696, max=41826k, avg=3078.58, stdev=8679.47
    clat (usec): min=19, max=205080, avg=401.36, stdev=622.44
     lat (usec): min=21, max=205085, avg=404.44, stdev=622.67
    clat percentiles (usec):
     |  1.00th=[   91],  5.00th=[  129], 10.00th=[  149], 20.00th=[  180],
     | 30.00th=[  204], 40.00th=[  223], 50.00th=[  243], 60.00th=[  265],
     | 70.00th=[  293], 80.00th=[  338], 90.00th=[  562], 95.00th=[ 1844],
     | 99.00th=[ 3163], 99.50th=[ 3589], 99.90th=[ 4424], 99.95th=[ 5080],
     | 99.99th=[ 6521]
   bw (  KiB/s): min=50075, max=505109, per=100.00%, avg=296630.89, stdev=165476.46, samples=704
   iops        : min=12518, max=126277, avg=74157.41, stdev=41369.13, samples=704
  lat (usec)   : 20=0.01%, 50=0.04%, 100=1.50%, 250=51.57%, 500=35.68%
  lat (usec)   : 750=3.03%, 1000=0.81%
  lat (msec)   : 2=3.51%, 4=3.65%, 10=0.21%, 20=0.01%, 50=0.01%
  lat (msec)   : 100=0.01%, 250=0.01%
  cpu          : usr=11.13%, sys=39.66%, ctx=1452026, majf=0, minf=1
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=7.4%, 16=83.7%, 32=8.8%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=94.1%, 8=0.8%, 16=2.8%, 32=2.3%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,26214400,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  WRITE: bw=290MiB/s (304MB/s), 290MiB/s-290MiB/s (304MB/s-304MB/s), io=100GiB (107GB), run=353706-353706msec


We then removed the overprovisioning, reset the controllers with nvmecontrol, and re-added the SSDs as a mirrored LOG vdev - the issue persisted.
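
Roughly what that looked like from the CLI - pool, vdev, and device names below are placeholders, with the real vdev name taken from 'zpool status':

Code:
# remove the existing log mirror, reset both NVMe controllers, then re-add the log vdev
zpool remove tank mirror-1
nvmecontrol reset nvme0
nvmecontrol reset nvme1
zpool add tank log mirror nvd0 nvd1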

SMART tests and nvmecontrol selftest both complete without error.
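
The checks were along these lines (controller names illustrative; see nvmecontrol(8) for the exact selftest flags):

Code:
# SMART / health information log page for each controller
nvmecontrol logpage -p 2 nvme0
nvmecontrol logpage -p 2 nvme1
# smartmontools view of the same data
smartctl -a /dev/nvme0
smartctl -a /dev/nvme1
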
No firmware updates are available for the controller or the system board.

Happy to provide any other info that may be useful - I'd be very grateful for any advice!

Many thanks!
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
Try with a single SSD installed: if things change, you need a different card.
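
If you'd rather not pull a drive physically, you could also drop the log vdev to a single device from the CLI and test that way (pool and device names are placeholders):

Code:
# detach one side of the log mirror, leaving a single-device SLOG
zpool detach tank nvd1
# re-attach later to rebuild the mirror
zpool attach tank nvd0 nvd1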

Did you pass through the controller?

How are the temperatures? Is there any airflow over the card's heatsink?
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
"We've had a TrueNAS 13.0-U5.2 running in production for a few months - we decided that the addition of a SLOG on M.2 would be a good move to improve our write performance."

What sort of writes are you doing - are they something that a SLOG would actually help with?
You appear to be using NFS - which probably means sync writes - or you may be using ESX via NFS - also sync writes - so OK there.
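
Easy enough to confirm on the dataset side, e.g. (dataset name is a placeholder):

Code:
# sync=standard honours the client's sync/COMMIT requests, sync=always forces every write through the ZIL
zfs get sync,logbias tank/vm-nfs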

The Crucial P5 Plus is a consumer-grade SSD - I can't find any indication that it's suitable as a SLOG, for the following reasons:
1. Endurance is so-so
2. No PLP (power-loss protection)
3. Being a consumer-grade SSD, its latency/response time probably falls off a cliff after a while

Basically, from the looks of it - wrong kit, wrong job
 

mts1472

Cadet
Joined
Jul 29, 2023
Messages
2
Thanks @NugentS & Davvo, really appreciate your input.

Try with a single SSD installed: if things change, you need a different card.
I considered this, but thought that the mirror was the only thing keeping us protected against a single SSD failure.
Did you pass through the controller?
Sorry, I'm not sure what you mean? TrueNAS is running bare metal, not inside a VM.
How are the temperatures? Is there any airflow over the card's heatsink?
Airflow could be improved, but temps hover between 60-70°C depending on load, and there does not appear to be any correlation between temperature changes and this issue occurring. Under heavy load the resets start almost instantly, and the temperature actually drops from its normal level rather than climbing.
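
For what it's worth, we're reading the temps with something along these lines (controller names illustrative):

Code:
# composite and per-sensor temperatures from the NVMe health log page
nvmecontrol logpage -p 2 nvme0 | grep -i temperature
nvmecontrol logpage -p 2 nvme1 | grep -i temperature
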
What sort of writes are you doing - are they something that a SLOG would actually help with?
All NFS writes from general-purpose VMs, nothing hardcore. The SLOG was added to make sure there's no bump in IO delay during the occasional VM migration and during nightly backups.
The Crucial P5 Plus is a consumer-grade SSD - I can't find any indication that it's suitable as a SLOG, for the following reasons:
1. Endurance is so-so
2. No PLP (power-loss protection)
I agree they're not the best suited, but they were what we had available at the time for a POC. If performance improved, the plan was to replace them with higher-endurance SSDs long term.
3. Being a consumer-grade SSD, its latency/response time probably falls off a cliff after a while
This was my concern and the reason for testing them as a mirrored data vdev. The fio test hit higher throughput than when used as SLOG. Have I misinterpreted the fio result?
lat (usec) : 20=0.01%, 50=0.04%, 100=1.50%, 250=51.57%, 500=35.68%
lat (usec) : 750=3.03%, 1000=0.81%
lat (msec) : 2=3.51%, 4=3.65%, 10=0.21%, 20=0.01%, 50=0.01%
lat (msec) : 100=0.01%, 250=0.01%

Many thanks!
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
I wouldn't even consider those drives for a POC. They are not suitable for the job, so as a POC they are useless.

Better - but not perfect: eBay 905P
Even better (but not new, and only 1): eBay P5800X or eBay P4800X
There are other options on eBay, lots of them. Do not try an H10 of whatever size, nor the M10, which has much lower endurance, is only PCIe x2, and is also unsuitable.

You get the message.
There is a long-running thread about SLOGs and various performance stats somewhere hereabouts, collected by @HoneyBadger

I use a single 900P (or maybe a 905P) on a pool as a SLOG. The only really bad failure scenario is the SLOG failing during an unexpected reboot - you'd potentially lose the sync writes that were in flight (but for a home system - just fine). If a SLOG fails during normal use, things slow down (a lot) but keep working

Optane is an almost ideal SLOG - it's a shame they are no longer made (and why I have a bunch of spares)

The Radian Memory Systems RMS300/16G is probably better - but like hen's teeth to get hold of. The RMS200/8G isn't, IMHO, big enough, and performance suffers in some cases. The RMS300/16G is good for 10Gb networking, but probably not 25Gb and definitely not 100Gb
 


HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
There is a long-running thread about SLOGs and various performance stats somewhere hereabouts, collected by @HoneyBadger

I assume you mean this one:


@mts1472 it doesn't necessarily look like overheating, since the drives perform well under a normal workload. However, SLOG writes could either be exposing a firmware bug or, because the synchronous writes (basically iodepth=1) force data straight onto the NAND, be generating enough extra work that a specific component (the controller) overheats before the temperature sensor actually reaches the throttle point.
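
If you want to hammer the drives on the test pool with something closer to what a SLOG actually sees (rather than the iodepth=32 async fio run above), a sketch along these lines would do it - parameters are just an example:

Code:
# queue-depth-1 writes with an fsync after every write, roughly approximating ZIL/SLOG traffic
fio --name=slog-sim --size=10g --rw=write --bs=128k --ioengine=posixaio --iodepth=1 --fsync=1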
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Yup - that's the one

@Davvo I don't have that many spares: two 900P PCIe cards, two 4800X M.2, and two U.2 900P (I think - they work so well that I kinda forget what's in the drawer of "where spares I have forgotten about go to hide")

OK - so I admit - I have an addiction - they say the first step to a cure is in admitting it!!

[But I don't want a cure]
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Re: possible overheating:
The M.2 SSD space is a big race between lots of teams doing minimal engineering of their own, buying controllers from the same two or three companies (notably Silicon Motion), plus a few more-or-less vertically-integrated players, from Samsung (fully vertically integrated) to Crucial/Micron (their own NAND, third-party controller).

PCIe SSDs started taking off circa 2014, with serious adoption closer to 2016ish. PCIe 3.0, at the time, had been available in real products for at least a year (Ivy Bridge in 2012), and the M.2 form factor had been designed for typical SSDs of the day. Cooling was a non-issue for 99% of scenarios. First-gen NVMe SSDs did not need to really push for full PCIe 3.0 x4 bandwidth either, since they just had to beat SATA 6Gb/s (roughly the same performance as PCIe 3.0 x1, but half duplex). The market had plenty of time to saturate PCIe 3.0 without creeping up against serious technical barriers.

After an eternity, PCIe 4.0 finally became relevant, and that changed things a bit. The desire to push twice as much data through an SSD brought a big problem in the form of power. The PCIe PHY needs more power, the NAND interface has to be faster and/or wider, which takes power, and the processor needs to be faster to shuffle the data around (and to keep track of data in the face of growing SSD sizes)... But M.2 was designed for more mundane applications and was not thought of as the performance option - that's what U.2 is all about.
Ordinarily, it would have taken two or three years to get to a second generation of PCIe 4.0 with meaningful power improvements... But the high end is rapidly upgrading to PCIe 5.0 now, which just makes the power situation worse.

The first major difficulty is getting rid of the heat, and that's why the market is flooded with (in)sane M.2 cooling solutions and everyone's scrambling to get their thermals under control. And because the last few years have been hectic in PCIe land, new SSD designs have been rushed out with less testing than they should have gotten.
 