NVMe Controller Reset Under Load (As SLOG Only)

mts1472

Cadet
Joined
Jul 29, 2023
Messages
2
Hello,

Looking for some guidance on an unusual issue.

We've had a TrueNAS 13.0-U5.2 system running in production for a few months, and we decided that adding an M.2 SLOG would be a good move to improve our write performance.

We installed 2 x 500GB Crucial P5 Plus M.2 SSDs in an M.2 PCIe adapter (https://www.startech.com/en-us/hdd/pex8m2e2).
These were configured as a mirrored LOG vdev with 15 GiB of overprovisioning set.
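
For reference, the CLI equivalent of that layout would be roughly the following - the pool name and device names here are placeholders, not our actual ones:

Code:
# add both M.2 drives as a mirrored log (SLOG) vdev - names are illustrative
zpool add tank log mirror nvd0 nvd1
# confirm the new vdev shows up under "logs"
zpool status tank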

After a few days we noticed periodic 'stalls' in IO. After testing, we found that under moderate write load (two concurrent VM migrations from another NAS), the NFS share would become inaccessible to Proxmox, with the migrations appearing to continue at a crawling speed. While monitoring zpool iostat, all pool activity appears to cease. Log entries as below throughout the 'stall'.

Code:
Jul 28 16:33:18 truenas nvme1: Resetting controller due to a timeout.
Jul 28 16:33:18 truenas nvme1: RECOVERY_WAITING
Jul 28 16:33:18 truenas nvme1: resetting controller
Jul 28 16:33:18 truenas nvme1: aborting outstanding i/o
Jul 28 16:34:42 truenas nvme0: RECOVERY_START 61662237802418 vs 61661980108095
Jul 28 16:34:42 truenas nvme0: timeout with nothing complete, resetting
Jul 28 16:34:42 truenas nvme0: Resetting controller due to a timeout.
Jul 28 16:34:42 truenas nvme0: RECOVERY_WAITING
Jul 28 16:34:42 truenas nvme0: resetting controller
Jul 28 16:34:42 truenas nvme0: aborting outstanding i/o
Jul 28 16:35:13 truenas nvme0: RECOVERY_START 61795648062625 vs 61793861364921
Jul 28 16:35:13 truenas nvme0: timeout with nothing complete, resetting
Jul 28 16:35:13 truenas nvme0: Resetting controller due to a timeout.
Jul 28 16:35:13 truenas nvme0: RECOVERY_WAITING
Jul 28 16:35:13 truenas nvme0: resetting controller
Jul 28 16:35:13 truenas nvme0: aborting outstanding i/o
Jul 28 16:36:54 truenas nvme1: RECOVERY_START 62229113312952 vs 62227429693240
Jul 28 16:36:54 truenas nvme1: timeout with nothing complete, resetting
Jul 28 16:36:54 truenas nvme1: Resetting controller due to a timeout.
Jul 28 16:36:54 truenas nvme1: RECOVERY_WAITING
Jul 28 16:36:54 truenas nvme1: resetting controller
Jul 28 17:18:07 truenas nvme1: aborting outstanding i/o
Jul 28 17:18:07 truenas nvme1: RECOVERY_START 72853744986354 vs 72851855207762
Jul 28 17:18:07 truenas nvme1: timeout with nothing complete, resetting
Jul 28 17:18:07 truenas nvme1: Resetting controller due to a timeout.
Jul 28 17:18:07 truenas nvme1: RECOVERY_WAITING
Jul 28 17:18:07 truenas nvme1: resetting controller
Jul 28 17:18:07 truenas nvme1: aborting outstanding i/o
Jul 28 17:19:16 truenas nvme1: RECOVERY_START 73150535787716 vs 73148701852485
Jul 28 17:19:16 truenas nvme1: timeout with nothing complete, resetting
Jul 28 17:19:16 truenas nvme1: Resetting controller due to a timeout.
Jul 28 17:19:16 truenas nvme1: RECOVERY_WAITING
Jul 28 17:19:16 truenas nvme1: resetting controller
Jul 28 17:19:16 truenas nvme1: aborting outstanding i/o
Jul 28 17:19:47 truenas nvme1: RECOVERY_START 73282232361163 vs 73282159356784
Jul 28 17:19:47 truenas nvme1: timeout with nothing complete, resetting
Jul 28 17:19:47 truenas nvme1: Resetting controller due to a timeout.
Jul 28 17:19:47 truenas nvme1: RECOVERY_WAITING
Jul 28 17:19:47 truenas nvme1: resetting controller
Jul 28 17:20:18 truenas nvme1: aborting outstanding i/o
Jul 28 17:20:18 truenas nvme1: RECOVERY_START 73416046354348 vs 73413916058414
Jul 28 17:20:18 truenas nvme1: timeout with nothing complete, resetting
Jul 28 17:20:18 truenas nvme1: Resetting controller due to a timeout.
Jul 28 17:20:18 truenas nvme1: RECOVERY_WAITING
Jul 28 17:20:18 truenas nvme1: resetting controller
Jul 28 17:20:18 truenas nvme1: aborting outstanding i/o
Jul 28 17:21:31 truenas nvme1: RECOVERY_START 73730463714189 vs 73729282600882
Jul 28 17:21:31 truenas nvme1: timeout with nothing complete, resetting
Jul 28 17:21:31 truenas nvme1: Resetting controller due to a timeout.
Jul 28 17:21:31 truenas nvme1: RECOVERY_WAITING
Jul 28 17:21:31 truenas nvme1: resetting controller
Jul 28 17:22:29 truenas nvme1: aborting outstanding i/o
Jul 28 17:22:29 truenas nvme1: RECOVERY_START 73978347730675 vs 73976402117990
Jul 28 17:22:29 truenas nvme1: timeout with nothing complete, resetting
Jul 28 17:22:29 truenas nvme1: Resetting controller due to a timeout.
Jul 28 17:22:29 truenas nvme1: RECOVERY_WAITING
Jul 28 17:22:29 truenas nvme1: resetting controller
Jul 28 17:23:00 truenas nvme1: aborting outstanding i/o
Jul 28 17:23:00 truenas nvme1: RECOVERY_START 74111757999608 vs 74109975590062
Jul 28 17:23:00 truenas nvme1: timeout with nothing complete, resetting
Jul 28 17:23:00 truenas nvme1: Resetting controller due to a timeout.
Jul 28 17:23:00 truenas nvme1: RECOVERY_WAITING
Jul 28 17:23:00 truenas nvme1: resetting controller
Jul 28 17:23:55 truenas nvme1: aborting outstanding i/o
Jul 28 17:23:55 truenas nvme1: RECOVERY_START 74347886689994 vs 74347830863198
Jul 28 17:23:55 truenas nvme1: timeout with nothing complete, resetting
Jul 28 17:23:55 truenas nvme1: Resetting controller due to a timeout.
Jul 28 17:23:55 truenas nvme1: RECOVERY_WAITING
Jul 28 17:23:55 truenas nvme1: resetting controller
Jul 28 17:27:34 truenas nvme1: aborting outstanding i/o
Jul 28 17:27:34 truenas nvme1: RECOVERY_START 75288282605573 vs 75286667700742
Jul 28 17:27:34 truenas nvme1: timeout with nothing complete, resetting
Jul 28 17:27:34 truenas nvme1: Resetting controller due to a timeout.
Jul 28 17:27:34 truenas nvme1: RECOVERY_WAITING
Jul 28 17:27:34 truenas nvme1: resetting controller
Jul 28 17:28:05 truenas nvme1: aborting outstanding i/o
Jul 28 17:28:05 truenas nvme1[4139]: Last message 'aborting outstanding' repeated 1 times, suppressed by syslog-ng on truenas.local
Jul 28 17:28:05 truenas nvme1: RECOVERY_START 75421692862375 vs 75419897576202
Jul 28 17:28:05 truenas nvme1: timeout with nothing complete, resetting
Jul 28 17:28:05 truenas nvme1: Resetting controller due to a timeout.
Jul 28 17:28:05 truenas nvme1: RECOVERY_WAITING
Jul 28 17:28:05 truenas nvme1: resetting controller
Jul 28 17:28:36 truenas nvme1: aborting outstanding i/o
Jul 28 17:28:36 truenas nvme1[4139]: Last message 'aborting outstanding' repeated 1 times, suppressed by syslog-ng on truenas.local
Jul 28 17:28:36 truenas nvme1: RECOVERY_START 75555451022231 vs 75553445279732
Jul 28 17:28:36 truenas nvme1: timeout with nothing complete, resetting
Jul 28 17:28:36 truenas nvme1: Resetting controller due to a timeout.
Jul 28 17:28:36 truenas nvme1: RECOVERY_WAITING
Jul 28 17:28:36 truenas nvme1: resetting controller
Jul 28 17:28:36 truenas nvme1: aborting outstanding i/o
Jul 28 17:29:36 truenas nvme1[4139]: Last message 'aborting outstanding' repeated 1 times, suppressed by syslog-ng on truenas.local
Jul 28 17:29:51 truenas nvme1: RECOVERY_START 75877367388117 vs 75877182710912
Jul 28 17:29:51 truenas nvme1: timeout with nothing complete, resetting
Jul 28 17:29:51 truenas nvme1: Resetting controller due to a timeout.
Jul 28 17:29:51 truenas nvme1: RECOVERY_WAITING
Jul 28 17:29:51 truenas nvme1: resetting controller
Jul 28 17:29:51 truenas nvme1: aborting outstanding i/o
Jul 28 17:30:22 truenas nvme1: RECOVERY_START 76009454802683 vs 76008995250277
Jul 28 17:30:22 truenas nvme1: timeout with nothing complete, resetting
Jul 28 17:30:22 truenas nvme1: Resetting controller due to a timeout.
Jul 28 17:30:22 truenas nvme1: RECOVERY_WAITING
Jul 28 17:30:22 truenas nvme1: resetting controller
Jul 28 17:30:53 truenas nvme1: aborting outstanding i/o
Jul 28 17:30:53 truenas nvme1: RECOVERY_START 76142912315219 vs 76141138498917
Jul 28 17:30:53 truenas nvme1: timeout with nothing complete, resetting
Jul 28 17:30:53 truenas nvme1: Resetting controller due to a timeout.
Jul 28 17:30:53 truenas nvme1: RECOVERY_WAITING
Jul 28 17:30:53 truenas nvme1: resetting controller
Jul 28 17:31:24 truenas nvme1: aborting outstanding i/o
Jul 28 17:31:24 truenas nvme1: RECOVERY_START 76276322576058 vs 76274535879361
Jul 28 17:31:24 truenas nvme1: timeout with nothing complete, resetting
Jul 28 17:31:24 truenas nvme1: Resetting controller due to a timeout.
Jul 28 17:31:24 truenas nvme1: RECOVERY_WAITING
Jul 28 17:31:24 truenas nvme1: resetting controller
Jul 28 17:31:55 truenas nvme1: aborting outstanding i/o
Jul 28 17:31:55 truenas nvme1: RECOVERY_START 76409865985873 vs 76407950438907
Jul 28 17:31:55 truenas nvme1: timeout with nothing complete, resetting
Jul 28 17:31:55 truenas nvme1: Resetting controller due to a timeout.
Jul 28 17:31:55 truenas nvme1: RECOVERY_WAITING
Jul 28 17:31:55 truenas nvme1: resetting controller
Jul 28 17:31:55 truenas nvme1: aborting outstanding i/o
Jul 28 17:31:55 truenas nvme1: WRITE sqid:13 cid:118 nsid:1 lba:4025072 len:256
Jul 28 17:31:55 truenas nvme1: ABORTED - BY REQUEST (00/07) sqid:13 cid:118 cdw0:0
Jul 28 17:33:10 truenas nvme1: RECOVERY_START 76730858942848 vs 76729445936387
Jul 28 17:33:10 truenas nvme1: timeout with nothing complete, resetting
Jul 28 17:33:10 truenas nvme1: Resetting controller due to a timeout.
Jul 28 17:33:10 truenas nvme1: RECOVERY_WAITING
Jul 28 17:33:10 truenas nvme1: resetting controller
Jul 28 17:33:41 truenas nvme1: aborting outstanding i/o
Jul 28 17:33:41 truenas nvme1: RECOVERY_START 76864269199013 vs 76862491091818
Jul 28 17:33:41 truenas nvme1: timeout with nothing complete, resetting
Jul 28 17:33:41 truenas nvme1: Resetting controller due to a timeout.
Jul 28 17:33:41 truenas nvme1: RECOVERY_WAITING
Jul 28 17:33:41 truenas nvme1: resetting controller
Jul 28 17:34:12 truenas nvme1: aborting outstanding i/o
Jul 28 17:34:12 truenas nvme1: RECOVERY_START 76997752479287 vs 76995897063047
Jul 28 17:34:12 truenas nvme1: timeout with nothing complete, resetting
Jul 28 17:34:12 truenas nvme1: Resetting controller due to a timeout.
Jul 28 17:34:12 truenas nvme1: RECOVERY_WAITING
Jul 28 17:34:12 truenas nvme1: resetting controller
Jul 28 17:34:12 truenas nvme1: aborting outstanding i/o
Jul 28 17:35:45 truenas nvme1: RECOVERY_START 77395827207347 vs 77394951049849
Jul 28 17:35:45 truenas nvme1: timeout with nothing complete, resetting
Jul 28 17:35:45 truenas nvme1: Resetting controller due to a timeout.
Jul 28 17:35:45 truenas nvme1: RECOVERY_WAITING
Jul 28 17:35:45 truenas nvme1: resetting controller
Jul 28 17:36:31 truenas nvme1: aborting outstanding i/o
Jul 28 17:36:31 truenas nvme1: RECOVERY_START 77593348439420 vs 77591213848072
Jul 28 17:36:31 truenas nvme1: timeout with nothing complete, resetting
Jul 28 17:36:31 truenas nvme1: Resetting controller due to a timeout.
Jul 28 17:36:31 truenas nvme1: RECOVERY_WAITING
Jul 28 17:36:31 truenas nvme1: resetting controller
Jul 28 17:46:31 truenas nvme1: aborting outstanding i/o


Once we stop the migration, the resets continue for another few minutes, after which the pool resumes as if nothing happened.
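
For context, this is roughly what we were watching while the stall was happening (pool name is a placeholder):

Code:
# per-vdev throughput/ops every second - during a stall the log vdev and the data vdevs all go quiet
zpool iostat -v tank 1
# the latency columns help show where requests are piling up
zpool iostat -vl tank 1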

Suspecting faulty hardware, we removed the LOG vdev and created a new pool with the two SSDs in a mirror.
When placed under load, the SSDs perform flawlessly without errors or resets.

Code:
root@truenas[/mnt/NVMETest]# fio --name=test --size=100g --rw=write --ioengine=posixaio --direct=1 --iodepth 32 --bs=4k
test: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=32
fio-3.28
Starting 1 process
Jobs: 1 (f=1): [W(1)][100.0%][w=209MiB/s][w=53.4k IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=6247: Fri Jul 28 14:46:33 2023
  write: IOPS=74.1k, BW=290MiB/s (304MB/s)(100GiB/353706msec); 0 zone resets
    slat (nsec): min=696, max=41826k, avg=3078.58, stdev=8679.47
    clat (usec): min=19, max=205080, avg=401.36, stdev=622.44
     lat (usec): min=21, max=205085, avg=404.44, stdev=622.67
    clat percentiles (usec):
     |  1.00th=[   91],  5.00th=[  129], 10.00th=[  149], 20.00th=[  180],
     | 30.00th=[  204], 40.00th=[  223], 50.00th=[  243], 60.00th=[  265],
     | 70.00th=[  293], 80.00th=[  338], 90.00th=[  562], 95.00th=[ 1844],
     | 99.00th=[ 3163], 99.50th=[ 3589], 99.90th=[ 4424], 99.95th=[ 5080],
     | 99.99th=[ 6521]
   bw (  KiB/s): min=50075, max=505109, per=100.00%, avg=296630.89, stdev=165476.46, samples=704
   iops        : min=12518, max=126277, avg=74157.41, stdev=41369.13, samples=704
  lat (usec)   : 20=0.01%, 50=0.04%, 100=1.50%, 250=51.57%, 500=35.68%
  lat (usec)   : 750=3.03%, 1000=0.81%
  lat (msec)   : 2=3.51%, 4=3.65%, 10=0.21%, 20=0.01%, 50=0.01%
  lat (msec)   : 100=0.01%, 250=0.01%
  cpu          : usr=11.13%, sys=39.66%, ctx=1452026, majf=0, minf=1
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=7.4%, 16=83.7%, 32=8.8%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=94.1%, 8=0.8%, 16=2.8%, 32=2.3%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,26214400,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  WRITE: bw=290MiB/s (304MB/s), 290MiB/s-290MiB/s (304MB/s-304MB/s), io=100GiB (107GB), run=353706-353706msec


We then removed the overprovisioning, reset the controllers with nvmecontrol, and re-added the SSDs as a mirrored LOG vdev - the issue persisted.
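
Roughly what that looked like from the CLI - pool, vdev, and device names below are placeholders, with the real vdev name taken from 'zpool status':

Code:
# remove the existing log mirror, reset both NVMe controllers, then re-add the log vdev
zpool remove tank mirror-1
nvmecontrol reset nvme0
nvmecontrol reset nvme1
zpool add tank log mirror nvd0 nvd1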

SMART tests and nvmecontrol selftest both complete without error.
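
The checks were along these lines (controller names illustrative; see nvmecontrol(8) for the exact selftest flags):

Code:
# SMART / health information log page for each controller
nvmecontrol logpage -p 2 nvme0
nvmecontrol logpage -p 2 nvme1
# smartmontools view of the same data
smartctl -a /dev/nvme0
smartctl -a /dev/nvme1
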
No firmware updates are available for the controller or the system board.

Happy to provide any other info that may be useful - I'd be very grateful for any advice!

Many thanks!
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
Try with a single SSD installed: if things change, you need a different card.
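
If you'd rather not pull a drive physically, you could also drop the log vdev to a single device from the CLI and test that way (pool and device names are placeholders):

Code:
# detach one side of the log mirror, leaving a single-device SLOG
zpool detach tank nvd1
# re-attach later to rebuild the mirror
zpool attach tank nvd0 nvd1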

Did you pass through the controller?

How are the temperatures? Is there any airflow over the card's heatsink?
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
"We've had a TrueNAS 13.0-U5.2 running in production for a few months - we decided that the addition of a SLOG on M.2 would be a good move to improve our write performance."

What sort of writes are you doing - are they something that a SLOG would actually help with?
You appear to be using NFS - which probably means sync writes - or you may be using ESX via NFS - also sync writes - so OK there.
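
Easy enough to confirm on the dataset side, e.g. (dataset name is a placeholder):

Code:
# sync=standard honours the client's sync/COMMIT requests, sync=always forces every write through the ZIL
zfs get sync,logbias tank/vm-nfs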

The Crucial P5 Plus is a consumer-grade SSD - I can't find any indication that it's suitable as a SLOG, for the following reasons:
1. Endurance is so-so
2. No PLP (power-loss protection)
3. Being a consumer-grade SSD, its latency/response time probably falls off a cliff after a while

Basically, from the looks of it - wrong kit, wrong job
 

mts1472

Cadet
Joined
Jul 29, 2023
Messages
2
Thanks @NugentS & Davvo, really appreciate your input.

Try with a single SSD installed: if things change, you need a different card.
I considered this, but thought that the mirror was the only thing keeping us protected against a single SSD failure.
Did you pass through the controller?
Sorry, I'm not sure what you mean? TrueNAS is running bare metal, not inside a VM.
How are the temperatures? Is there any airflow over the card's heatsink?
Airflow could be improved, but temps hover between 60-70°C depending on load, and there does not appear to be any correlation between temperature changes and this issue occurring. Under heavy load the resets start almost instantly, and the temperature actually drops from its normal level rather than climbing.
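
For what it's worth, we're reading the temps with something along these lines (controller names illustrative):

Code:
# composite and per-sensor temperatures from the NVMe health log page
nvmecontrol logpage -p 2 nvme0 | grep -i temperature
nvmecontrol logpage -p 2 nvme1 | grep -i temperature
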
What sort of writes are you doing - are they something that a SLOG would actually help with?
All NFS writes from general-purpose VMs, nothing hardcore. The SLOG was added to make sure there's no bump in IO delay during the occasional VM migration and during nightly backups.
The Crucial P5 Plus is a consumer-grade SSD - I can't find any indication that it's suitable as a SLOG, for the following reasons:
1. Endurance is so-so
2. No PLP (power-loss protection)
I agree they're not the best suited, but they were what we had available at the time for a POC. If performance improved, the plan was to replace them with higher-endurance SSDs long term.
3. Being a consumer-grade SSD, its latency/response time probably falls off a cliff after a while
This was my concern and the reason for testing them as a mirrored data vdev. The fio test hit higher throughput than when used as SLOG. Have I misinterpreted the fio result?
lat (usec) : 20=0.01%, 50=0.04%, 100=1.50%, 250=51.57%, 500=35.68%
lat (usec) : 750=3.03%, 1000=0.81%
lat (msec) : 2=3.51%, 4=3.65%, 10=0.21%, 20=0.01%, 50=0.01%
lat (msec) : 100=0.01%, 250=0.01%

Many thanks!
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
I wouldn't even consider those drives for a POC. They are not suitable for the job, so as a POC they are useless.

Better - but not perfect: eBay 905P
Even better (but not new, and only 1): eBay P5800X or eBay P4800X
There are other options on eBay, lots of them. Do not try an H10 of whatever size, nor the M10, which has much lower endurance, is only PCIe x2, and is also unsuitable.

You get the message.
There is a long-running thread about SLOGs and various performance stats somewhere hereabouts, collected by @HoneyBadger

I use a single 900P (or maybe a 905P) on a pool as a SLOG. The only really bad failure scenario is the SLOG failing during an unexpected reboot - you'd potentially lose the sync writes that were in flight (but for a home system - just fine). If a SLOG fails during normal use, things slow down (a lot) but keep working

Optane is an almost ideal SLOG - it's a shame they are no longer made (and why I have a bunch of spares)

The Radian Memory Systems RMS300/16G is probably better - but like hen's teeth to get hold of. The RMS200/8G isn't, IMHO, big enough, and performance suffers in some cases. The RMS300/16G is good for 10Gb networking, but probably not 25Gb and definitely not 100Gb
 


HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
There is a long-running thread about SLOGs and various performance stats somewhere hereabouts, collected by @HoneyBadger

I assume you mean this one:


@mts1472 it doesn't necessarily look like overheating, since the drives perform well under a normal workload. However, SLOG writes could either be exposing a firmware bug or, because the synchronous writes (basically iodepth=1) force data straight onto the NAND, be generating enough extra work that a specific component (the controller) overheats before the temperature sensor actually reaches the throttle point.
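
If you want to hammer the drives on the test pool with something closer to what a SLOG actually sees (rather than the iodepth=32 async fio run above), a sketch along these lines would do it - parameters are just an example:

Code:
# queue-depth-1 writes with an fsync after every write, roughly approximating ZIL/SLOG traffic
fio --name=slog-sim --size=10g --rw=write --bs=128k --ioengine=posixaio --iodepth=1 --fsync=1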
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Yup - that's the one

@Davvo I don't have that many spares: two 900P PCIe cards, two 4800X M.2, and two U.2 900P (I think - they work so well that I kinda forget what's in the drawer of "where spares I have forgotten about go to hide")

OK - so I admit - I have an addiction - they say the first step to a cure is in admitting it!!

[But I don't want a cure]
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Re: possible overheating:
The M.2 SSD space is a big race between lots of teams doing minimal engineering of their own, buying controllers from the same two or three companies (notably Silicon Motion), plus a few more-or-less vertically-integrated players, from Samsung (fully vertically integrated) to Crucial/Micron (their own NAND, third-party controller).

PCIe SSDs started taking off circa 2014, with serious adoption closer to 2016ish. PCIe 3.0, at the time, had been available in real products for at least a year (Ivy Bridge in 2012), and the M.2 form factor had been designed for typical SSDs of the day. Cooling was a non-issue for 99% of scenarios. First-gen NVMe SSDs did not need to really push for full PCIe 3.0 x4 bandwidth either, since they just had to beat SATA 6Gb/s (roughly the same performance as PCIe 3.0 x1, but half duplex). The market had plenty of time to saturate PCIe 3.0 without creeping up against serious technical barriers.

After an eternity, PCIe 4.0 finally became relevant, and that changed things a bit. The desire to push twice as much data through an SSD brought a big problem in the form of power. The PCIe PHY needs more power, the NAND interface has to be faster and/or wider, which takes power, and the processor needs to be faster to shuffle the data around (and to keep track of data in the face of growing SSD sizes)... But M.2 was designed for more mundane applications and was not thought of as the performance option - that's what U.2 is all about.
Ordinarily, it would have taken two or three years to get to a second generation of PCIe 4.0 with meaningful power improvements... But the high end is rapidly upgrading to PCIe 5.0 now, which just makes the power situation worse.

The first major difficulty is getting rid of the heat, and that's why the market is flooded with (in)sane M.2 cooling solutions and everyone's scrambling to get their thermals under control. And because the last few years have been hectic in PCIe land, new SSD designs have been rushed out with less testing than they should have gotten.
 