NVMe in boot-pool disappears after some time

BKman

Cadet
Joined
Aug 8, 2023
Messages
4
Hello!

Just built a new TrueNAS host (my first PC build in a while, and my first TrueNAS build):

Fractal Design Node 804 Black Window Aluminum/Steel Micro ATX Cube Computer Case (x1)
WD Red Plus 12TB NAS Hard Disk Drive - 7200 RPM Class, SATA 6Gb/s, CMR, 256MB Cache, 3.5 Inch - WD120EFBX (x8) - Spinning_Rust pool
Corsair Dual SSD Mounting Bracket (3.5" Internal Drive Bay to 2.5", Easy Installation), Black (x1)
CORSAIR RMe Series RM750e 80 PLUS Gold Fully Modular Low-Noise ATX 3.0 and PCIe 5.0 Power Supply, Black (x1)
Corsair CP-8920186 Premium Individually Sleeved SATA Cable for Corsair PSUs, Black, 29.5 inches
Samsung 870 EVO 2TB 2.5 Inch SATA III Internal SSD (MZ-77E2T0B/AM) (x2) - SSD pool
Transcend 128GB NVMe PCIe Gen3 x4 MTE110S M.2 SSD (TS128GMTE110S) (x1) - boot-pool
Noctua NF-P12 redux-1700 PWM, High Performance Cooling Fan, 4-Pin, 1700 RPM, 120mm, Grey (x3)
Intel Optane SSD P1600X (SSDPEK1A058GA01) M.2 2280 58GB PCIe 3.0 x4, NVMe 3D XPoint Enterprise SSD (x1) - Spinning_Rust pool's LOG
ASRock Rack D1541D4U-2O8R Server Motherboard, Intel Xeon D-1541, SFP+, DDR4 ECC DIMM (x1)
Samsung M393A4K40BB0-CPB 32GB DDR4-2133 ECC Memory (MEM-DR432L-SL01-ER21) (x4)

I'm running the latest non-beta version of SCALE (TrueNAS-SCALE-22.12.3.3 at the moment).
Since the build, I've noticed a couple of crashes - it looks like the NVMe drive behind the boot-pool disappears. After a reboot, the system works fine until the next time it happens. Here are the syslog messages (God bless Graylog!):
Code:
2023-08-28T16:34:14.000-07:00 truenas zed[2303824]: eid=220 class=io_failure pool='boot-pool'
2023-08-28T16:34:14.000-07:00 truenas zed[2303822]: eid=219 class=io_failure pool='boot-pool'
2023-08-28T16:34:14.000-07:00 truenas kernel: WARNING: Pool 'boot-pool' has encountered an uncorrectable I/O failure and has been suspended.
2023-08-28T16:34:14.000-07:00 truenas zed[2303820]: eid=218 class=data pool='boot-pool' priority=0 err=6 flags=0x8881 bookmark=158:59988:0:0
2023-08-28T16:34:14.000-07:00 truenas zed[2303817]: eid=217 class=data pool='boot-pool' priority=0 err=6 flags=0x808881 bookmark=158:54802:0:0
2023-08-28T16:34:14.000-07:00 truenas kernel: WARNING: Pool 'boot-pool' has encountered an uncorrectable I/O failure and has been suspended.
2023-08-28T16:34:14.000-07:00 truenas kernel: zio pool=boot-pool vdev=/dev/nvme0n1p3 error=5 type=1 offset=11820916736 size=12288 flags=180880
2023-08-28T16:34:14.000-07:00 truenas kernel: nvme0n1: detected capacity change from 250069680 to 0
2023-08-28T16:34:14.000-07:00 truenas kernel: nvme nvme0: Removing after probe failure status: -19
2023-08-28T16:34:14.000-07:00 truenas kernel: nvme 0000:08:00.0: can't change power state from D3cold to D0 (config space inaccessible)
2023-08-28T16:34:14.000-07:00 truenas kernel: nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0xffff
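
In case it helps with the diagnosis, this is what I plan to capture the next time it drops (assuming nvme-cli and smartctl are available on SCALE and the Transcend shows up as /dev/nvme0) - the controller's advertised power states, whether it supports APST, and the drive's own error history:
Code:
# Advertised power states and whether APST is supported (apsta field)
nvme id-ctrl /dev/nvme0 -H | grep -iE 'apsta|^ps '

# Drive-side health and error history
nvme smart-log /dev/nvme0
nvme error-log /dev/nvme0
smartctl -a /dev/nvme0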

I've turned on the RAID mode for the Marvell 9172 controller and will watch how the system behaves in the coming days.
I also found a similar thread for Ubuntu kernels; I'm not sure whether it's applicable: https://www.mail-archive.com/kernel-packages@lists.launchpad.net/msg475194.html
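
If that thread is about the usual NVMe power-save (APST) problem, the workaround in those reports boils down to keeping the drive out of its deeper power states via a kernel parameter. I haven't tried this on SCALE, and I don't know whether manual GRUB edits survive TrueNAS updates, so take it as the generic Debian/Ubuntu recipe rather than anything TrueNAS-specific:
Code:
# Current setting; 0 disables APST entirely
cat /sys/module/nvme_core/parameters/default_ps_max_latency_us

# Generic approach from those threads: add
#   nvme_core.default_ps_max_latency_us=0
# to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, then:
#   update-grub && reboot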

Appreciate any feedback or advice.
 

sfatula

Guru
Joined
Jul 5, 2022
Messages
608

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
You might have to disable power save functions on your NVMe. I don't have the details, but I would guess that the NVMe going into power save is something ZFS is not expecting.
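
Roughly (assuming nvme-cli is available and the boot device is /dev/nvme0), something like this should show whether APST is enabled and which power state the controller is sitting in; if it is enabled, the nvme_core.default_ps_max_latency_us=0 parameter mentioned above is the usual way to keep it out of the deep states:
Code:
# Autonomous Power State Transition settings (feature 0x0c)
nvme get-feature /dev/nvme0 -f 0x0c -H

# Current power state (Power Management feature, 0x02)
nvme get-feature /dev/nvme0 -f 0x02 -H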
 