Pool 'boot-pool' has encountered an uncorrectable I/O failure and has been suspended.

Hello,

I've got a TRUENAS-MINI-3.0-XL+ (about a year old) loaded with eight 18TB Western Digital Red Pros and two Samsung SSDs. The box boots off the NVMe drive it came with (WDC WDS250G2B0C-00PXH0). There are three pools: boot-pool, ssd, and hdd (RAIDZ1). It also has 64GB of ECC memory, all purchased through iXsystems.

The Mini was running the latest version of TrueNAS Core until last week, when I reloaded it fresh with Bluefin 22.12 via a USB drive. There were no known issues at the time of the reload. I had the installer format the boot drive in order to start fresh. The existing pools for ssd and hdd were recreated as well. After the install was complete I reconfigured the box and off I went. I created an rsync task to copy 100TB+ from a Synology, since I'm using this box to make a third copy of the data. About three days in, with about 43TB of data copied, the web UI became non-responsive and the console was displaying an error. Log attached. I had no choice but to power cycle the box, as the regular shutdown from the console wasn't working. The box rebooted and came back up without any additional errors or issues. I've since resumed the rsync.

I've added a zpool status and smartctl output of the NVMe below.

What I am trying to understand is whether this issue is indicative of a pending hardware failure or whether it could be something related to TrueNAS SCALE, like a driver. I realize this version is considered Early Adopter, so I wanted to check here. I looked at a few other posts with a similar title, but their issues didn't quite match mine.

Thanks.

zpool status:
Code:
  pool: boot-pool
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
    The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
    the pool may no longer be accessible by software that does not support
    the features. See zpool-features(7) for details.
config:

    NAME         STATE     READ WRITE CKSUM
    boot-pool    ONLINE       0     0     0
      nvme0n1p3  ONLINE       0     0     0

errors: No known data errors


Nothing stands out from SMART:
Code:
=== START OF INFORMATION SECTION ===
Model Number:                       WDC WDS250G2B0C-00PXH0
Serial Number:                      *****
Firmware Version:                   233010WD
PCI Vendor/Subsystem ID:            0x15b7
IEEE OUI Identifier:                0x001b44
Total NVM Capacity:                 250,059,350,016 [250 GB]
Unallocated NVM Capacity:           0
Controller ID:                      1
NVMe Version:                       1.4
Number of Namespaces:               1
Namespace 1 Size/Capacity:          250,059,350,016 [250 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            001b44 8b41835906
Local Time is:                      Sun Dec 18 16:12:23 2022 EST
Firmware Updates (0x14):            2 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x1e):         Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg Pers_Ev_Lg
Maximum Data Transfer Size:         128 Pages
Warning  Comp. Temp. Threshold:     80 Celsius
Critical Comp. Temp. Threshold:     85 Celsius
Namespace 1 Features (0x02):        NA_Fields

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     3.50W    2.00W       -    0  0  0  0        0       0
 1 +     2.40W    1.80W       -    0  0  0  0        0       0
 2 +     1.90W    1.50W       -    0  0  0  0        0       0
 3 -   0.0250W       -        -    3  3  3  3     3900   11000
 4 -   0.0050W       -        -    4  4  4  4     5000   44000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         2
 1 -    4096       0         1

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        37 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    2,403,130 [1.23 TB]
Data Units Written:                 2,079,647 [1.06 TB]
Host Read Commands:                 5,467,106
Host Write Commands:                3,531,411
Controller Busy Time:               171
Power Cycles:                       45
Power On Hours:                     7,049
Unsafe Shutdowns:                   9
Media and Data Integrity Errors:    0
Error Information Log Entries:      1
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0

Error Information (NVMe Log 0x01, 16 of 256 entries)
No Errors Logged
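
For anyone who wants to check the same counters on their own drive, here is a minimal Python sketch that pulls a few fields out of pasted smartctl text and converts the data-unit counters to terabytes. The sample text is copied from the output above; the function names (`parse_health`, `data_units_to_tb`) and the choice of which counters to watch are my own assumptions, not anything smartctl or TrueNAS provides. The conversion assumes the NVMe convention that one data unit is 1000 blocks of 512 bytes.

```python
import re

# Sample lines copied from the SMART/Health section above.
SMART_TEXT = """\
Critical Warning:                   0x00
Available Spare:                    100%
Percentage Used:                    0%
Data Units Read:                    2,403,130 [1.23 TB]
Data Units Written:                 2,079,647 [1.06 TB]
Unsafe Shutdowns:                   9
Media and Data Integrity Errors:    0
Error Information Log Entries:      1
"""

# Hypothetical watch list: counters where a nonzero value may deserve a look.
WATCHED = (
    "Critical Warning",
    "Media and Data Integrity Errors",
    "Unsafe Shutdowns",
    "Error Information Log Entries",
)


def parse_health(text):
    """Map each 'Name: value' line to the first number found in its value."""
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a 'Name: value' line
        match = re.search(r"0x[0-9a-fA-F]+|\d[\d,]*", value)
        if match is None:
            continue
        token = match.group().replace(",", "")
        base = 16 if token.lower().startswith("0x") else 10
        fields[name.strip()] = int(token, base)
    return fields


def data_units_to_tb(units):
    """Convert NVMe data units (1000 x 512 bytes each) to decimal terabytes."""
    return units * 512_000 / 1e12


fields = parse_health(SMART_TEXT)
nonzero = {name: fields[name] for name in WATCHED if fields.get(name, 0) != 0}

print("Read TB:   ", round(data_units_to_tb(fields["Data Units Read"]), 2))
print("Written TB:", round(data_units_to_tb(fields["Data Units Written"]), 2))
print("Nonzero watched counters:", nonzero)
```

On the output above this flags only Unsafe Shutdowns and Error Information Log Entries. A nonzero value there is not necessarily a fault on its own; it is just a number worth comparing before and after the next incident.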
 

Attachments

  • error.txt