AvidDiskHoarder
Hello,
I've got a TRUENAS-MINI-3.0-XL+ (about a year old) loaded with eight 18TB Western Digital Red Pros and two Samsung SSDs. The box boots off the NVMe drive it came with (WDC WDS250G2B0C-00PXH0). There are three pools: boot-pool, ssd, and hdd (RAIDZ1). It also has 64GB of ECC memory, all purchased through iXsystems.
The Mini was running the latest version of TrueNAS Core until last week, when I reloaded it fresh with Bluefin 22.12 via a USB drive. There were no known issues at the time of the reload. I had the installer format the boot drive in order to start fresh. The existing pools for ssd and hdd were recreated as well. After the install was complete I reconfigured the box and off I went. I created an rsync task to copy 100TB+ from a Synology, since I'm using this box to make a third copy of the data. About three days in, with about 43TB of data copied, the web UI became non-responsive and the console was displaying an error. Log attached. I had no choice but to power cycle the box, as the regular shutdown from the console wasn't working. The box rebooted and came back up without any additional errors or issues. I've since resumed the rsync.
I've added a zpool status and smartctl output of the NVMe below.
What I am trying to understand is whether this issue is indicative of a pending hardware failure or whether it could be something related to TrueNAS SCALE, like a driver. I realize this version is considered Early Adopter, so I wanted to check here. I looked at a few other posts with similar titles, but their issues didn't feel quite like mine.
Thanks.
zpool status:
Code:
  pool: boot-pool
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
config:
        NAME         STATE     READ WRITE CKSUM
        boot-pool    ONLINE       0     0     0
          nvme0n1p3  ONLINE       0     0     0
errors: No known data errors
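For anyone who wants to check the same thing on the other pools, here is a rough Python sketch that scans the config table of a zpool status report for devices that are not ONLINE or that show non-zero READ/WRITE/CKSUM counters. The function name and the embedded sample are just for illustration; only the boot-pool rows shown above are included, and I'm assuming the table keeps this five-column layout.

```python
# Config table copied from the boot-pool output above (sample data only).
ZPOOL_CONFIG = """\
NAME         STATE     READ WRITE CKSUM
boot-pool    ONLINE       0     0     0
  nvme0n1p3  ONLINE       0     0     0
"""


def devices_with_errors(config_text):
    """Return names of rows that are not ONLINE or have non-zero error counters.

    Hypothetical helper: assumes the first non-blank line is the
    NAME/STATE/READ/WRITE/CKSUM header and every device row has five columns.
    """
    rows = [line.split() for line in config_text.splitlines() if line.strip()]
    problems = []
    for row in rows[1:]:  # skip the header row
        if len(row) < 5:
            continue  # not a device row in the expected layout
        name, state, read, write, cksum = row[:5]
        # Counters are compared as strings so abbreviated values still count as non-zero.
        if state != "ONLINE" or (read, write, cksum) != ("0", "0", "0"):
            problems.append(name)
    return problems


print(devices_with_errors(ZPOOL_CONFIG) or "no device-level errors in this table")
```

An empty result only says that ZFS has not counted any device errors in the table it was given; it doesn't rule out a driver or hardware problem elsewhere.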
Nothing stands out from SMART:
Code:
=== START OF INFORMATION SECTION ===
Model Number:                       WDC WDS250G2B0C-00PXH0
Serial Number:                      *****
Firmware Version:                   233010WD
PCI Vendor/Subsystem ID:            0x15b7
IEEE OUI Identifier:                0x001b44
Total NVM Capacity:                 250,059,350,016 [250 GB]
Unallocated NVM Capacity:           0
Controller ID:                      1
NVMe Version:                       1.4
Number of Namespaces:               1
Namespace 1 Size/Capacity:          250,059,350,016 [250 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            001b44 8b41835906
Local Time is:                      Sun Dec 18 16:12:23 2022 EST
Firmware Updates (0x14):            2 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x1e):         Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg Pers_Ev_Lg
Maximum Data Transfer Size:         128 Pages
Warning  Comp. Temp. Threshold:     80 Celsius
Critical Comp. Temp. Threshold:     85 Celsius
Namespace 1 Features (0x02):        NA_Fields
Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     3.50W    2.00W       -    0  0  0  0        0       0
 1 +     2.40W    1.80W       -    0  0  0  0        0       0
 2 +     1.90W    1.50W       -    0  0  0  0        0       0
 3 -   0.0250W       -        -    3  3  3  3     3900   11000
 4 -   0.0050W       -        -    4  4  4  4     5000   44000
Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         2
 1 -    4096       0         1
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        37 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    2,403,130 [1.23 TB]
Data Units Written:                 2,079,647 [1.06 TB]
Host Read Commands:                 5,467,106
Host Write Commands:                3,531,411
Controller Busy Time:               171
Power Cycles:                       45
Power On Hours:                     7,049
Unsafe Shutdowns:                   9
Media and Data Integrity Errors:    0
Error Information Log Entries:      1
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Error Information (NVMe Log 0x01, 16 of 256 entries)
No Errors Logged
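To make "nothing stands out" a bit more concrete, here is a minimal Python sketch of the checks I did by eye on the health section. The helper names are made up for this post, the excerpt is copied from the output above, and the three checks (critical warning bits, spare versus threshold, media errors) are my own assumptions about what counts as a red flag, not vendor guidance.

```python
# Excerpt of the SMART/Health section from the smartctl output above.
SMART_TEXT = """\
Critical Warning:                   0x00
Temperature:                        37 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Unsafe Shutdowns:                   9
Media and Data Integrity Errors:    0
Error Information Log Entries:      1
"""


def parse_fields(text):
    """Turn 'Key: value' lines into a dict of stripped strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def leading_int(value):
    """Parse the first number in a value such as '100%', '0x00' or '2,403,130 [1.23 TB]'."""
    token = value.split()[0].rstrip("%").replace(",", "")
    base = 16 if token.lower().startswith("0x") else 10
    return int(token, base)


def health_flags(fields):
    """Return descriptions of the (assumed) red flags found in the parsed fields."""
    flags = []
    if leading_int(fields["Critical Warning"]) != 0:
        flags.append("critical warning bits set")
    if leading_int(fields["Available Spare"]) <= leading_int(fields["Available Spare Threshold"]):
        flags.append("available spare at or below threshold")
    if leading_int(fields["Media and Data Integrity Errors"]) > 0:
        flags.append("media/data integrity errors reported")
    return flags


print(health_flags(parse_fields(SMART_TEXT)) or "no flags raised by these checks")
```

These checks deliberately ignore Unsafe Shutdowns and Error Information Log Entries, since I don't know what values are normal for this drive.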