Long-time FreeNAS/TrueNAS user, but first-time poster. Apologies in advance if I've missed anything obvious.
I have a bit of a conundrum that neither googling nor poking around the system has shed any light on so far. The system in question is a fairly basic, if mildly overpowered, bare-metal home system: SuperMicro X10DRi, dual Xeon E5-2683 v4, 128GiB RAM and dual 25Gb NICs, running the latest Bluefin release (TrueNAS-SCALE-22.12.4.2). It has two pools. The first is a 4-disk RAID10 (two mirror vdevs) of 3.84TB U.2 NVMe drives on a PCIe 3.0 x16 to 4x U.2 card, used for VM disks and similar high-IOPS workloads. The second is for bulk storage and backups: currently a RAID10 of four 14TB SAS HDDs, which will be expanded with a third mirror vdev later today once the resilver that is currently running completes. It has a mirrored pair of small Optanes as a metadata (special) vdev, which will also become a RAID10 of Optanes later today. The HDDs are attached to the on-board SAS controller (LSI SAS3008 in IT mode).
The conundrum is about disk temperature reporting. For the HDD pool, none of the spinning disks show up in the temperature reporting data, so at least the situation is consistent in that sense. The metadata Optanes (NVMe) do report temperature, though. Since the HDDs are SAS, that may not be entirely surprising, but every one of them reports its drive temperature correctly via smartctl, so it's not that the data doesn't exist.
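As a quick sanity check that the temperature really is machine-readable for both drive types, the snippet below pulls it out of plain smartctl text output. It is a minimal sketch: the two regular expressions are assumptions based only on the SAS line (`Current Drive Temperature:     32 C`) and the NVMe line (`Temperature:                        36 Celsius`) in the smartctl output pasted below, and `parse_temperature` is a hypothetical helper name. It says nothing about how the TrueNAS reporting service itself collects temperatures.

```python
import re
from typing import Optional

# Assumed line formats, copied from the smartctl 7.2 output in this post:
#   SAS:  "Current Drive Temperature:     32 C"
#   NVMe: "Temperature:                        36 Celsius"
SAS_TEMP = re.compile(r"^Current Drive Temperature:\s+(\d+)\s+C\b", re.MULTILINE)
NVME_TEMP = re.compile(r"^Temperature:\s+(\d+)\s+Celsius\b", re.MULTILINE)


def parse_temperature(smartctl_text: str) -> Optional[int]:
    """Return the current drive temperature in Celsius, or None if no known line matches."""
    for pattern in (SAS_TEMP, NVME_TEMP):
        match = pattern.search(smartctl_text)
        if match:
            return int(match.group(1))
    return None


if __name__ == "__main__":
    sas = "Current Drive Temperature:     32 C\nDrive Trip Temperature:        85 C\n"
    nvme = "Critical Warning:                   0x00\nTemperature:                        36 Celsius\n"
    print(parse_temperature(sas), parse_temperature(nvme))
```

Lines such as `Drive Trip Temperature` or `Warning  Comp. Temp. Threshold` are deliberately not matched, because both patterns are anchored to the start of the line.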
The U.2 pool is slightly odder. Again, all four disks report drive temperature when checked via smartctl, but only one of the four also shows up in the UI; the other three do not. Specifically, nvme6n1 shows up, but nvme4n1, nvme5n1 and nvme7n1 do not, and I can't think of any reason why one would and the others wouldn't.
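To compare the four U.2 devices side by side, one option is smartctl's JSON mode (`-j`, available since smartmontools 7.0), reading the `temperature.current` field for each device. This is a sketch under assumptions: the device list is taken from the post, the `temperature` / `current` key layout is what I would expect smartctl 7.2 to emit for both NVMe and SCSI devices but should be checked against your own output, and `read_temperatures` / `temperature_from_json` are hypothetical helper names. If all four devices print a value, the gap is presumably on the reporting side rather than in the drives.

```python
import json
import shutil
import subprocess
from typing import Dict, Iterable, Optional

# Device names taken from the post (the U.2 pool); adjust for your own system.
DEVICES = ["/dev/nvme4n1", "/dev/nvme5n1", "/dev/nvme6n1", "/dev/nvme7n1"]


def temperature_from_json(raw: str) -> Optional[int]:
    """Return temperature.current from smartctl JSON text, or None if it is absent.

    The "temperature" -> "current" key layout is an assumption about smartctl 7.x output.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    block = data.get("temperature")
    if not isinstance(block, dict):
        return None
    current = block.get("current")
    return current if isinstance(current, int) else None


def read_temperatures(devices: Iterable[str]) -> Dict[str, Optional[int]]:
    """Run `smartctl -j -a` on each device and collect its current temperature."""
    results: Dict[str, Optional[int]] = {}
    for dev in devices:
        # smartctl's exit status is a bit mask, so a non-zero code is not treated as fatal.
        proc = subprocess.run(["smartctl", "-j", "-a", dev], capture_output=True, text=True)
        results[dev] = temperature_from_json(proc.stdout)
    return results


if __name__ == "__main__":
    if shutil.which("smartctl") is None:
        print("smartctl not found on PATH")
    else:
        for dev, temp in read_temperatures(DEVICES).items():
            print(f"{dev}: {temp if temp is not None else 'no temperature reported'}")
```

It needs to run as root on the NAS, since smartctl needs raw device access.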
Is this behavior expected, and if so, why? If not, what can be done to get the SAS disks' temperature data to show up, and to get the three AWOL NVMe temperature datasets to show up as well?
  pool: hdd
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
    continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Tue Oct 17 10:02:31 2023
    11.8T scanned at 619M/s, 9.80T issued at 514M/s, 12.7T total
    3.89T resilvered, 77.10% done, 01:38:59 to go
config:
    NAME                                        STATE     READ WRITE CKSUM
    hdd                                         ONLINE       0     0     0
      mirror-0                                  ONLINE       0     0     0
        48e25fc3-bdc1-4e33-b801-4a5519ef8c2f    ONLINE       0     0     0
        c374fe8e-efe3-4dc6-818b-ce520ef7805c    ONLINE       0     0     0
      mirror-1                                  ONLINE       0     0     0
        replacing-0                             ONLINE       0     0     0
          7ed29237-aae1-48f4-9770-9859bd61b39d  ONLINE       0     0     0
          c1b889ea-2a6e-4d8d-b15e-1386878fbc36  ONLINE       0     0     0  (resilvering)
        2dc86382-f20a-4e91-b08d-8020e38a209a    ONLINE       0     0     0
    special   
      mirror-2                                  ONLINE       0     0     0
        1665d28f-9513-4d39-882a-a29d03c19056    ONLINE       0     0     0
        40a98559-fc53-4caf-b4e8-a9d72eacec90    ONLINE       0     0     0
errors: No known data errors
  pool: nvme
 state: ONLINE
  scan: scrub repaired 0B in 00:35:18 with 0 errors on Sun Sep 24 00:35:19 2023
config:
    NAME                                      STATE     READ WRITE CKSUM
    nvme                                      ONLINE       0     0     0
      mirror-0                                ONLINE       0     0     0
        2b93aa42-2680-403d-a8d2-3256ebf2e619  ONLINE       0     0     0
        ecdbf9c1-ba12-4659-82f2-75b6424d655d  ONLINE       0     0     0
      mirror-1                                ONLINE       0     0     0
        de2be2df-bca2-4999-b6a1-e97092d4e931  ONLINE       0     0     0
        50be280b-1ca7-4ceb-bd5f-bc3f57baa828  ONLINE       0     0     0
errors: No known data errors
root@truenas[~]#
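The pool members above are listed by UUID, while the question is framed in kernel names like nvme6n1. One way to map between the two is to resolve the udev-managed symlinks under `/dev/disk/by-partuuid/`. The sketch below assumes those symlinks exist (as they normally do on Linux with udev) and that the UUIDs shown by `zpool status` are partition UUIDs; `partuuid_to_disk` and `disk_from_partition` are hypothetical helper names. `zpool status -L`, which resolves symbolic links to real device paths, is a simpler alternative.

```python
import os
import re
from typing import Optional

# Assumed kernel partition naming: nvme4n1p1 -> nvme4n1, sda2 -> sda.
PARTITION = re.compile(r"^(nvme\d+n\d+)p\d+$|^(sd[a-z]+)\d+$")


def disk_from_partition(part: str) -> str:
    """Strip the partition suffix from a kernel name; unknown names are returned unchanged."""
    match = PARTITION.match(part)
    if match:
        return match.group(1) or match.group(2)
    return part


def partuuid_to_disk(partuuid: str, base: str = "/dev/disk/by-partuuid") -> Optional[str]:
    """Resolve a partition UUID symlink to its parent disk name, e.g. 'nvme6n1'."""
    link = os.path.join(base, partuuid)
    if not os.path.lexists(link):
        return None
    return disk_from_partition(os.path.basename(os.path.realpath(link)))


if __name__ == "__main__":
    # UUIDs copied from the nvme pool in the zpool status output.
    for uuid in ("2b93aa42-2680-403d-a8d2-3256ebf2e619",
                 "ecdbf9c1-ba12-4659-82f2-75b6424d655d",
                 "de2be2df-bca2-4999-b6a1-e97092d4e931",
                 "50be280b-1ca7-4ceb-bd5f-bc3f57baa828"):
        print(uuid, "->", partuuid_to_disk(uuid))
```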
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.131+truenas] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Vendor:               WDC
Product:              WLEB14T0S5xeF7.2
Revision:             3P00
Compliance:           SPC-4
User Capacity:        14,000,519,643,136 bytes [14.0 TB]
Logical block size:   4096 bytes
LU is fully provisioned
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000cca2647dcaac
Serial number:        9RJ75LYC
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Tue Oct 17 17:06:18 2023 PDT
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled
=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK
Grown defects during certification = 0
Total blocks reassigned during format = 0
Total new blocks reassigned = 0
Power on minutes since format = 80493
Current Drive Temperature:     32 C
Drive Trip Temperature:        85 C
Accumulated power on time, hours:minutes 1368:53
Manufactured in week 45 of year 2019
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  25
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  79
Elements in grown defect list: 0
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0       5489      13456.050           0
write:         0        0         0         0       5192       7443.623           0
verify:        0        0         0         0       4792         32.818           0
Non-medium error count:        0
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.131+truenas] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number:                       INTEL SSDPEK1A058GA
Serial Number:                      BTOC12850HQG058A
Firmware Version:                   U5110550
PCI Vendor/Subsystem ID:            0x8086
IEEE OUI Identifier:                0x5cd2e4
Controller ID:                      0
NVMe Version:                       1.1
Number of Namespaces:               1
Namespace 1 Size/Capacity:          58,977,157,120 [58.9 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            5cd2e4 2fff840100
Local Time is:                      Tue Oct 17 17:08:42 2023 PDT
Firmware Updates (0x02):            1 Slot
Optional Admin Commands (0x0016):   Format Frmw_DL Self_Test
Optional NVM Commands (0x0056):     Wr_Unc DS_Mngmt Sav/Sel_Feat Timestmp
Log Page Attributes (0x03):         S/H_per_NS Cmd_Eff_Lg
Maximum Data Transfer Size:         32 Pages
Warning  Comp. Temp. Threshold:     70 Celsius
Critical Comp. Temp. Threshold:     78 Celsius
Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     4.70W       -        -    0  0  0  0     1000    4000
 1 +     3.90W       -        -    0  1  0  1     1000    4000
 2 +     2.80W       -        -    0  2  0  2     1000    4000
Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         2
 1 -    4096       0         0
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        36 Celsius
Available Spare:                    100%
Available Spare Threshold:          0%
Percentage Used:                    0%
Data Units Read:                    158,503 [81.1 GB]
Data Units Written:                 5,896,357 [3.01 TB]
Host Read Commands:                 5,342,764
Host Write Commands:                102,024,781
Controller Busy Time:               52
Power Cycles:                       27
Power On Hours:                     5,918
Unsafe Shutdowns:                   2
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Error Information (NVMe Log 0x01, 16 of 64 entries)
No Errors Logged
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.131+truenas] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number:                       HUSPR3238ADP301
Serial Number:                      CJH0010094C0
Firmware Version:                   KMGNP131
PCI Vendor/Subsystem ID:            0x1c58
IEEE OUI Identifier:                0x000cca
Controller ID:                      3
NVMe Version:                       <1.2
Number of Namespaces:               1
Namespace 1 Size/Capacity:          3,820,752,101,376 [3.82 TB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            000cca 00615b2f01
Local Time is:                      Tue Oct 17 17:10:50 2023 PDT
Firmware Updates (0x09):            4 Slots, Slot 1 R/O
Optional Admin Commands (0x0006):   Format Frmw_DL
Optional NVM Commands (0x001f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat
Log Page Attributes (0x01):         S/H_per_NS
Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +    25.00W       -        -    0  0  0  0    15000   15000
 1 +    20.00W       -        -    1  1  1  1    15000   15000
 2 +    15.00W       -        -    2  2  2  2    15000   15000
 3 +    10.00W       -        -    3  3  3  3    15000   15000
Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0
 1 -     512       8         2
 2 -    4096       0         0
 3 -    4096       8         1
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        45 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    87,344,117,101 [44.7 PB]
Data Units Written:                 351,739,898 [180 TB]
Host Read Commands:                 82,146,566,593
Host Write Commands:                1,517,163,751
Controller Busy Time:               2,263,307
Power Cycles:                       95
Power On Hours:                     51,552
Unsafe Shutdowns:                   70
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Error Information (NVMe Log 0x01, 16 of 63 entries)
No Errors Logged