NVMe Critical Temperature Alert

rmccullough

Patron
Joined
May 17, 2018
Messages
269
I have 2 x 500GB WD Blue SN550 NVMe drives attached via a Supermicro AOC-SLG3-2M2 PCIe add-on card. They are configured as a mirrored pool for my jails and bhyve VMs.

Yesterday I started a transcode on Plex and, a couple of minutes into it, received a critical alert that the drive temperatures had reached 44 degrees Celsius. I have the S.M.A.R.T. service configured to alert above 40 degrees.

Is it typical for NVMe drives to run warmer than spinners? Or is this because the PCIe card may be too close to the processors, which were generating more heat during the transcode? Should I adjust the temperature alert threshold for the NVMe drives? What is a reasonable value?
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,949
Yes, NVMe drives typically run hotter than spinners. 44 degrees is no issue for a solid-state drive.
 

rmccullough

What is a reasonable upper threshold to set the drives to? 45? 50?
 

NugentS

What does the spec sheet for the NVMe drives say? The drives should thermal throttle if they need to.
Personally, I would consider adding heatsinks to the drives and making sure they get some airflow.
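
One rough way to see whether a drive has ever run hot enough to matter is to look at the over-temperature counters that `smartctl -a` reports for NVMe devices ("Warning Comp. Temperature Time" and "Critical Comp. Temperature Time", in minutes). The sketch below is only an illustration: the function name `nvme_overtemp` is made up here, and it assumes the field labels match what smartctl 7.x prints. Non-zero values suggest the drive has spent time at or above its own thresholds; zero does not prove it never throttled.

```shell
# Hypothetical helper (not part of smartmontools): reads `smartctl -a`
# output on stdin and prints the minutes spent at or above the drive's
# warning and critical composite temperature thresholds.
nvme_overtemp() {
    awk -F: '
        # Strip thousands separators (e.g. "1,234") before converting.
        function num(s) { gsub(/,/, "", s); return s + 0 }
        /^Warning +Comp\. Temperature Time:/  { warn = num($2) }
        /^Critical Comp\. Temperature Time:/  { crit = num($2) }
        END { printf "warning_minutes=%d critical_minutes=%d\n", warn, crit }
    '
}
```

Typical use would be `smartctl -a /dev/nvme0 | nvme_overtemp`, run as root. A field that is missing from the output is reported as 0.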
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
I have seen figures like 55 or even 85 degrees Celsius quoted as operating maximums for that type of drive.

smartctl will give you the data from the drive itself:
Code:
root@nas:~ # smartctl -a /dev/nvme0
smartctl 7.2 2020-12-30 r5155 [FreeBSD 12.2-RELEASE-p9 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       Samsung SSD 970 EVO Plus 1TB
Serial Number:                      SXXXXXXXXXXXXXXXX
Firmware Version:                   1B2QEXM7
PCI Vendor/Subsystem ID:            0x144d
IEEE OUI Identifier:                0x002538
Total NVM Capacity:                 1,000,204,886,016 [1.00 TB]
Unallocated NVM Capacity:           0
Controller ID:                      4
NVMe Version:                       1.3
Number of Namespaces:               1
Namespace 1 Size/Capacity:          1,000,204,886,016 [1.00 TB]
Namespace 1 Utilization:            577,056,116,736 [577 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            002538 529150394d
Local Time is:                      Wed Sep 29 11:04:00 2021 CEST
Firmware Updates (0x16):            3 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x03):         S/H_per_NS Cmd_Eff_Lg
Maximum Data Transfer Size:         512 Pages
Warning  Comp. Temp. Threshold:     85 Celsius
Critical Comp. Temp. Threshold:     85 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     7.80W       -        -    0  0  0  0        0       0
 1 +     6.00W       -        -    1  1  1  1        0       0
 2 +     3.40W       -        -    2  2  2  2        0       0
 3 -   0.0700W       -        -    3  3  3  3      210    1200
 4 -   0.0100W       -        -    4  4  4  4     2000    8000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        42 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    3%
Data Units Read:                    38,543,159 [19.7 TB]
Data Units Written:                 138,283,271 [70.8 TB]
Host Read Commands:                 152,395,477
Host Write Commands:                751,990,424
Controller Busy Time:               952
Power Cycles:                       10
Power On Hours:                     21,364
Unsafe Shutdowns:                   3
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               42 Celsius
Temperature Sensor 2:               48 Celsius

Error Information (NVMe Log 0x01, 16 of 64 entries)
No Errors Logged
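
To pull just the relevant numbers out of that output, something like the following could work. It is a minimal sketch, not an official tool: `nvme_temps` is a made-up name, and the patterns assume the field labels shown in the output above ("Temperature:", "Warning  Comp. Temp. Threshold:", "Critical Comp. Temp. Threshold:"), which may differ between smartctl versions.

```shell
# Hypothetical helper (not part of smartmontools): reads `smartctl -a`
# output on stdin and prints the current composite temperature next to
# the warning and critical thresholds reported by the drive itself.
nvme_temps() {
    awk -F: '
        # "Temperature:" only; the per-sensor lines do not match.
        /^Temperature:/                       { cur  = $2 + 0 }
        /^Warning +Comp\. Temp\. Threshold:/  { warn = $2 + 0 }
        /^Critical Comp\. Temp\. Threshold:/  { crit = $2 + 0 }
        END { printf "current=%d warning=%d critical=%d\n", cur, warn, crit }
    '
}
```

Typical use would be `smartctl -a /dev/nvme0 | nvme_temps`, run as root. Comparing the drive's own warning threshold with the alert level configured in the S.M.A.R.T. service gives a sense of how much headroom there is. A field missing from the output is printed as 0.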
 