I am getting a horrendous temperature value for an SSD, but all SMART tests are clear

Joined
Feb 24, 2024
Messages
3
I was getting alarms at login about the following SSD: CT240BX500SSD1
All SMART tests of the SSD complete without error (extended offline 6x and short offline once).
The system reported a temperature of around or above 200 degrees Celsius. To tell the truth, that would mean the system would already be smoking and I could make scrambled eggs on it.
I dismissed the alarm several times, then it occurred to me to activate "energy saving". Now the alarm is gone. However, obviously either there is a sensor problem under SCALE or the hardware is reporting wrong values?
Does anybody know whether a bug has already been reported, or has anyone else encountered this? If I should report it (which would require restoring the "no energy saving" condition), what logs / data should I deliver?
I actually only had the temperature warnings in the bell when it occurred, which AFAIK does not allow for a meaningful bug report.
I looked at the reports for that disk, but no temperature values are given. Maybe the temperature is not reported at all and that causes the strange reading?
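One way to see what the drive itself reports is to pull attribute 194 out of the `smartctl -A` table and check it against a plausible range. A minimal Python sketch, assuming the usual ten-column attribute table; the sample line and the 0 to 100 degree window are made up for illustration and are not taken from this drive:

```python
import re

# Assumed plausibility window in degrees Celsius; adjust to taste.
PLAUSIBLE_C = (0, 100)

def temperature_from_attribute_line(line):
    """Return the leading integer of the RAW_VALUE column of an
    attribute-194 line from `smartctl -A`, or None if it is not one."""
    # Columns: ID, NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE
    fields = line.split(None, 9)
    if len(fields) < 10 or fields[0] != "194":
        return None
    match = re.match(r"\d+", fields[9])
    return int(match.group()) if match else None

def is_plausible(temp_c, window=PLAUSIBLE_C):
    """True if the reading falls inside the assumed window."""
    return window[0] <= temp_c <= window[1]

# Hypothetical sample line, not output from the BX500 in this thread.
sample = ("194 Temperature_Celsius     0x0022   064   050   000    "
          "Old_age   Always       -       36 (Min/Max 21/50)")
temp = temperature_from_attribute_line(sample)
print(temp, is_plausible(temp))
```

A reading of 200 would fail the `is_plausible` check, which at least separates "the drive reports nonsense" from "the alert system misreads a sane value".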

Thank you.
 

rvassar

Guru
Joined
May 2, 2018
Messages
972
When it comes to sensors, I always start with a physical check. If it's throwing an alarm that borders on nonsensical, power off and physically touch the device or point an IR thermometer at it. SSDs can get quite hot. I did a hotplug test on an enterprise U.2 NVMe device under heavy I/O a few years back and received a first-degree burn for my effort. Those drives usually have thick aluminium cases and good thermal bonding. The BX500, of course, is not in this category. It's a retail consumer device with a lot of engineering corners cut to keep costs down. If you get a temp alarm and can't verify a hot case, I'd consider that perhaps the internal bonding has separated and the heat is not getting conducted away from the chips.
 
Joined
Feb 24, 2024
Messages
3
So you deem it possible that a disk may have a clean SMART run, with temperature reporting fine, but fail on its sensors?
Here we go again: although this disk is used for the "boot pool", which should do no work at all, I get:

Device: /dev/sdf [SAT], Failed SMART usage Attribute: 194 Temperature_Celsius..

2024-03-13 14:23:41 (Europe/Berlin)
Ah, OK; now SMART reports it too, so case closed: the disk is already failing. Nice. (It is still within the warranty period, so it is still possible to open an RMA.)
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,995
The system reported a temperature of around or above 200 degrees Celsius.
Please read the forum rules and post your system specs; for this specific problem they could make a difference, especially the version of TrueNAS you are running. Also post the entire output of smartctl -x /dev/xxx, where xxx is the drive identifier. Then a good recommendation can be given.
 
Joined
Feb 24, 2024
Messages
3
OS Version: TrueNAS-SCALE-23.10.2
Model: AMD FX(tm)-4300 Quad-Core Processor
Memory: 31 GiB
I am in the shell. As I understand it, this should be Linux. But I am not able to run smartctl -x /dev/sdf as admin (it says the command is not known).
I also tried to at least copy the output of uname -a here (dmidecode also gives command not found), but I cannot manage to copy and paste anything in the shell. I tried Ctrl + Ins and Ctrl + Del as well as Ctrl + C and Ctrl + P... no luck.
What do you use if not dmidecode?
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,995
Use 'sudo' or log in as root.
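If copy and paste in the web shell keeps failing, capturing the report to a file is one way around it. A rough Python sketch, assuming `sudo` and `smartctl` are both available on the box and that the drive is `/dev/sdf` as above; the output file name is only a placeholder. Note that smartctl's exit status is a bitmask, so a non-zero value does not necessarily mean the command failed to run.

```python
import subprocess

def smartctl_command(device, use_sudo=True):
    """Build the argument list for a full `smartctl -x` report."""
    cmd = ["smartctl", "-x", device]
    return ["sudo"] + cmd if use_sudo else cmd

def save_report(device, path, use_sudo=True):
    """Run smartctl and write whatever it prints to `path`."""
    # No check=True here: smartctl also sets exit-status bits for
    # drive problems, not only for command-line errors.
    result = subprocess.run(
        smartctl_command(device, use_sudo),
        capture_output=True,
        text=True,
    )
    with open(path, "w") as fh:
        fh.write(result.stdout)
        fh.write(result.stderr)
    return result.returncode

# Usage (placeholder device and file name, not run here):
#   status = save_report("/dev/sdf", "smartctl_sdf.txt")
```

The resulting text file can then be attached to a post or an RMA request instead of pasting from the terminal.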
 

rvassar

Guru
Joined
May 2, 2018
Messages
972
But I am not able to run smartctl -x /dev/sdf as admin (it says the command is not known).

It's "smartctl -a /dev/sdf", not "-x".

You may also try running "smartctl -t long /dev/sdf", waiting the amount of time it reports, and then re-running with "-a" to collect the results. Take screenshots and/or copy the info for your warranty return / RMA.
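The start, wait, collect sequence can also be scripted. A hedged Python sketch: it assumes smartctl prints an estimate of the form "Please wait N minutes for test to complete." when the test starts (check this against your smartmontools version), and that sudo is set up. The 120-minute fallback is an arbitrary assumption.

```python
import re
import subprocess
import time

def minutes_to_wait(start_output, default=120):
    """Extract the estimated duration from the `smartctl -t long` output.
    Falls back to `default` minutes (an arbitrary guess) if none is found."""
    match = re.search(r"Please wait (\d+) minutes", start_output)
    return int(match.group(1)) if match else default

def run_long_test(device):
    """Start a long self-test, sleep for the reported time, return the -a report."""
    started = subprocess.run(
        ["sudo", "smartctl", "-t", "long", device],
        capture_output=True, text=True,
    )
    time.sleep(minutes_to_wait(started.stdout) * 60)
    report = subprocess.run(
        ["sudo", "smartctl", "-a", device],
        capture_output=True, text=True,
    )
    return report.stdout

# Usage (not run here): print(run_long_test("/dev/sdf"))
```

The estimate is only the drive's own guess, so if the self-test log still shows the test in progress, wait a bit longer and collect the report again.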
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,995
It's "smartctl -a /dev/sdf", not "-x".
Nope, I asked for "-x" for extended data. "-a" may not provide what I need, so I just ask for the full amount of data up front rather than asking for it later.
 

rvassar

Guru
Joined
May 2, 2018
Messages
972
Nope, I asked for "-x" for extended data. "-a" may not provide what I need, so I just ask for the full amount of data up front rather than asking for it later.

OK, I learned something new. It works on my Intel SSDs, but it doesn't work on my cheap Phison-based boot drives. In fact, I'd say they're outright lying, as the current, highest, and lowest temps are all the same.
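A quick way to spot that kind of flat-lined reporting is to compare the current reading with the lifetime min/max. A Python sketch, assuming the SCT status section of `smartctl -x` contains "Current Temperature:" and "Lifetime Min/Max Temperature:" lines; the sample text below is invented for illustration and is not from any real drive:

```python
import re

def sct_temperatures(report):
    """Return (current, lifetime_min, lifetime_max), or None if not reported."""
    cur = re.search(r"Current Temperature:\s+(\d+) Celsius", report)
    life = re.search(
        r"Lifetime\s+Min/Max Temperature:\s+(\d+)/(\d+) Celsius", report
    )
    if not (cur and life):
        return None
    return int(cur.group(1)), int(life.group(1)), int(life.group(2))

def looks_stuck(temps):
    """True when current, lowest and highest are all the same value."""
    return len(set(temps)) == 1

# Hypothetical sample text in the assumed layout.
sample = """\
Current Temperature:                    33 Celsius
Power Cycle Min/Max Temperature:     33/33 Celsius
Lifetime    Min/Max Temperature:     33/33 Celsius
"""
temps = sct_temperatures(sample)
print(temps, looks_stuck(temps))
```

Identical values on a drive with any real run time suggest the firmware is returning a fixed number rather than a measurement.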
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,995
but it doesn't work on my cheap Phison-based boot drives
I hear ya. I'm finding that a lot of (older) NVMe drives are missing a lot of data, but at least there is now a well-defined specification to fix that. I still need to keep in mind that older drives are still in use. And it's not just SSDs that have issues; HDDs are not all the same either.
 