Please assess: is a SMART error on a new SSD critical?

saveZFS

Explorer
Joined
Jan 6, 2022
Messages
87
Hello,
I am getting the following error message from one of my SSDs (which is not even a week old):
"Device: /dev/nvme0, number of Error Log entries increased from 34 to 36."

The following error appears in the SMART values:
[Screenshot: error ssd.JPG]


Is this error critical, so that I should have the SSD replaced, or is it unimportant and can I leave the SSD in the system?
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Check the full SMART output from it first, but it's not looking great to have 36 bad things to say in the first week of its life.
 

saveZFS

sretalla said:
Check the full SMART output from it first, but it's not looking great to have 36 bad things to say in the first week of its life.
Thank you for your first assessment.
Is this the smart output you mean?

SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 35 Celsius
Available Spare: 100%
Available Spare Threshold: 5%
Percentage Used: 0%
Data Units Read: 495,158 [253 GB]
Data Units Written: 2,521,738 [1.29 TB]
Host Read Commands: 2,461,910
Host Write Commands: 21,383,802
Controller Busy Time: 182
Power Cycles: 3
Power On Hours: 120
Unsafe Shutdowns: 1
Media and Data Integrity Errors: 0
Error Information Log Entries: 36
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 49 Celsius
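A minimal Python sketch of how the SMART/Health fields above could be triaged. It assumes the "Key: value" text layout smartctl prints for NVMe Log 0x02; the helper names and the triage rule are hypothetical, not smartmontools behaviour. The rule only flags fields the NVMe SMART/Health log defines as health indicators (Critical Warning bits, Media and Data Integrity Errors, Available Spare versus its threshold) and reports Error Information Log Entries separately, because that field only counts how many entries the controller has written to its error log over its life.

```python
# Minimal sketch: triage NVMe SMART/Health fields printed by smartctl.
# Parsing assumes smartctl's "Key: value" layout; the triage rule is a
# hypothetical heuristic, not something smartmontools itself implements.

# Excerpt of the SMART output posted above (NVMe Log 0x02).
SAMPLE = """\
Critical Warning: 0x00
Available Spare: 100%
Available Spare Threshold: 5%
Percentage Used: 0%
Media and Data Integrity Errors: 0
Error Information Log Entries: 36
"""


def parse_health(text):
    """Split 'Key: value' lines into a dict of raw value strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def first_int(value):
    """Read the leading number of values like '100%', '0x00' or '2,521,738 [1.29 TB]'."""
    token = value.split()[0].rstrip("%").replace(",", "")
    return int(token, 16) if token.lower().startswith("0x") else int(token)


def triage(fields):
    """Return a list of health concerns (hypothetical rule, see lead-in)."""
    concerns = []
    if first_int(fields["Critical Warning"]) != 0:
        concerns.append("Critical Warning bits are set")
    if first_int(fields["Media and Data Integrity Errors"]) > 0:
        concerns.append("unrecovered media/data integrity errors")
    if first_int(fields["Available Spare"]) <= first_int(fields["Available Spare Threshold"]):
        concerns.append("available spare at or below threshold")
    return concerns


fields = parse_health(SAMPLE)
print("concerns:", triage(fields) or "none")
print("error log entries (informational):", first_int(fields["Error Information Log Entries"]))
```

For the values posted above this prints "concerns: none" followed by the 36 informational log entries.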
 

sretalla

I can confirm that I have devices with 50TB or more written that have no errors logged, so I would not consider it normal to have any errors. That makes 36 sound like a reason to send it back.

I was hoping to be able to see the 36 errors via smartctl, but it seems not...

If we're on CORE here, you'll need to run nvmecontrol logpage -p 1 -x nvme0 (assuming it's nvme0 we're talking about).

I'm less sure what to do with the resulting hex table dump; all I can see on mine is zeros, so I can't work out the format without some actual errors to go on.
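As a starting point for the hex dump, here is a minimal Python sketch that decodes the Error Information log page (log page 0x01) once it is back in raw bytes. It assumes the 64-byte little-endian entry layout from the NVMe base specification (revision 1.4 layout), and that the dump has already been reduced to plain hex byte pairs without offset columns; the exact layout of nvmecontrol's -x output is not handled here. Function and field names are illustrative, chosen to match the field names in the error log listing posted further down the thread.

```python
import struct
from typing import NamedTuple

ENTRY_SIZE = 64  # each Error Information log entry occupies 64 bytes
# Bytes 0-41 of an entry, little-endian (NVMe base spec 1.4 layout); bytes 42-63 are reserved.
# Q error_count, H sqid, H cmdid, H status field, H parameter error location,
# Q lba, I nsid, B vendor-specific info, B transport type, 2 reserved bytes,
# Q command-specific info, H transport-type-specific info
ENTRY_FORMAT = "<QHHHHQIBB2xQH"


class ErrorEntry(NamedTuple):
    error_count: int
    sqid: int
    cmdid: int
    status: int  # status field with the phase tag bit (bit 0) shifted out
    phase_tag: int
    parm_err_loc: int
    lba: int
    nsid: int
    vs: int
    trtype: int
    cs: int
    trtype_spec_info: int


def hex_dump_to_bytes(dump):
    """Assumes the dump is only hex byte pairs; spaces and newlines are ignored."""
    return bytes.fromhex(dump)


def decode_error_log(buf):
    """Decode every valid 64-byte entry; an error_count of 0 marks an unused slot."""
    entries = []
    for offset in range(0, len(buf) - ENTRY_SIZE + 1, ENTRY_SIZE):
        (count, sqid, cmdid, raw_status, pel, lba, nsid,
         vs, trtype, cs, tsi) = struct.unpack_from(ENTRY_FORMAT, buf, offset)
        if count == 0:
            continue  # invalid/empty entry, e.g. the all-zero slots
        entries.append(ErrorEntry(count, sqid, cmdid, raw_status >> 1,
                                  raw_status & 1, pel, lba, nsid, vs,
                                  trtype, cs, tsi))
    return entries


# Hypothetical buffer: one empty slot followed by one made-up entry.
fake = bytes(ENTRY_SIZE) + struct.pack(
    ENTRY_FORMAT, 5, 0, 0x1234, (0x02 << 1) | 1, 0, 0, 1, 0, 0, 0, 0
).ljust(ENTRY_SIZE, b"\0")
for entry in decode_error_log(fake):
    print(entry.error_count, hex(entry.cmdid), entry.status)
```

An all-zero dump, like the one described above, decodes to an empty list, because every slot has an error count of 0.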
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Please post the entire SMART output, in code tags. It would also help if you read the forum rules about posting your hardware setup, so we know what we are dealing with.
 

saveZFS

NugentS said:
Please post the entire result from the smart output - in code tags.
What do you mean exactly, if I may ask?
So that I can do better in the future!
I will also list my system in the signature in the future. Thanks for the hint! :)

I have already decided to send both SSDs back to the dealer and order different ones. I wasn't satisfied with the sync-write performance anyway.
 
Joined
Oct 22, 2019
Messages
3,641
"Device: /dev/nvme0, number of Error Log entries increased from 34 to 36."
Likely to increase with every reboot or suspend. It has something to do with the NVMe's firmware and/or the OS not sending appropriate signals to the device. It's harmless.

It's perfectly fine, unless there's reason to suspect otherwise.

SMART doesn't play well with NVMe drives, nor does it understand them well.

One of my NVMe drives has 64 "errors".

The error log is spammed with these:
Code:
 Entry[63]
.................
error_count     : 0
sqid            : 0
cmdid           : 0
status_field    : 0(SUCCESS: The command completed successfully)
phase_tag       : 0
parm_err_loc    : 0
lba             : 0
nsid            : 0
vs              : 0
trtype          : The transport type is not indicated or the error is not transport related.
cs              : 0
trtype_spec_info: 0
.................
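A listing like the one above can be summarised mechanically. Below is a minimal Python sketch that assumes the "Entry[n]" / "name : value" text layout shown above (it looks like nvme-cli error-log output, but that attribution is an assumption); the helper names are hypothetical. It keeps only entries with a non-zero error_count, since the NVMe specification uses an error count of 0 to mark an empty slot, and then counts how many of those carry a non-zero status.

```python
import re

# Two slots in the layout quoted above (abbreviated).
SAMPLE = """\
 Entry[62]
.................
error_count     : 0
status_field    : 0(SUCCESS: The command completed successfully)
.................
 Entry[63]
.................
error_count     : 0
sqid            : 0
status_field    : 0(SUCCESS: The command completed successfully)
.................
"""

FIELD = re.compile(r"^\s*(\w+)\s*:\s*(.*)$")
NUMBER = re.compile(r"0[xX][0-9a-fA-F]+|\d+")


def parse_entries(text):
    """Group 'name : value' lines under their preceding Entry[n] header."""
    entries = []
    for line in text.splitlines():
        if line.strip().startswith("Entry["):
            entries.append({})
            continue
        match = FIELD.match(line)
        if match and entries:
            entries[-1][match.group(1)] = match.group(2)
    return entries


def leading_int(value):
    """Leading decimal or 0x-prefixed hex number of a value such as '0(SUCCESS: ...)'."""
    match = NUMBER.match(value.strip())
    if match is None:
        raise ValueError(f"no number at start of {value!r}")
    token = match.group(0)
    return int(token, 16) if token.lower().startswith("0x") else int(token)


def summarise(entries):
    """Return (slots, entries with error_count != 0, of those with non-zero status)."""
    used = [e for e in entries if leading_int(e.get("error_count", "0")) != 0]
    failed = [e for e in used if leading_int(e.get("status_field", "0")) != 0]
    return len(entries), len(used), len(failed)


total, used, failed = summarise(parse_entries(SAMPLE))
print(f"{total} slots, {used} real entries, {failed} with a non-zero status")
```

For the two slots in the sample this reports 2 slots, 0 real entries and 0 with a non-zero status.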


---


Why does the Number of Error Information Log Entries in the SMART values
increase along with the number of times the computer is turned on and off?



Due to the fact that some platforms send non-NVMe command signals to the SSD,
the Number of Error Information Log Entries in the SMART values of the SSD
continues to increase. Error notification messages even appear on the screen
when booting.

The counting of this value will not cause any impact on the use of the SSD.
Customers may use it with peace of mind.



Some further reading: https://forum.proxmox.com/threads/pve-7-1-new-smart-error-on-every-hard-disk-at-every-reboot.101063/

mira said:
It seems this is an incompatibility between the kernel nvme interface and the NVMe. A future firmware update might fix this issue.

Feni said:
Same here with an older Samsung 950 Pro. No negative impact detected, other than an increasing counter.

So theoretically, these increasing "errors" may pause after a kernel update and/or NVMe firmware update. Or you can just ignore them. :wink:
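To check the "increases with every reboot" explanation on your own drive, one rough approach is to note the Power Cycles and Error Information Log Entries counters from smartctl now and again later, then compare them. A minimal Python sketch; the class, function and verdict strings are hypothetical, and the check is coarse: a suspend/resume, which is also mentioned above, may add log entries without being counted as a power cycle.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Two counters copied by hand from smartctl -a output (hypothetical helper)."""
    power_cycles: int
    error_log_entries: int


def compare(before, after):
    """Rough verdict on whether new error log entries line up with power cycles."""
    new_entries = after.error_log_entries - before.error_log_entries
    new_cycles = after.power_cycles - before.power_cycles
    if new_entries <= 0:
        return "no new error log entries"
    if new_cycles <= 0:
        return "entries grew without a power cycle: read the error log itself"
    return f"{new_entries / new_cycles:.1f} new entries per power cycle"


# Made-up numbers, purely for illustration.
print(compare(Snapshot(power_cycles=10, error_log_entries=20),
              Snapshot(power_cycles=12, error_log_entries=24)))
```

A steady rate per power cycle is at least consistent with the boot-time explanation; it does not by itself prove the entries are harmless.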
 

saveZFS

So theoretically, these increasing "errors" may pause after a kernel update and/or NVMe firmware update. Or you can just ignore them. :wink:
Thank you very much, then I will keep the SSDs and not send them back to the dealer!
Although I will still need other SSDs in the long run because the sync-write performance is just extremely bad! :(
 

NugentS

saveZFS said:
What do you mean exactly, if I may ask?
You appear to have posted only the second half of the SMART output. The first part would have told us what the drives are. For example:
Code:
root@newnas[~]# smartctl -a /dev/nvme1
smartctl 7.2 2020-12-30 r5155 [FreeBSD 12.2-RELEASE-p12 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       Samsung SSD 970 EVO 500GB
Serial Number:                      S5H7NS0NA72055A
Firmware Version:                   2B2QEXE7
PCI Vendor/Subsystem ID:            0x144d
IEEE OUI Identifier:                0x002538
Total NVM Capacity:                 500,107,862,016 [500 GB]
Unallocated NVM Capacity:           0
Controller ID:                      4
NVMe Version:                       1.3
Number of Namespaces:               1
Namespace 1 Size/Capacity:          500,107,862,016 [500 GB]
Namespace 1 Utilization:            373,901,086,720 [373 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            002538 5a0143d2fd
Local Time is:                      Mon Apr 11 23:16:00 2022 BST
Firmware Updates (0x16):            3 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x03):         S/H_per_NS Cmd_Eff_Lg
Maximum Data Transfer Size:         512 Pages
Warning Comp. Temp. Threshold:      85 Celsius
Critical Comp. Temp. Threshold:     85 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     6.20W       -        -    0  0  0  0        0       0
 1 +     4.30W       -        -    1  1  1  1        0       0
 2 +     2.10W       -        -    2  2  2  2        0       0
 3 -   0.0400W       -        -    3  3  3  3      210    1200
 4 -   0.0050W       -        -    4  4  4  4     2000    8000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        30 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    1,059,596 [542 GB]
Data Units Written:                 3,560,615 [1.82 TB]
Host Read Commands:                 16,171,177
Host Write Commands:                69,974,540
Controller Busy Time:               103
Power Cycles:                       66
Power On Hours:                     3,832
Unsafe Shutdowns:                   24
Media and Data Integrity Errors:    0
Error Information Log Entries:      161
Warning Comp. Temperature Time:     0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               30 Celsius
Temperature Sensor 2:               36 Celsius

Error Information (NVMe Log 0x01, 16 of 64 entries)
No Errors Logged
 

SirNomad49

Cadet
Joined
Jul 10, 2021
Messages
9
Same topic.


 