SSD Wear

Nope, you need to be looking at the "VALUE" column, which is "100". It can be tricky to know when to use the RAW value and when not to. I started using the JSON-formatted output; it appears to be more consistent, and it's easier to grab the data from.

Let me tell you, there are a lot of different ways the drives report Wear Level, and it drives me nuts. Which one takes precedence? How should the data be interpreted? Not all of them are percentage values; some must be calculated. Nuts, I say!

Here are some of the values I need to examine (and pray I selected the correct one), in the order in which I look for them:

SSDs
Percentage Used Endurance Indicator --- I prefer this one if it exists; it tells you how much has been used. The math is 100 - this value = % left.
ID 231
Percent_Lifetime_Remain
Media_Wearout_Indicator
Wear_Leveling_Count
SSD_Life_Left

In addition to the above values:

NVMe drives
.nvme_smart_health_information_log.available_spare

And if it's SCSI
.scsi_percentage_used_endurance_indicator
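To make the "first match wins" lookup order above concrete, here is a rough Python sketch (this is NOT the actual multi_report code, which is a shell script; the dictionary layout follows smartctl's `-j` JSON output, and the attribute names are the ones listed above):

```python
# Priority order from the post: the endurance indicator first, then the
# ATA SMART attributes, then the NVMe health log. First hit wins.
SSD_ATTR_ORDER = [
    231,                        # ID 231 (media wear indicator on many drives)
    "Percent_Lifetime_Remain",
    "Media_Wearout_Indicator",
    "Wear_Leveling_Count",
    "SSD_Life_Left",
]

def wear_percent_left(smart: dict):
    """Return estimated % life left, or None if no wear value is found."""
    # SCSI/SAS: 'percentage used' endurance indicator, reported directly
    used = smart.get("scsi_percentage_used_endurance_indicator")
    if used is not None:
        return 100 - used
    # ATA SSDs: walk the SMART attribute table in priority order,
    # using the normalized VALUE column (not RAW)
    table = smart.get("ata_smart_attributes", {}).get("table", [])
    for want in SSD_ATTR_ORDER:
        for attr in table:
            if attr.get("id") == want or attr.get("name") == want:
                return attr["value"]
    # NVMe: a single normalized 'percentage_used' field
    nvme = smart.get("nvme_smart_health_information_log")
    if nvme is not None and "percentage_used" in nvme:
        return 100 - nvme["percentage_used"]
    return None
```

Again, just a sketch of the idea; the real script handles far more drive quirks than this.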

There is a rhyme and reason to all this. As far as I'm aware, the multi_report script reports every drive sample I've tested, but I'm not infallible. If you open up the script and search for "# Get Wear Level", you will find the start of that section so you can see what is going on.


What data are you looking at? Nothing should suddenly jump from 100% to 85%. 100% = full life, 0% = dead.

I hope all the crap I typed helps some.
As a note, I am looking at the VALUE column, which is why I have a happy face there and said,


233 Media_Wearout_Indicator 0x2300 100 100 000 Old_age Offline - 6490 :smile:
This is what I (perhaps incorrectly) focus on. To me the raw value is "something," which is interpreted as 100% life left under the VALUE column.

which correlates with what you said in this thread.
 

joeschmuck

So I got the dump and I looked at the data, here is what I sent @NugentS

There is conflicting information; you've got to love those drive manufacturers:

Percentage Used Endurance Indicator = 15, meaning 15% of the drive is used.
ID 231 which is also a media wear indicator for most drives = 97, meaning 97% good.
Media_Wearout_Indicator = 100, meaning your value is 100% not used.

The script looks at these values in this order, and if one of the values exists, it stops looking for the other values. In this case the Percentage Used Endurance Indicator has a value, so the script stops there.

Why do I use Percentage Used Endurance Indicator first? Because I have found it to be the most accurate when it's present, and its descriptions have not wavered.

So let's look at some additional data to see if we can figure out which one is most likely correct…

You have:

Blocks = 250069680
Bytes = 128035676160
Logical_block_size = 512
Physical_block_size = 512 (good thing this is not 4096 for a 4K block)

--- The math says you have a 128GB drive. I multiply blocks * block size to get my value; I could also take the bytes value and just divide, and either way gets 128GB for your drive. I have run into drives that do not have the "bytes" value, which is why I multiply.
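The capacity math above, written out as a tiny Python sketch (the numbers are the ones from this dump):

```python
# Multiply block count by logical block size, since some drives
# omit the "bytes" field entirely.
def capacity_bytes(blocks, logical_block_size=512):
    return blocks * logical_block_size

size = capacity_bytes(250069680, 512)  # values from the dump above
# size == 128035676160 bytes, i.e. a 128 GB drive
```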

Unused_Rsvd_Blk_Cnt_Tot = 100 (meaning none have been used, this value is normalized)

Logical Sectors Written = 7155013221 (this is actual value, the math is this value * Logical Block Size = 3,663,366,769,152, or 3.66 TB)

Total_LBA’s_Written = 100 (I suspect meaning you are 100% good, this value is normalized)

Unfortunately you do not have a value for blocks erased, which would let us know how many erase events remain, nor one for how many blocks have failed. But since the unused reserved blocks are still at 100% (the actual value is 60), I have to assume you have no failed blocks yet. I do not know if 60 is the starting number of reserved blocks, or if maybe it was 65 and 5 blocks are bad; the data does not give me a starting point.

Now for some assumptions: the typical number of erase cycles is 3000. That is completely dependent on the flash memory in the device, but 3000 is a good number to start with. I do know there are some flash memories with 2500 erase cycles and some up to and over 5000, but the really high values are expensive. If we assume 128GB * 3000, that means you can erase and write approximately 384 TB. Well, it's not strictly data; it's 384 TB worth of blocks written. Some data may consume only a few bytes of a block, but the entire block gets erased, so it counts as 512 bytes written. You have only 3.66 TB written as of right now, so it looks like you have used up approximately 1% of your erase cycles.
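Here is that back-of-the-envelope estimate as a sketch. Remember, the 3000-cycle figure is an assumption, as noted above; real flash varies widely:

```python
# Rated bytes = capacity * assumed erase cycles; compare against
# actual sectors written to get a rough % of endurance consumed.
def percent_endurance_used(capacity_bytes, sectors_written,
                           sector_size=512, erase_cycles=3000):
    rated_bytes = capacity_bytes * erase_cycles     # ~384 TB for this drive
    written_bytes = sectors_written * sector_size   # ~3.66 TB so far
    return 100.0 * written_bytes / rated_bytes

pct = percent_endurance_used(128_035_676_160, 7_155_013_221)
# pct is roughly 0.95, i.e. about 1% of the assumed erase cycles used
```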

This all leads me to believe that either ID 231 or ID 233 is more correct. But am I correct in my assumptions? This is why I use 'Percentage Used Endurance Indicator'; it has been more trustworthy in the past. Heck, it might actually be correct for your drive, but my gut says you are at either 97% or 100%.

I really wish the manufacturers would standardize these values, maybe someday? I’m not holding my breath.

If I could create a formula that would identify which value is more accurate, I would, however if it works for one person, it will likely fail for someone else.

My advice: keep an eye on it. When this value drops significantly, please toss me another dump so I can analyze the data again. Maybe I will figure out the true value and be able to create a formula like: if multiple values exist, then calculate erase cycles and choose the most likely correct value. It sounds easy in my mind, but in practice it's never that easy.

Sorry I could not provide you a definitive answer but unfortunately I can only work with the data provided by the drive.

I also invite comments, good or bad, but if you have something negative to say about my process, please provide a viable alternative/solution. I'm absolutely willing to learn and adapt. My goal has always been to create the best script I can.
 

joeschmuck

I meant to post this earlier as well, just to show you the differences between SSD and NVMe drives. NVMe reporting is better and minimalistic. There is more data than I have shown, but not an awful lot more.

Your NVMe drive has this data. I would be tickled pink if every drive represented its data this way. There is also only a single wear level value, "percentage_used": 0, so it is super easy to know that value is a normalized 0% worn out, and there are no contradictions in this data. This is in JSON format, which I really like.

"nvme_smart_health_information_log": {
"critical_warning": 0,
"temperature": 39,
"available_spare": 100,
"available_spare_threshold": 0,
"percentage_used": 0,
"data_units_read": 2320860,
"data_units_written": 295724912,
"host_reads": 47814770,
"host_writes": 2885920626,
"controller_busy_time": 1270,
"power_cycles": 190,
"power_on_hours": 27950,
"unsafe_shutdowns": 59,
"media_errors": 0,
"num_err_log_entries": 0
}
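A quick sketch of pulling the wear value out of a health log like the one above with Python's json module (only a few fields reproduced here; the structure matches what smartctl emits with the -j flag):

```python
import json

# Trimmed-down copy of the health log fragment above
fragment = """{
  "nvme_smart_health_information_log": {
    "percentage_used": 0,
    "available_spare": 100,
    "data_units_written": 295724912
  }
}"""

log = json.loads(fragment)["nvme_smart_health_information_log"]
life_left = 100 - log["percentage_used"]  # 0% used -> 100% life left
# NVMe data units are 1000 * 512 bytes each per the NVMe spec, so:
tb_written = log["data_units_written"] * 512_000 / 1e12  # ~151 TB
```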

And if either of you have a suggestion on the calculations, please let me know. I plan to publish version 2.4.4 of Multi-Report this weekend unless some update needs to be made. I'd love to roll in a better formula if one exists. My mind is thinking about how I could do this; it's pretty complicated as well. Still, assumptions need to be made when a drive reports conflicting data. Toss me some ideas. The only thing I have going on tomorrow is that I'd like to go look for a new truck. Call it my retirement gift to myself. I'm not retiring until I pay it off; I can't retire with a big debt over my head.
 