Detecting degraded SSD

Status
Not open for further replies.

ShimadaRiku

Contributor
Joined
Aug 28, 2015
Messages
104
I was testing with some old SSDs and got peculiar write performance.

A mirrored pair of SSDs. This is my CIFS transfer rate for a 3GB test file. It was great until halfway through.
[Image: CIFS transfer rate graph for the 3GB test file]


Interesting how the transfer was able to run at 90-100MB/s, saturating my gigabit LAN, until halfway through, when it fell off a cliff. I assume that was when the bad SSD in the mirror came into play, or when a buffer ran out?

Apparently one of the SSDs I used had previously been in a system without TRIM support and had been filled to max capacity. It was so badly degraded that its write performance was down to a crawl. I had to do a hardware-level secure erase to restore it to its former glory. Now it is able to sustain a full 100MB/s transfer.

Now my question: is there anything in FreeNAS to detect or test for a degraded SSD in a pool? I was confused for a while, looking through all the settings and troubleshooting, until I realized the bad SSD was the culprit.
 

ShimadaRiku

Contributor
Joined
Aug 28, 2015
Messages
104
Were you able to find anything?

Apart from a failed drive, I don't think FreeNAS has a way to detect or notify you of a poorly performing drive in a vdev/pool. It would be up to the user to watch the Reporting menu for individual drive performance.
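The manual check of comparing members of the same vdev can at least be sketched. This is a minimal Python sketch, not a FreeNAS feature: it assumes you have already collected an average write latency per disk yourself (for example by watching the ms/w column of FreeBSD's `gstat`). The function name, the 3x threshold and the sample numbers are all hypothetical.

```python
from statistics import median


def flag_slow_members(latency_ms: dict[str, float], ratio: float = 3.0) -> list[str]:
    """Return disks whose average write latency exceeds `ratio` times
    the median latency of the other members of the same vdev."""
    slow = []
    for disk, lat in latency_ms.items():
        peers = [v for d, v in latency_ms.items() if d != disk]
        if not peers:          # a single disk has nothing to compare against
            continue
        if lat > ratio * median(peers):
            slow.append(disk)
    return slow


# Hypothetical samples: average milliseconds per write for each side of a mirror.
samples = {"ada1": 0.4, "ada2": 38.0}
print(flag_slow_members(samples))
```

In a mirror every member receives the same writes, so a member that is consistently much slower than its partner is a reasonable thing to flag; across different vdevs or workloads the comparison means much less.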
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
This is expected behaviour for an SSD in the environment described, unfortunately. The best mitigation is probably underprovisioning, which would provide the SSD controller with a larger cache of free pages to draw on under demanding write conditions.
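The arithmetic behind underprovisioning is simple: whatever part of the drive the host never writes to is left for the controller's free page pool. A minimal sketch, assuming a hypothetical 120 GB drive; it ignores the factory spare area the controller already keeps, and the unpartitioned space only helps if those pages are actually known to be free (for example right after a secure erase).

```python
def spare_fraction(physical_gb: float, partitioned_gb: float) -> float:
    """Share of the drive's advertised capacity left unpartitioned.

    Ignores the factory spare area, and only counts as extra reserve if the
    unpartitioned pages are already erased (e.g. after a secure erase).
    """
    if not 0 < partitioned_gb <= physical_gb:
        raise ValueError("partition must be positive and no larger than the drive")
    return (physical_gb - partitioned_gb) / physical_gb


def partition_size_for(physical_gb: float, target_spare: float) -> float:
    """Partition size that leaves `target_spare` of the drive unpartitioned."""
    if not 0 <= target_spare < 1:
        raise ValueError("target_spare must be in [0, 1)")
    return physical_gb * (1 - target_spare)


# Hypothetical 120 GB SSD: partition 96 GB and leave 20% to the controller.
print(spare_fraction(120, 96))
print(partition_size_for(120, 0.20))
```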

FreeNAS doesn't have any option to detect such issues, in part because performance is so dependent on so many factors. A modern HDD, for example, can be tossing 200MB/sec to or from its platters if you are performing sequential access. However, when you switch to random access 4K blocks, you can see that drop to less than 400KB (yes KB)/second. I wouldn't necessarily argue either case to be indicative of a problem, just a matter of the workload. In the case of your SSD, it sounds like you exhausted the free page pool and performance tanked. You can always get that to happen. It isn't clear at what point this might be flagged as a problem (either with the workload or with the SSD), especially since it was likely transient in nature.
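The gap between those two HDD figures is mostly about request count: at 4K per random request, 400KB/second is only about 100 requests per second, or roughly 10 ms of seek and rotation per request. The arithmetic, assuming decimal units (1 KB = 1000 bytes) and the figures from the post:

```python
KB = 1000          # decimal units, as drive vendors quote them
MB = 1000 * KB


def iops(throughput_bytes_per_s: float, request_bytes: int) -> float:
    """I/O requests per second needed to reach a given throughput."""
    return throughput_bytes_per_s / request_bytes


sequential = 200 * MB   # large sequential transfers from the platters
random_4k = 400 * KB    # 4K random requests, limited by seek + rotation

print(iops(random_4k, 4 * KB))   # 100.0 requests/s
print(sequential / random_4k)    # 500.0 -> same disk, 500x difference
```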
 

ShimadaRiku

Contributor
Joined
Aug 28, 2015
Messages
104
jgreco said: "In the case of your SSD, it sounds like you exhausted the free page pool and performance tanked."

Actually, it was a freshly added mirrored pair of SSDs, with no data on it yet.

One of the SSDs had previously been used in a machine with no TRIM support, and it had been filled to max capacity: two big no-nos for SSD health. SSDs don't overwrite data in place like HDDs do. They can only write to erased pages, and erasing works on whole blocks, so any live data in a block must be copied elsewhere before the block can be erased and reused. Since the drive had been filled to max capacity, very little or no space was marked as free. It's like defragging an HDD with 1% free space, or, vaguely, like how ZFS performance drops once you go above 80% capacity, but worse for an SSD.
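How bad "worse for an SSD" gets can be estimated with a simplified write-amplification model. Assume every free page has to be obtained by garbage-collecting an erase block in which a fraction u of the pages still hold live data: reclaiming a block of N pages copies uN live pages and frees (1-u)N pages for the host, so the flash programs N pages for every (1-u)N pages the host writes. This is an idealized sketch, not any particular drive's firmware; real controllers pick the emptiest blocks first, so they usually do better than this.

```python
def write_amplification(valid_fraction: float) -> float:
    """Flash pages programmed per host page written, when each reclaimed
    erase block still holds `valid_fraction` live pages that must be copied."""
    if not 0 <= valid_fraction < 1:
        raise ValueError("valid_fraction must be in [0, 1)")
    # Per reclaimed block of N pages: u*N copies + (1-u)*N host writes = N
    # flash writes, for (1-u)*N host writes -> N / ((1-u)*N) = 1 / (1-u).
    return 1.0 / (1.0 - valid_fraction)


for u in (0.5, 0.8, 0.95, 0.99):
    print(f"{u:.0%} still valid -> about {write_amplification(u):.0f} flash writes per host write")
```

With 99% of the pages still considered live, the model gives about 100 flash writes (plus the erases) for every page the host writes, which is the "1% free space" situation described above.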

A regular format and re-partitioning won't fix it. Even though the user sees an empty drive, the SSD still considers the cells to be in use and keeps copying, erasing and rewriting to shuffle data around. I guess an HDD has 0 and 1, but an SSD effectively has 0, 1 and an empty (erased) state. The only fix was a low-level, hardware secure erase, which tells the SSD that all cells are empty.
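The "0, 1 and empty" idea can be made concrete with a toy model of the controller's page bookkeeping. This is a hypothetical sketch (the class and state names are made up, and real flash translation layers are far more complex); it assumes, as on the old no-TRIM system, that the format sends no TRIM to the drive.

```python
from enum import Enum


class Page(Enum):
    ERASED = "erased"   # ready to be programmed
    VALID = "valid"     # holds data the controller believes is live
    STALE = "stale"     # superseded or trimmed; reclaimable by GC


class ToyFTL:
    """Hypothetical toy model of an SSD's page bookkeeping (not real firmware)."""

    def __init__(self, pages: int):
        self.state = [Page.ERASED] * pages
        self.lba_to_page: dict[int, int] = {}   # host block address -> flash page

    def write(self, lba: int) -> None:
        # No in-place overwrite: program an erased page, mark the old copy stale.
        # list.index raises ValueError when no erased page is left (GC would be needed).
        new = self.state.index(Page.ERASED)
        old = self.lba_to_page.get(lba)
        if old is not None:
            self.state[old] = Page.STALE
        self.state[new] = Page.VALID
        self.lba_to_page[lba] = new

    def trim(self, lba: int) -> None:
        # TRIM is the host telling the drive this address no longer holds data.
        old = self.lba_to_page.pop(lba, None)
        if old is not None:
            self.state[old] = Page.STALE

    def secure_erase(self) -> None:
        # Every page goes back to the erased state and the mapping is dropped.
        self.state = [Page.ERASED] * len(self.state)
        self.lba_to_page.clear()

    def count(self, s: Page) -> int:
        return self.state.count(s)


ftl = ToyFTL(pages=8)
for lba in range(8):
    ftl.write(lba)      # drive filled to max capacity
# A format without TRIM only changes the host's metadata; the controller still
# sees every page as live data.
print(ftl.count(Page.VALID), ftl.count(Page.ERASED))   # 8 0
ftl.secure_erase()
print(ftl.count(Page.VALID), ftl.count(Page.ERASED))   # 0 8
```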

In my case, the file transfer went soooo slow that my Windows client eventually errored out.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Correct; so, as I said, you exhausted the free page pool and performance tanked. Ars Technica has a write-up of the issue if you like. The normal solution is to increase the reserve by underprovisioning the SSD, but that only puts off the problem and doesn't necessarily guarantee high performance.

If your SSD supports TRIM, the good news is that more recent versions of ZFS (including in the current FreeNAS) do support TRIM, so if you have wiped the SSD into a fresh state, and then use it with FreeNAS, ZFS will provide your SSD with the TRIM clues that it needs to have a better chance of maintaining performance. This doesn't guarantee performance. If, for example, you rapidly erase and then re-fill the SSD, it is perfectly possible that the garbage collection process within the SSD will be unable to keep pace with the demand for fresh blocks. So there's all sorts of factors that play into this.
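A back-of-the-envelope way to see why it matters whether garbage collection keeps pace: the pool of pre-erased pages drains at the difference between the host write rate and the rate at which GC reclaims pages, and once it is empty, sustained writes slow to roughly the reclaim rate. That is one plausible reading of a transfer that runs at full speed and then falls off a cliff. A minimal sketch; the function name and all numbers are hypothetical, and real drives also cache writes, throttle, and vary their GC rate with load.

```python
import math


def seconds_until_cliff(free_pool_mb: float, host_write_mb_s: float,
                        gc_reclaim_mb_s: float) -> float:
    """Seconds of sustained writing before the pre-erased page pool runs out.

    Returns infinity if GC frees pages at least as fast as the host uses them.
    """
    drain = host_write_mb_s - gc_reclaim_mb_s
    if drain <= 0:
        return math.inf
    return free_pool_mb / drain


# Hypothetical: 1500 MB of erased pages, ~110 MB/s arriving over gigabit,
# GC reclaiming 20 MB/s in the background.
t = seconds_until_cliff(1500, 110, 20)
print(f"full speed for about {t:.1f} s, then roughly 20 MB/s")
```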
 