FreeNAS destroys USB sticks and SSDs frequently

Franiac

Dabbler
Joined
Jan 9, 2019
Messages
13
Hello,

I have been using FreeNAS for over 3 years now and the death rate of my boot drives is very high. I have done some counting and want to present the results.
I started off by using USB sticks as boot drives for FreeNAS (started with 9.2) and later switched to SSDs, since that is what is recommended nowadays. Currently I am running 11.3U5. We are talking 2 years of USB stick and 1 year of SSD usage. And again, we are only talking about a boot drive mirror consisting of either 2 USB sticks or 2 SSDs.

I have 2 machines running FreeNAS 24/7 and both use 2 drives as a mirrored boot pool.

In the first 2 years of usage I lost 11 USB sticks of various brands (Intenso, SanDisk, Samsung, ...).
That includes a total loss of the boot mirror (both USB sticks failed at the same time).

In the 3rd year, when I switched to SSDs (Crucial MX500 and WD Blue), I lost 3 SSDs.

I have done a lot of research and there is nothing wrong with either my configuration or hardware in general.

In addition to that it seems like I am not alone on this:

See here and here for example.

FreeNAS is unbelievably good for being free software. But these hardware failures are worrying me. I also tried Unraid on a 3rd backup NAS with the same hardware just to check if things look better and, lo and behold, I have not lost any hardware whatsoever.

So I have to ask: What's going on with FreeNAS? How hard is it hammering the boot pool to kill drives that fast? Could there be a problem or is this just "working as intended"? This looks like a very broad issue.

Thank you for the awesome software! Any help is much appreciated.

~Franiac
 

no_connection

Patron
Joined
Dec 15, 2013
Messages
480
No matter how much it writes, it should not be able to kill the SSD by writes alone. So something is definitely wrong here.

Looking at my VM with 11.3U5, it writes about 0.1MB/s or so. Nowhere near enough to ever be a problem.
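
As a rough sanity check on that figure, a sustained write rate can be converted into a yearly volume. The sketch below is only illustrative: it assumes the 0.1MB/s observation holds around the clock and uses decimal units (1 TB = 1,000,000 MB). The function name is made up for this example.

```python
SECONDS_PER_DAY = 86_400


def tb_written_per_year(write_rate_mb_s: float) -> float:
    """Convert a sustained write rate in MB/s into TB written per year (decimal units)."""
    mb_per_year = write_rate_mb_s * SECONDS_PER_DAY * 365
    return mb_per_year / 1_000_000  # 1 TB = 1,000,000 MB


if __name__ == "__main__":
    # 0.1 MB/s is the rate observed on the VM above; treating it as constant is an assumption.
    print(f"{tb_written_per_year(0.1):.2f} TB/year")  # prints "3.15 TB/year"
```

At that rate the boot pool would see roughly 3 TB of writes per year.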

Do you have SMART info on how many TB have been written to the drives?
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
I have done a lot of research and there is nothing wrong with either my configuration or hardware in general.
I'm not coming to the defense of FreeNAS, but this problem has been well documented within the forums for many years, especially for USB flash drives since FreeNAS 9.3 was introduced using a ZFS live pool format. I have not heard of any SSD failures since FreeNAS 11.3 came out, but that does not mean there are none; it just means I have not been in touch with this forum as much as I used to be.

The developers have recently changed (I don't recall when, but I believe it was in early 2020) how often the boot pool is written to in order to significantly lessen the impact on dying USB flash drives, so with the current version of FreeNAS I'd be somewhat surprised if there is a killer in the code now. When reporting problems like this, people need to be very detailed in their bug reports and note the exact configuration to ensure all things are being considered. If I were to test this out, I'd set up a test system, or even a production system, record the SMART data for the SSDs each day, and place it in a spreadsheet to track the usage history. Record written data and wear level/life remaining of the SSD. This would prove to be valuable. I would also record temperature if it's reported and any other key thing you might want; read data could be useful, but not from a wear perspective.

So while I think many of us understand that FreeNAS was killing the USB drives, I'm kind of surprised to hear about multiple failing SSDs on a system. Support the claims with some facts (actual usage data and detailed configuration data) and then submit a bug report. If there is a problem out there, trust me, we all want it identified and corrected, so does iXsystems.

And thanks for voicing your concern.
 

Jailer

Not strong, but bad
Joined
Sep 12, 2014
Messages
4,977
You've definitely got something odd going on to be killing SSDs. I'd check the SMART data as @no_connection suggested to check life on those drives. My system has been in place with the same SSD (16GB SATADOM) for over 4 1/2 years and it has 97% life left according to SMART data.

Is your system dataset on your boot pool?
 

Franiac

Dabbler
Joined
Jan 9, 2019
Messages
13
@joeschmuck The thing is that S.M.A.R.T. never showed any significant indicators of why the drives would fail. If you install a fresh SSD and it dies after 4 weeks, it cannot be because of bytes written. Temperatures never reached 45°C or above.

I understand that without hard evidence there is officially no problem. The fact is, there are obviously a lot of users having trouble with FreeNAS and dying SSDs. Otherwise you wouldn't find so many posts about it.

I have done a 1 year test with the exact same hardware and Unraid. And I mean the exact same hardware, as I let Unraid run for 12 months and then FreeNAS on the same machine. That's the year in my previous post when I switched from USB to SSDs. Unraid ran smoothly all the time. After re-installing FreeNAS on the exact same machine it took 3 weeks and the first SSD was gone. This is not "hard evidence", I know, but more than suspicious. And it gets worse when after that suddenly 2 more die within a year. I don't want to preach nonsense, but I have a strong feeling that if I ran Unraid again, my SSDs would stop dying. And I don't want to say "Ehhh look, Unraid is so much better than FreeNAS" (which it is not), I just want to point out what I have learned.

FreeNAS versions used were 11.2 and later 11.3 (always upgrading when something new was out).

Hardware was:

ASRock J4105M (No ECC RAM, though)
LSI 9207-8i controller (SSDs were hooked up to that, not the onboard)
6x 3TB WD Reds (CMR) in an encrypted RAIDZ2 pool

SSDs dead: 2x Crucial MX500 250GB and 1x WD Blue 120GB

I did not write down SMART data every day and ran charts and what not (which I regret right now, since you're asking) but I made sure nothing went above thresholds or indicated a potential source of the problem.

@Jailer my system dataset is on the boot pool, yes. I had an iX employee switch that from a HDD pool when we were tracking down a bug via TeamViewer. He said there should not be a problem since I run SSD's. But in the time before that when I was using USB, the system dataset was not on the boot pool and even then the sticks died like flies.

General question: Is there a nice way to log the results of all recent SMART tests without manually consulting smartctl every day?
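
One possible approach, sketched below under several assumptions, is a small script run once a day (for example from cron) that calls `smartctl -A` and appends selected raw values to a CSV file. The device paths, the CSV location, and the attribute names in `WANTED` are placeholders: attribute names differ between vendors, so check what your drives actually report. The parser also assumes the classic ATA attribute table layout that smartctl prints for SATA drives.

```python
import csv
import datetime
import subprocess
from pathlib import Path

# Placeholders: adjust to your own system. Attribute names are vendor-specific.
DEVICES = ["/dev/da0", "/dev/da1"]
CSV_PATH = Path("/mnt/tank/smart_log.csv")
WANTED = ["Power_On_Hours", "Total_LBAs_Written", "Temperature_Celsius"]


def parse_smart_attributes(text):
    """Parse the ATA attribute table from `smartctl -A` output into {name: raw_value}."""
    attrs = {}
    in_table = False
    for line in text.splitlines():
        if line.strip().startswith("ID#"):
            in_table = True  # header row; attribute rows follow
            continue
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if in_table and len(fields) >= 10 and fields[0].isdigit():
            attrs[fields[1]] = " ".join(fields[9:])  # raw value may contain spaces
    return attrs


def append_snapshot(device, attrs, csv_path, wanted):
    """Append one timestamped row for `device`, writing a header if the file is new."""
    is_new = not csv_path.exists()
    with csv_path.open("a", newline="") as fh:
        writer = csv.writer(fh)
        if is_new:
            writer.writerow(["timestamp", "device"] + list(wanted))
        now = datetime.datetime.now().isoformat(timespec="seconds")
        writer.writerow([now, device] + [attrs.get(name, "") for name in wanted])


if __name__ == "__main__":
    for dev in DEVICES:
        # smartctl uses non-zero exit codes as status bits, so they are not treated as failures.
        result = subprocess.run(["smartctl", "-A", dev], capture_output=True, text=True)
        append_snapshot(dev, parse_smart_attributes(result.stdout), CSV_PATH, WANTED)
```

The resulting CSV opens directly in a spreadsheet, which gives the kind of day-by-day usage history suggested earlier in the thread.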
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
In the 3rd year, when I switched to SSDs (Crucial MX500 and WD Blue), I lost 3 SSDs.

Well, something's got to be wrong. I use WD Blues and MX500s quite a bit as hypervisor datastores that should be seeing a lot more writes than your FreeNAS.

SSD write endurance is fun. I *have* managed to kill Intel 535 480GB drives (only 73TBW write endurance on workloads that needed ~100-300TBW over five years), and also some SanDisk SSD Plus, but pretty consistently only because I was deliberately using a product in a way it wasn't intended to be used.

SSDs dead: 2x Crucial MX500 250GB and 1x WD Blue 120GB

I don't recall there being a WD Blue 120GB. Do you have a part number for that?

The endurance of the Blue 250GB is listed as 100TBW, so a hypothetical 120GB drive might weigh in at around 50TBW. The endurance of the MX500 250GB is also 100TBW.

Three weeks life would imply several TB being written per day. I would think lots of people would have noticed.
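
The arithmetic behind that estimate can be sketched as follows. The endurance ratings are the ones quoted above (100TBW, and the hypothetical 50TBW), and the 21-day lifetime corresponds to the three weeks reported for the first failed SSD. The function name is illustrative only.

```python
def required_tb_per_day(endurance_tbw: float, days_until_failure: float) -> float:
    """Average daily writes (TB) needed to use up a drive's rated endurance in the given time."""
    return endurance_tbw / days_until_failure


if __name__ == "__main__":
    for label, tbw in [("250GB drive rated 100TBW", 100), ("hypothetical 50TBW drive", 50)]:
        print(f"{label}: {required_tb_per_day(tbw, 21):.1f} TB/day")
    # prints:
    # 250GB drive rated 100TBW: 4.8 TB/day
    # hypothetical 50TBW drive: 2.4 TB/day
```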

The two links you provided above both appear to reference USB sticks, which are definitely problematic.
 

Franiac

Dabbler
Joined
Jan 9, 2019
Messages
13
I don't recall there being a WD Blue 120GB

You're absolutely right. I was confusing it with a different system. The 120GB that died is a SanDisk Plus.

And regarding your TBW facts: I want to point out that I am only using the SSDs as boot drives, not as storage. So it is next to impossible that the boot pool of FreeNAS would reach anywhere near 50TBW or even 100TBW within a year. We are talking about a private NAS used for a home recording studio. Yes, studios produce big files, but we are not talking about TBs a week and such.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
You're absolutely right. I was confusing it with a different system. The 120GB that died is a SanDisk Plus.

And those are ... less enduring, I guess is the kind way to put it.

And regarding your TBW facts: I want to point out that I am only using the SSDs as boot drives, not as storage. So it is next to impossible that the boot pool of FreeNAS would reach anywhere near 50TBW or even 100TBW within a year. We are talking about a private NAS used for a home recording studio. Yes, studios produce big files, but we are not talking about TBs a week and such.

Well, on one hand, you're correct about write volumes, and on the other hand, burning out SSDs is the usual thing that kills them, so you've effectively arrived at a conundrum.

As far as I'm aware, there's no generic kill code for SSDs other than to burn through their endurance, and obviously FreeNAS wouldn't have it even if there were.

There are edge cases like it being possible to have power supply problems or bad connections. The nature of flash being what it is, I suppose it is possible that something like a noisy supply could sufficiently mess with things that it could kill flash.

But it really seems more likely that if flash from two manufacturers was successfully killed, there is likely to be some way in which a high level of writes was happening.
 

Franiac

Dabbler
Joined
Jan 9, 2019
Messages
13
And those are ... less enduring, I guess is the kind way to put it.
Those are trash indeed. But it was the only thing lying around when an MX500 died.

At least it's some kind of emotional support that you are as confused as I am :)
And I know there's not some kind of instant kill switch to flash hardware. At least if it's not an Apple product :P

Fun aside: What 2.5" SATA SSDs would you recommend? To be real, the MX500 is also just another budget consumer product. A good one and an often-used one, but that does not mean anything. Samsung Pro? Intel?
 

Franiac

Dabbler
Joined
Jan 9, 2019
Messages
13
Also, the system dataset documentation is something I did not really understand completely.
What is the difference between putting it on the boot pool or a separate HDD pool? There's not much info about what it really does and what the pros/cons are. As a normal user it can be confusing sometimes.
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
I have been using FreeNAS for over 3 years now and the death rate of my boot drives is very high. I have done some counting and want to present the results.
I started off by using USB sticks as boot drives for FreeNAS (started with 9.2) and later switched to SSDs, since that is what is recommended nowadays. Currently I am running 11.3U5. We are talking 2 years of USB stick and 1 year of SSD usage. And again, we are only talking about a boot drive mirror consisting of either 2 USB sticks or 2 SSDs.

~Franiac
Hi Franiac,
USB drives are not built for continuous writes, they are mostly designed for transferring data. The write endurance is getting worse, as TLC and now QLC flash is being used. These USB drives are often particularly bad at small writes and even worse at handling writes when the drive is full. For that reason, the hardware guides do not recommend USB drives except as a way of transferring the boot image. Regular SATA SSDs are much better and are designed for random writes.

That being said, earlier versions of FreeNAS did write more than recent versions. I'm a little surprised that things haven't improved after the update to 11.3.

If the system dataset is on the USB drive, it may be getting significantly more writes if the system is larger or busy. Is this system used heavily? Can you move the system dataset to the data pool or do you need encryption on that pool?

For more info on system dataset see here: https://www.truenas.com/docs/hub/tasks/advanced/system-dataset/
 

no_connection

Patron
Joined
Dec 15, 2013
Messages
480
FreeNAS killing USB is something that needs fixing if at all possible, maybe with some active wear leveling that leverages ZFS's copy-on-write design and rotates the area written to.
I'm thinking that maybe FN does something weird like erasing or excessive TRIM. Something has to be happening.

What is the temperature of the SSD?
 

Franiac

Dabbler
Joined
Jan 9, 2019
Messages
13
USB drives are not built for continuous writes, they are mostly designed for transferring data. The write endurance is getting worse, as TLC and now QLC flash is being used. These USB drives are often particularly bad at small writes and even worse at handling writes when the drive is full. For that reason, the hardware guides do not recommend USB drives except as a way of transferring the boot image. Regular SATA SSDs are much better and are designed for random writes.

Yes, I am not really bummed about the USB sticks being eaten alive. That's just the nature of the technology used, I understand that. But I am worried about my 3 SSDs that died within 12 months.

That being said, earlier versions of FreeNAS did write more than recent versions. I'm a little surprised that things haven't improved after the update to 11.3.

Currently I am on 11.3U5. The test started with 11.2U2.

If the system dataset is on the USB drive, it may be getting significantly more writes if the system is larger or busy. Is this system used heavily? Can you move the system dataset to the data pool or do you need encryption on that pool?

The NAS is file storage only, no jails or VMs. The only services running are SMB for Windows shares and SSH, so basically the thing does nothing. And a NAS system that does nothing and still kills SSDs is really, really confusing to me. We are talking about maybe 2GB growth a WEEK.

All other pools are encrypted, so the system dataset has to be on the boot pool. Thx for the link btw. I have only looked up FreeNAS docs, not TrueNAS.

What is the temperature of the SSD?

Always below 45°C. NAS systems are in a 19" rack in the basement (air cooled / no AC / avg room temp 18°C)
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
FreeNAS killing USB is something that needs fixing if at all possible
Why? USB sticks haven't been recommended boot media since 9.3 was released, however many years ago that was, and for exactly that reason.

As to the SSDs, I don't know what's going on with OP. I'm using a SanDisk 120 GB (not mirrored) myself, and it's been in the system over five years without problems. Doesn't prove anything other than that the problem he's seeing isn't universal, but that was pretty obvious anyway.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Fun aside: What 2.5" SATA SSDs would you recommend? To be real, the MX500 is also just another budget consumer product. A good one and an often-used one, but that does not mean anything. Samsung Pro? Intel?

Well, I'd mostly been buying MX500 and WD Blue for hypervisor datastores for the last year or two, but this Black Friday, the 1TB 860 Evos were going for $100 and that was hard to pass up (pointing to the stack of 18 boxes).

I generally have two tiers. One is "workload compatible with consumer SSD", for which those are the three basic options, though there are also some Intel 535 and 545s floating around there too (for SATA) or similar stuff like WD Black SN750 for NVMe. The heavy-hitting stuff typically gets thrown on Intel data center SSDs rated appropriately for the workload.
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
Yes, I am not really bummed about the USB sticks being eaten alive. That's just the nature of the technology used, I understand that. But I am worried about my 3 SSDs that died within 12 months.

It would be interesting to see the diagnostics from the Crucial MX500 and the WD Blue. These should report their "wear"... erase cycles or TBW.
It is unexpected that they would fail predictably. Please share that data on any failed drive.

There is a general issue with consumer-level drives: they are formatted for maximum capacity and intended for use in a desktop/laptop as a file store, typically only 70% full. For example, they advertise 500GB capacity and not 480GB or 400GB. This leaves very little room for flash garbage collection, which in turn makes continuous writes (particularly small ones) very expensive on the SSD when it is full... to write 4KB, it may have to read and re-write another 50KB. This amplifies writes and dramatically increases wear, but the SSDs should report that.
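
To put a number on that example, write amplification is usually expressed as flash writes divided by host writes. The sketch below uses only the 4KB and 50KB figures from the paragraph above, which are themselves illustrative rather than measured. The function name is made up for this example.

```python
def write_amplification(host_kb: float, extra_rewritten_kb: float) -> float:
    """Ratio of data physically written to flash versus data the host asked to write."""
    return (host_kb + extra_rewritten_kb) / host_kb


if __name__ == "__main__":
    # Writing 4KB forces another 50KB to be read and re-written (figures from the text above).
    print(write_amplification(4, 50))  # prints 13.5
```

Under those assumptions, each small host write costs the flash about 13.5 times as much wear as its size suggests.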

There is a method of overcoming this issue, which is to deliberately format the drive for a lower capacity. This provides more room for garbage collection and can significantly reduce write amplification and wear. This is included in 11.3 as a shell command, mostly for overprovisioning SLOG devices.


In TrueNAS 12.0 it is in the UI

 

Franiac

Dabbler
Joined
Jan 9, 2019
Messages
13
It would be interesting to see the diagnostics from the Crucial MX500 and the WD Blue. These should report their "wear"... erase cycles or TBW.
It is unexpected that they would fail predictably. Please share that data on any failed drive.
Like I wrote earlier, there is nothing TBW-related that's off the scale or even near any usage or wear threshold. SMART looked good for every drive. I don't have the exact logs of those, but we are talking about a drive failing after 3-4 weeks of being used as a boot-pool-only drive on a private NAS. It does nothing. Maybe every 2 hours a 100MB file gets copied to it, that's it. And that would be "high usage" in the case of this machine. This machine really derps around 24hrs normally as a data archive for finished projects. Hell, the total bytes read could probably be measured in megabytes ;)

I think this is just a bad guessing game with no clues to hold onto.

There's one thing I could try, though. The machine was installed with FreeNAS 8 and has been updated ever since. Maybe I'll just order some brand new Intel Data Center SSDs and make a clean fresh install of 11.3 U5. At the moment I believe I need some kind of magic to happen in order to fix the problem.

I should be able to import the encrypted pools from the HDDs after a fresh install, right? Recovery keys and such... I have actually never made a fresh install and imported existing pools, to be honest. Is there anything special to watch out for?
 

Jailer

Not strong, but bad
Joined
Sep 12, 2014
Messages
4,977
Save a copy of your config before you do a fresh install.
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
FreeNAS 8 - Does that mean the boot pool is not ZFS, but UFS? Any errors would not be correctable and there may be problems with power events.
 

Franiac

Dabbler
Joined
Jan 9, 2019
Messages
13
Save a copy of your config before you do a fresh install.
I do that before every update. But maybe it is unwise to load the config after the fresh install. Just to make sure that the potential cause of the problem does not get "loaded" again (very unlikely, though).

Does that mean the boot pool is not ZFS, but UFS?
Sorry for the misunderstanding. As I wrote in my original post, I have been using FreeNAS for 3 years. Before that, a coworker (who actually knew what he was doing) did the job. I know we started off with 8 way back. The boot pool is definitely ZFS. So that means it must have been re-installed at some point?

Anyway, enough of the guessing... I am going for the wipe.

I thank you all for your input and your willingness to help me out!
 