FreeNAS installer + SanDisk SSD = checksum errors

anmnz

Patron
Joined
Feb 17, 2018
Messages
286
I think I am seeing a bad interaction specifically between the FreeNAS installer and the SanDisk "SSD Plus 120GB" SSD (a very popular model). Details and hardware specs below. I'd be very interested if anyone here has some insight on what might be going on.

Summary: After installing FreeNAS 11.x on this SSD, a scrub of the "freenas-boot" pool reports many (30-100) checksum errors. But if I fill the SSD with random data and scrub, no errors are reported. Also a long SMART test reports no problem. It's only the FreeNAS installer that seems to induce errors.

(I will buy a different boot SSD for my build, but I'm posting here because it seems like a strange problem and I'd like to understand it better.)

Some things I have tried in order to eliminate the obvious possible causes:
  • Run long SMART test. The test doesn't report any problem.
  • Install FreeNAS on a different SSD (Samsung 850 Evo 500GB), plugged in to exactly the same data and power connections. After the install on this SSD there are never any checksum errors.
  • Install FreeNAS on a different disk, create a single-disk pool containing just the SanDisk SSD, fill it with random data (using dd if=/dev/random to create many large files), scrub. No errors are reported.
  • Returned the SSD for a replacement. Installing on the new SSD produces checksum errors just like the old one. Same model, but quite different serial numbers (the first one started with "17", the replacement with "18"), so it doesn't seem like a "bad batch" problem.
  • Replace SATA cable, change motherboard SATA port (including moving between SATA-2 and SATA-3 ports), use a different power cable. Makes no difference: a fresh install on the SanDisk SSD always produces checksum errors, while installing on the Samsung Evo SSD does not.
  • Tried multiple recent FreeNAS versions: 11.1-U4, 11.1-RELEASE, 11.0-RELEASE. Always the same result.
I've seen one report on these forums of what might be a similar problem with a SanDisk SSD (link) but nothing widespread.

I can run further tests if anyone has good ideas for tests that might reveal more. Thanks in advance for any insights!

Full list of hardware for this build (with appreciation for all the great information and advice about hardware choices on these forums, and in particular for @Chris Moore's excellent recommendations and eBay-fu -- thanks all!!)
  • Supermicro X9SRL-F
  • Xeon E5-2680 v2
  • 4x 16GB Hynix RDIMM (Supermicro branded, on the motherboard's compatibility list; tested for over 80 hours with memtest86)
  • 1x 120GB SanDisk SSD Plus (boot)
  • 1x 500GB Samsung 850 Evo (cattle)
  • 6x 4TB WD Red (RAIDZ2 bulk storage)
  • Seasonic Focus Plus Gold 650W
  • Fractal Design Define R5 case, Noctua NH-U12DX i4 cooler and 3x NF-A14 PWM case fans
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
It could be that the installer is using the wrong set of quirks for the drive, which messes it up somewhat.

FWIW, I installed vanilla FreeBSD on such an SSD the other day and everything went fine.
 

pschatz100

Guru
Joined
Mar 30, 2014
Messages
1,184
Have you tried setting up the boot disk for UEFI boot and also for Legacy (BIOS) boot? Do you get the same errors both ways?

Is there an upgrade for the firmware on that SSD? Could be some incompatibility with the SSD controller.
 

anmnz

Patron
Joined
Feb 17, 2018
Messages
286
Ericloewe said:
    It could be that the installer is using the wrong set of quirks for the drive, which messes it up somewhat.

Sounds plausible. I note dmesg has no "quirks=" message for this SSD, neither under the installer nor under the installed OS (but it has one for every other drive on the system). I'm not sure whether that's evidence for or against the idea though. :)

pschatz100 said:
    Have you tried setting up the boot disk for UEFI boot and also for Legacy (BIOS) boot? Do you get the same errors both ways?

Yes, same errors whether installing for UEFI boot or BIOS boot. (In case it's not obvious, I don't even need to boot the installed OS to see the checksum errors; just dropping to the installer shell after the install and running zpool scrub then zpool status shows the problem.)

I'll be replacing the SanDisk SSD with a Kingston one shortly.
 

capa

Dabbler
Joined
May 11, 2018
Messages
18
Please let us know how you fared with your Kingston. As you know I'm facing the same problem, and I'm now trying my third SSD (see this thread).
 

anmnz

Patron
Joined
Feb 17, 2018
Messages
286
No problems with the Kingston; FreeNAS 11.1-U4 installed fine and no errors reported after a scrub.

It's one of these: "Kingston Technology SA400S37/120G SSD A400 120 GB Solid State Drive (2.5 Inch SATA 3)" https://www.amazon.co.uk/dp/B01N6JQS8C
 

moelassus

Dabbler
Joined
May 15, 2018
Messages
34
Looks like I'm running into the exact same problem. Crap. Wish I had seen this before ordering these drives. What a pain in the ass.
 

anmnz

Patron
Joined
Feb 17, 2018
Messages
286
moelassus said:
    Looks like I'm running into the exact same problem.

Do you want to give us a rundown on your hardware specs and what happened exactly? More detail can only help!
 

moelassus

Dabbler
Joined
May 15, 2018
Messages
34
Supermicro X11SSM-F based system with 64GB of ECC RAM. Boot volume was two SanDisk 32GB Cruzer Fit USB drives, mirrored. I ordered two of the 120GB SanDisk SSD Plus drives. Upon receipt I plugged them in and ran the long SMART test on both drives. No errors were reported. I then replaced one USB drive with an SSD, so that I had one USB drive mirrored with one SSD. Everything appeared to go just fine. Then I replaced the second USB drive with the second SSD. As soon as the volume came back online it showed 1 checksum error on the second SSD. I then ran a scrub, which came back with even more checksum errors.

Code:

  pool: freenas-boot
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 0 days 00:00:02 with 2 errors on Wed Jun 13 11:29:04 2018
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     2
          mirror-0    ONLINE       0     0     4
            ada0p2    ONLINE       0     0     5
            ada1p2    ONLINE       0     0     4

errors: Permanent errors have been detected in the following files:

        <metadata>:<0x3d>
        freenas-boot/ROOT/default:<0x0>


So I did a reinstallation to the two SSDs. Installation went flawlessly, but when I forced a scrub it reported dozens of checksum errors. I tried a second pair of SATA cables with the same result. I'll be trying these two SSDs in my "backup" FreeNAS box this evening to see if an entirely different system will have the same results.

I'm back to using the two USB drives now and things are running perfectly again. Scrubbing the boot volume generates no errors.
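
For anyone comparing scrub results across drives or installs, the CKSUM column in zpool status output like the above can be pulled out with a short script. This is only a rough sketch, assuming Python 3 and a config table with the five columns shown (NAME STATE READ WRITE CKSUM) and plain integer counters. The names parse_cksum_counts, devices_with_errors and SAMPLE are made up for illustration and are not part of FreeNAS or ZFS.

```python
def parse_cksum_counts(status_text):
    """Return {name: cksum_count} for each row of the config table
    in `zpool status` output (pool, vdev and device rows alike)."""
    counts = {}
    in_config = False
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("config:"):
            in_config = True          # the table starts after this label
            continue
        if stripped.startswith("errors:"):
            break                     # the table ends before this label
        if not in_config or not stripped:
            continue
        fields = stripped.split()
        # Assumption: rows are NAME STATE READ WRITE CKSUM with integer
        # counters; the header row and abbreviated counters are skipped.
        if len(fields) == 5 and all(f.isdigit() for f in fields[2:]):
            counts[fields[0]] = int(fields[4])
    return counts


def devices_with_errors(counts):
    """Return the sorted names whose CKSUM counter is non-zero."""
    return sorted(name for name, n in counts.items() if n > 0)


# Hypothetical sample, copied from the config table posted above.
SAMPLE = """\
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     2
          mirror-0    ONLINE       0     0     4
            ada0p2    ONLINE       0     0     5
            ada1p2    ONLINE       0     0     4

errors: Permanent errors have been detected in the following files:
"""

if __name__ == "__main__":
    counts = parse_cksum_counts(SAMPLE)
    for name in devices_with_errors(counts):
        print(name, counts[name])
```

To use it on a real system you would feed it the saved text of zpool status freenas-boot after a scrub has finished. Rows with counters that are not plain integers are ignored by this sketch, so treat an empty result with some suspicion.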
 

moelassus

Dabbler
Joined
May 15, 2018
Messages
34
anmnz said:
    No problems with the Kingston; FreeNAS 11.1-U4 installed fine and no errors reported after a scrub.

    It's one of these: "Kingston Technology SA400S37/120G SSD A400 120 GB Solid State Drive (2.5 Inch SATA 3)" https://www.amazon.co.uk/dp/B01N6JQS8C

Based on your success with this Kingston drive I just ordered a pair. They were only a little more expensive than the SanDisks and feature the added benefit of working. :)
 

capa

Dabbler
Joined
May 11, 2018
Messages
18
Does anyone still have the SanDisk SSD Plus 2.5" lying around? If so, could you please provide the following information to help track down the problem? This is what they ask for in the bug report:

    Just a guess: there were some SSDs on the market with broken TRIM or NCQ TRIM implementations, corrupting the data. It would be useful to find out whether the problem is reproducible with TRIM forcefully disabled with the vfs.zfs.trim.enabled=0 loader tunable. If that helps, I may need more detailed information about the SSDs (like `camcontrol identify /dev/ada0 -v`) to add them to the exceptions list.

Thank you so much.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Looks like at least some of the affected drives have firmware UE3600RL flashed, whereas my working unit has firmware UE3000RL.
 

ThndrSS

Cadet
Joined
Aug 22, 2017
Messages
1
I am having an identical issue. I thought I was going crazy. FreeNAS installs on the SSD Plus 120, but after rebooting I start getting filesystem errors and the drive goes to a degraded state. I have two identical drives and they both do it. If I disable TRIM, do I do it in the install or post install process? Will this affect the other drives? (6x WD RED).
 

anmnz

Patron
Joined
Feb 17, 2018
Messages
286
ThndrSS said:
    I am having an identical issue. I thought I was going crazy. FreeNAS installs on the SSD Plus 120, but after rebooting I start getting filesystem errors and the drive goes to a degraded state. I have two identical drives and they both do it. If I disable TRIM, do I do it in the install or post install process? Will this affect the other drives? (6x WD RED).

I wouldn't disable TRIM. I would (did) just get a different SSD. The cost is very small compared to the whole system. Who knows what other ways these SSDs are putting your system at risk?

Hypothetically, if you were to disable TRIM, you could do it by editing the boot commands in GRUB to set the vfs.zfs.trim.enabled=0 tunable mentioned earlier in this thread, both in the installer and at first boot of the installed system. Then you could add the same tunable through the UI so that it applies on future boots.

TRIM is an operation on SSDs specifically. I would not expect the setting to have any effect on HDDs.
 

theomolenaar

Dabbler
Joined
Jun 12, 2016
Messages
43
What's the current recommendation for an SSD that has no errors? It seems the SanDisk 120GB SSD Plus is not a good option right now. In an earlier post someone mentioned Kingston. Is there any other SSD that is known to work without errors?
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Crucial and Samsung are going to be your best bets in terms of price and quality. Some high-end vendors that don't actually make their own NAND, like Corsair, should also work well.
 

riktam

Dabbler
Joined
Dec 16, 2012
Messages
15
I have disks with firmware UE4500RL and the checksum error problem still persists.
 
