SSD Failures from ZFS pool

Jagan

Dabbler
Joined
Feb 11, 2019
Messages
12
Hi Guys,

I am facing a problem with my storage server and would appreciate any help fixing it. We are running FreeNAS 11.1-U4 with six Samsung 860 EVO SSDs (ZFS RAID5). Every week one SSD fails out of the ZFS pool; so far I have replaced failed disks 6 to 7 times.

I suspect there may be a compatibility issue between the Samsung 860 EVO 1TB SSD, the LSI 9207-8i HBA card, and ZFS RAID5. Earlier, when I had three 850 EVO SSDs connected to the motherboard, there were no issues. The problem started when I upgraded to the HBA card and the 860 EVO SSDs.

Thanks in advance.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Welcome to the forums.

Have you ensured that your LSI HBA is running the latest version of the IT firmware (20.00.07.00) that can be downloaded?

https://www.broadcom.com/products/storage/host-bus-adapters/sas-9207-8i#downloads

Can you post the rest of your system specs (CPU, motherboard, RAM) - with the issue beginning with a new HBA and new drives, I can't help but think hardware. You may want to consider swapping the cables used to connect your drives (are you using SAS breakout?) or try connecting some of the 860 EVOs to the motherboard SATA ports and see if they function correctly.
 

Jagan

Hi, thanks for your update.

I have changed the cables 2 to 3 times and have also changed the HBA card twice, but no luck. The system has a Xeon E3 1246 V3 CPU, an Asus p9dc-4l motherboard, and 4x8GB Kingston DDR3 ECC RAM.
Based on my previous experience with the 850 EVOs connected to the onboard ports, I am also thinking of skipping the HBA card and connecting all the disks directly to the motherboard.
Before doing that, I wanted to check in this forum whether there is any known compatibility issue between the 860 EVO, the LSI HBA, and ZFS.
I am using the latest HBA firmware; if I were using older firmware, I would get a notification in the web GUI.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
in case if I use older firmware I will get notification in web GUI.
No, they stopped doing that; you need to check the firmware version yourself.
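One way to check by hand is to read the version that `sas2flash -list` prints for the controller and compare it with the 20.00.07.00 release mentioned earlier in the thread. This is only a minimal sketch: it assumes the output contains a `Firmware Version : x.x.x.x` line, and the helper names `fw_version_from_list` and `fw_is_expected` are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: extract the firmware version from `sas2flash -list`
# output supplied on stdin. Assumes a line of the form
# "Firmware Version               : 20.00.07.00".
fw_version_from_list() {
    awk -F': *' '/^[ \t]*Firmware Version/ {
        sub(/[ \t\r]+$/, "", $2)   # drop trailing whitespace, if any
        print $2
        exit
    }'
}

# Hypothetical helper: succeed only if the version read from stdin equals
# the expected one (default: 20.00.07.00, the version named in this thread).
fw_is_expected() {
    expected="${1:-20.00.07.00}"
    actual="$(fw_version_from_list)"
    [ "$actual" = "$expected" ]
}

# Usage on the NAS itself (needs root):
#   sas2flash -list | fw_is_expected && echo "firmware OK" || echo "check firmware"
```

Comparing for equality rather than "newer than" keeps the sketch simple; adjust the expected string if a different release is the target.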
 

Mlovelace

Guru
Joined
Aug 19, 2014
Messages
1,111
I have a doubt that is there any compatibility issue with Samsung 860 EVO 1TB SSD with LSI 9207-8i HBA card
The Samsung 850 EVO and Pro series don't report Deterministic read ZEROs after TRIM (RZAT), which effectively disables TRIM for SSDs connected through an LSI HBA. I don't know whether a lack of TRIM could be the source of the problems you're seeing, but I thought I'd mention it. The LSI firmware requires connected SSDs to support both Data Set Management TRIM and Deterministic read ZEROs after TRIM. Others have suggested that Samsung made the RZAT change to distinguish its enterprise line from its consumer drives.

https://www.broadcom.com/support/kn...map-support-for-lsi-hbas-and-raid-controllers
 

Chris Moore

@Mlovelace, thanks for the input. Would you expect the Intel DC drives to handle this?
 

Mlovelace

@Mlovelace , thanks for the input. Would you expect the Intel DC drives handle this?
I don't know of a FreeBSD command to check the specific TRIM info; diskinfo seems quite sparse. In RHEL you could run 'hdparm -I /dev/sdX | grep TRIM' and see what is supported...

Code:
$ sudo hdparm -I /dev/sda | grep TRIM
           *    Data Set Management TRIM supported (limit 1 block)

$ sudo hdparm -I /dev/sdb | grep TRIM
           *    Data Set Management TRIM supported (limit 8 blocks)
           *    Deterministic read ZEROs after TRIM


Perhaps you could boot up a live Linux image and check, unless someone else knows an equivalent FreeBSD command.
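If you have several drives to check, the two lines of interest can be turned into a yes/no summary per drive. This is a rough sketch, assuming `hdparm -I` prints each line only when the drive advertises the feature (as in the output above); `trim_report` is a made-up helper name and `/dev/sdX` is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: read `hdparm -I` output on stdin and report whether
# the drive advertises the two features the LSI firmware is said to require.
trim_report() {
    out="$(cat)"
    dsm=no
    rzat=no
    # Each feature line is assumed to be present only when supported.
    printf '%s\n' "$out" | grep -q 'Data Set Management TRIM supported' && dsm=yes
    printf '%s\n' "$out" | grep -q 'Deterministic read ZEROs after TRIM' && rzat=yes
    echo "DSM TRIM: $dsm, RZAT: $rzat"
}

# Usage from a live Linux image (needs root):
#   hdparm -I /dev/sdX | trim_report
```

A drive that reports "RZAT: no" would fall into the case described above, where TRIM ends up effectively disabled behind the HBA.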

Edit: As an aside, if you have available on-board SATA ports to connect the SSDs to then you don't have to worry about this issue.
 

HoneyBadger

Perhaps you could boot up a live Linux image and check, unless someone else knows an equivalent freeBSD command.
You can try sg_readcap /dev/daX but I had better luck with camcontrol identify bus:target

Code:
[root@badger] ~# camcontrol identify 2:0 | grep DSM
Data Set Management (DSM/TRIM) yes
DSM - max 512byte blocks       yes              6
DSM - deterministic read       yes              zeroed


I would not expect the lack of RZAT to cause problems; rather, it would just stop TRIM from working. However, if TRIM is not functioning correctly through the HBA, it could be causing major corruption if the system is trying to TRIM (or successfully TRIMming, but getting data back rather than zeroes).

@Jagan you could test this by disabling TRIM entirely in FreeNAS, but I would not recommend running like this permanently as functional TRIM support is crucial to long-term performance and drive health.
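Before turning TRIM off, it may help to see whether TRIM requests are actually being issued and whether any are failing. The following is a sketch under assumptions worth verifying on your own system: that the FreeBSD ZFS in this FreeNAS release exposes the `kstat.zfs.misc.zio_trim` counters via `sysctl`, and that TRIM is controlled by the `vfs.zfs.trim.enabled` loader tunable. `trim_counters` is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: summarize ZFS TRIM counters from the output of
# `sysctl kstat.zfs.misc.zio_trim` supplied on stdin. Assumes lines such as
# "kstat.zfs.misc.zio_trim.success: 1500". Missing counters print as 0.
trim_counters() {
    awk -F': *' '
        /zio_trim\.success/     { ok = $2 }
        /zio_trim\.failed/      { bad = $2 }
        /zio_trim\.unsupported/ { unsup = $2 }
        END { printf "success=%d failed=%d unsupported=%d\n", ok, bad, unsup }
    '
}

# Usage on the NAS:
#   sysctl kstat.zfs.misc.zio_trim | trim_counters
#   sysctl vfs.zfs.trim.enabled        # 1 = TRIM enabled, 0 = disabled
#
# Assumption to verify: TRIM is switched off by adding the loader tunable
# vfs.zfs.trim.enabled=0 (in FreeNAS: System -> Tunables, type "loader")
# and rebooting.
```

If the success counter keeps climbing while disks drop out of the pool, that would at least be consistent with TRIM being involved; a counter stuck at zero would point elsewhere.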

Edit: As an aside, if you have available on-board SATA ports to connect the SSDs to then you don't have to worry about this issue.

Agreed that this will fix this, but if this is a consistent issue I'd like to see a bug filed and root cause found, even if the solution is "this combination is known to break, don't use it"
 

Mlovelace

Agreed that this will fix this, but if this is a consistent issue I'd like to see a bug filed and root cause found, even if the solution is "this combination is known to break, don't use it"
Since RZAT is a requirement in the LSI IT firmware, I would think anyone using an SSD that doesn't support it should be looking for alternatives: either motherboard SATA ports or supported SSDs. TRIM never being executed with those cards/SSDs will be a problem in the long run.
 
Joined
May 10, 2017
Messages
838
Might that be related to the OP's problem? The 850s were working without TRIM; with the 860s it should now be working.
 

HoneyBadger

Being that the RZAT is a requirement in the LSI IT firmware, I would think anyone using a SSD that doesn't support it should be looking for alternatives; either motherboard SATA ports or supported SSDs. The lack of TRIM getting executed with those cards/SSDs will be a problem in the long run.

Right, but the behavior I would expect from an SSD that doesn't support RZAT on an LSI HBA with this driver would be "TRIM disabled/not available" rather than the potential case we have here of "TRIM enabled even though not properly supported, and breaking things as a result." But according to the post you've linked, the 860 EVO supports RZAT and should function correctly through the HBA ... but that doesn't mean that there isn't a driver/firmware/combo-bug somewhere that's causing issues.

@Jagan - when you get a failure, does the zpool status report READ/WRITE/CKSUM errors?
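To make the counters easy to spot right after a failure, the device table from `zpool status` can be filtered down to the rows that are not ONLINE or that have a non-zero READ/WRITE/CKSUM value. This is a minimal sketch that assumes the usual `NAME STATE READ WRITE CKSUM` table layout; `zpool_problems` is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `zpool status` output on stdin and print every
# row of the config table whose state is not ONLINE or whose READ, WRITE or
# CKSUM counter is non-zero.
zpool_problems() {
    awk '
        # The table starts after the header row and ends at a blank line.
        $1 == "NAME" && $2 == "STATE" { in_cfg = 1; next }
        in_cfg && NF == 0             { in_cfg = 0 }
        in_cfg && NF >= 5 && ($2 != "ONLINE" || $3 != "0" || $4 != "0" || $5 != "0") {
            printf "%s state=%s read=%s write=%s cksum=%s\n", $1, $2, $3, $4, $5
        }
    '
}

# Usage on the NAS:
#   zpool status | zpool_problems
```

Saving this output each time a disk drops would show whether the failures come with I/O errors, checksum errors, or neither.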
 
Joined
May 10, 2017
Messages
838
I know that LSI SAS2 HBAs like the 9211, 9207, etc. have had issues with TRIM not working under the Linux driver for some time, on any SSD, though AFAIK there are no other problems besides TRIM not working. TRIM still works with SAS3 HBAs like the 9300, but only on SSDs with deterministic TRIM. This most likely has nothing to do with FreeBSD and the OP's problem, though.
 

Jagan

zpool status is not showing any errors. It just shows the pool as degraded and one disk in the pool as failed.
 

Jagan

I am thinking of connecting the SSDs to the onboard ports (I have enough SATA ports on my motherboard) to see what happens. Earlier, when I used the 850 EVOs on the motherboard without an HBA, the NAS server ran for more than a year without any disk failures. Since I did the upgrade (6x 860 EVO + 9207-8i), one SSD has been failing every week.
 

Jagan

Thank you all for your quick replies.
I would also like to understand TRIM and RZAT better; can anyone share some links?
 

Jagan

I want to give an update here. Last week I switched my NAS OS to nas4free (ZFS RAID5) on the same hardware (6x 860 EVO 1TB, LSI 9207-8i HBA, Xeon E3 1246 V3 CPU, Asus p9dc-4l motherboard, and 4x8GB Kingston DDR3 ECC RAM). Yesterday one SSD failed again.
Please check my attached logs for more information. The attachment can be opened with Notepad++.
 

HoneyBadger

Not seeing an attachment on your last post, double-check/reupload please.

I have a feeling that TRIM is somehow at fault here. Nas4Free/XigmaNAS is also based on FreeBSD, which means it would share the same codebase/TRIM implementation. It's possible that ZFSonLinux would not have these faults, but it also currently lacks TRIM support entirely.

Info on TRIM and RZAT can be found at https://en.wikipedia.org/wiki/Trim_(computing) - the section about RZAT is under Hardware Support/ATA.

For the safety of your pool I'd suggest swapping your drives back to the onboard SATA ports.
 