HP SSD Degraded Out Of Box

Status
Not open for further replies.

Chris McDowell

Dabbler
Joined
Feb 3, 2017
Messages
17
I got an HP SSD to replace some USB thumb drives that were degrading. Out of the box, it started degrading my boot volume. I purchased this one from NewEgg thinking it was a fairly respectable brand: https://www.newegg.com/Product/Product.aspx?Item=N82E16820326780 It lists a Marvell 88NV1120 controller, which I can't find much support info on. Did I get a faulty drive, or is there a compatibility issue with this controller?

I installed a fresh copy of FreeNAS 11.2 RC-1 onto this drive, but it would not boot in either Legacy or UEFI mode, so I booted off a USB drive and mirrored to the SSD. Now I'm having the degradation issues. I have since swapped the USB for an HDD, which is running fine except for the small amount of corruption that occurred while the pool was running on just the SSD during the swap.

Code:
  pool: freenas-boot
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
	corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
	entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: resilvered 785M in 0 days 00:00:21 with 7 errors on Sun Oct 28 19:22:15 2018
config:

	NAME		STATE	 READ WRITE CKSUM
	freenas-boot  DEGRADED	 0	 0	 7
	  mirror-0  DEGRADED	 0	 0	14
		ada1p2  DEGRADED	 0	 0   971  too many errors
		ada0p2  ONLINE	   0	 0	14  block size: 512B configured, 4096B native

errors: Permanent errors have been detected in the following files:

		<metadata>:<0x25>
		<metadata>:<0x26>
		<metadata>:<0x27>
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
You may want to check the SMART status on the drive. My guess is that it's a bad drive.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
It certainly looks like the 3D NAND issue (does the drive advertise 3D NAND as a feature?)

It does, from the NewEgg page:

New 3D NAND Technology

I thought it was a controller-based thing since the previous controllers were all based on the Silicon Motion SM2246/SM2256 - but there's someone in one of the linked threads saying that they have a drive with that controller but different memory.

https://forums.freenas.org/index.ph...t-disk-zfs-checksum-errors.64321/#post-471831

This would be the first case of a Marvell controller failing in a similar manner. Perhaps disabling TRIM would work here?
 

Chris McDowell

Dabbler
Joined
Feb 3, 2017
Messages
17
It is 3D NAND. SMART looks good, I think.

I will try disabling TRIM and see if I can narrow it down.

Code:
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.2-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:	 HP SSD S600 120GB
Serial Number:	HBSA18291701597
LU WWN Device Id: 5 02b2a2 01d1c1b1a
Add. Product Id:  mavlsata
Firmware Version: HC0719C1
User Capacity:	120,034,123,776 bytes [120 GB]
Sector Size:	  512 bytes logical/physical
Rotation Rate:	Solid State Device
Device is:		Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 T13/2161-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:	Wed Oct 31 15:52:14 2018 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:	  (   0)	The previous self-test routine completed
					without error or no self-test has ever
					been run.
Total time to complete Offline
data collection:		 (	0) seconds.
Offline data collection
capabilities:			 (0x51) SMART execute Offline immediate.
					No Auto Offline data collection support.
					Suspend Offline collection upon new
					command.
					No Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:			(0x0002)	Does not save SMART data before
					entering power-saving mode.
					Supports SMART auto save timer.
Error logging capability:		(0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine
recommended polling time:	 (   2) minutes.
Extended self-test routine
recommended polling time:	 (   5) minutes.

SMART Attributes Data Structure revision number: 5
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME		  FLAG	 VALUE WORST THRESH TYPE	  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate	 0x002e   100   100   050	Old_age   Always	   -	   0
  5 Reallocated_Sector_Ct   0x0033   100   100   010	Pre-fail  Always	   -	   0
  9 Power_On_Hours		  0x0032   069   100   000	Old_age   Always	   -	   72
 12 Power_Cycle_Count	   0x0032   001   100   000	Old_age   Always	   -	   21
171 Unknown_Attribute	   0x0032   100   100   010	Old_age   Always	   -	   0
172 Unknown_Attribute	   0x0032   100   100   010	Old_age   Always	   -	   0
174 Unknown_Attribute	   0x0032   100   100   000	Old_age   Always	   -	   15
183 Runtime_Bad_Block	   0x0032   100   100   000	Old_age   Always	   -	   0
187 Reported_Uncorrect	  0x0032   100   100   000	Old_age   Always	   -	   0
194 Temperature_Celsius	 0x0022   037   100   000	Old_age   Always	   -	   37
198 Offline_Uncorrectable   0x0032   100   100   000	Old_age   Always	   -	   0
199 UDMA_CRC_Error_Count	0x0030   100   100   000	Old_age   Offline	  -	   0
241 Total_LBAs_Written	  0x0032   100   100   000	Old_age   Always	   -	   132
242 Total_LBAs_Read		 0x0032   100   100   000	Old_age   Always	   -	   8

SMART Error Log not supported

SMART Self-test log structure revision number 1

SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
	1		0		0  Not_testing
	2		0		0  Not_testing
	3		0		0  Not_testing
	4		0		0  Not_testing
	5		0		0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
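
For anyone who wants to check a report like the one above without reading it by eye, here is a rough Python sketch. It assumes the 10-column attribute table that `smartctl -a` prints, and the names `parse_smart_attributes`, `nonzero_error_counters` and `WATCHED_IDS` are made up for this example. The choice of IDs 5, 187, 198 and 199 as "error counters" is also an assumption based on the attribute names shown; this drive is not in the smartctl database, so the vendor may define them differently.

```python
# Sketch only: assumes the attribute table layout printed by `smartctl -a`.
# The sample rows are copied from the output posted in this thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   069   100   000    Old_age   Always       -       72
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0032   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x0030   100   100   000    Old_age   Offline      -       0
"""

# Hypothetical watch list: attributes whose raw value we would expect to be 0.
WATCHED_IDS = (5, 187, 198, 199)


def parse_smart_attributes(text):
    """Map attribute ID -> (name, raw value) for every attribute row found."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have a hex FLAG column.
        if len(fields) >= 10 and fields[0].isdigit() and fields[2].startswith("0x"):
            raw = fields[9]
            attrs[int(fields[0])] = (fields[1], int(raw) if raw.isdigit() else raw)
    return attrs


def nonzero_error_counters(text):
    """Return watched attributes whose raw value is not 0 (non-numeric raws are flagged too)."""
    attrs = parse_smart_attributes(text)
    return {i: attrs[i] for i in WATCHED_IDS if i in attrs and attrs[i][1] != 0}


if __name__ == "__main__":
    print(nonzero_error_counters(SAMPLE))
```

On the rows above nothing is flagged, which matches the "SMART looks good" reading even though ZFS is reporting checksum errors.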
 

Chris McDowell

Dabbler
Joined
Feb 3, 2017
Messages
17
Disabling TRIM seems to have gotten rid of the issues. Any idea if this is a bug they are planning to fix? I'll let you guys know if it degrades down the road.

Code:
sudo zpool status -v freenas-boot
Password:
  pool: freenas-boot
 state: ONLINE
status: One or more devices are configured to use a non-native block size.
	Expect reduced performance.
action: Replace affected devices with devices that support the
	configured block size, or migrate data to a properly configured
	pool.
  scan: scrub repaired 0 in 0 days 00:00:08 with 0 errors on Wed Oct 31 17:46:13 2018
config:

	NAME		STATE	 READ WRITE CKSUM
	freenas-boot  ONLINE	   0	 0	 0
	  mirror-0  ONLINE	   0	 0	 0
		ada0p2  ONLINE	   0	 0	 0  block size: 512B configured, 4096B native
		ada1p2  ONLINE	   0	 0	 0

errors: No known data errors
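
To keep an eye on whether the errors come back, the `zpool status` config table can be checked from a script. The Python below is only a sketch: it assumes the counters are plain integers (larger counts can be abbreviated by `zpool status`, which this does not handle), and the function names are invented for the example. The sample text is the degraded output from the first post in this thread.

```python
# Sketch only: parses the NAME/STATE/READ/WRITE/CKSUM table of `zpool status`.
SAMPLE = """\
config:

    NAME          STATE     READ WRITE CKSUM
    freenas-boot  DEGRADED     0     0     7
      mirror-0    DEGRADED     0     0    14
        ada1p2    DEGRADED     0     0   971  too many errors
        ada0p2    ONLINE       0     0    14  block size: 512B configured, 4096B native

errors: Permanent errors have been detected in the following files:
"""


def parse_zpool_devices(status_text):
    """Return (name, state, read, write, cksum) tuples from the config table."""
    rows = []
    in_table = False
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("NAME"):
            in_table = True          # header row marks the start of the table
            continue
        if stripped.startswith("errors:"):
            break                    # the table ends before the errors section
        if not in_table or not stripped:
            continue
        fields = stripped.split()
        # Assumption: READ/WRITE/CKSUM are plain integers.
        if len(fields) >= 5 and all(f.isdigit() for f in fields[2:5]):
            rows.append((fields[0], fields[1],
                         int(fields[2]), int(fields[3]), int(fields[4])))
    return rows


def devices_with_checksum_errors(status_text):
    """Names of every pool, vdev or device row with a non-zero CKSUM count."""
    return [name for name, _state, _read, _write, cksum
            in parse_zpool_devices(status_text) if cksum > 0]


if __name__ == "__main__":
    print(devices_with_checksum_errors(SAMPLE))
```

Run against the post-fix output above, the same function should return an empty list, since every CKSUM column is 0 there.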
 
Joined
May 10, 2017
Messages
838
Disabling TRIM seems to have gotten rid of the issues.

This suggests to me that the type of NAND used isn't the reason for these issues; it could instead be how the SSD handles TRIM. There are different types: non-deterministic TRIM, DRAT (deterministic read after TRIM), and RZAT (read zeros after TRIM). DRAT means reads of trimmed sectors should return the same data every time, and RZAT additionally means that data should be all zeros. If, for example, the SSD reports DRAT or RZAT but, due to a firmware bug, doesn't actually return consistent data (or all zeros) after a TRIM operation, there would be checksum errors.

Can you please check what type your SSD is with: camcontrol identify /dev/daX | grep -i DSM (or camcontrol identify /dev/adaX | grep -i DSM)
 

Chris McDowell

Dabbler
Joined
Feb 3, 2017
Messages
17

% sudo camcontrol identify /dev/ada0 | grep -i DSM
Password:
Data Set Management (DSM/TRIM) yes
DSM - max 512byte blocks yes 8
DSM - deterministic read yes zeroed
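
For reference, the three TRIM types discussed above can be told apart from those DSM lines with a few lines of Python. This is a sketch under assumptions: it takes "yes ... zeroed" on the "deterministic read" line to mean RZAT, a bare "yes" (without "zeroed") to mean DRAT, and anything else to mean non-deterministic TRIM. The function name `classify_trim` is made up for the example, and the sample is the output posted just above.

```python
# Sketch only: classifies the DSM lines of `camcontrol identify` output.
SAMPLE = """\
Data Set Management (DSM/TRIM) yes
DSM - max 512byte blocks yes 8
DSM - deterministic read yes zeroed
"""


def classify_trim(identify_text):
    """Return 'RZAT', 'DRAT', 'non-deterministic' or 'no TRIM support'."""
    supports_trim = False
    kind = "non-deterministic"
    for line in identify_text.splitlines():
        words = line.lower().split()
        if "(dsm/trim)" in words:
            supports_trim = "yes" in words
        elif "deterministic" in words and "read" in words:
            # Everything after "read" describes what the drive claims.
            tail = words[words.index("read") + 1:]
            if tail[:1] == ["yes"]:
                # Assumption: "zeroed" marks read-zeros-after-TRIM.
                kind = "RZAT" if "zeroed" in tail else "DRAT"
    return kind if supports_trim else "no TRIM support"


if __name__ == "__main__":
    print(classify_trim(SAMPLE))
```

Note that this only reports what the drive claims to do; it cannot tell whether the firmware actually honors that claim, which is the open question in this thread.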
 