The boot volume state is DEGRADED

Andrew Ostrom (Explorer; joined Jul 28, 2017; 57 messages)
I recently built my first FreeNAS system, starting from a Supermicro chassis/server. I am having two issues, and since they are completely unrelated I will post them as two separate threads.

First: no matter what I do, I keep getting this message:
The boot volume state is DEGRADED: One or more devices has experienced an error resulting in data corruption. Applications may be affected.

Here is my configuration:

Supermicro SuperChassis 846E1-R1200B
X9DRi-F Motherboard
Dual Intel Xeon E5-2620 15M 2 GHz Six-Core Processors
128GB ECC RAM (16 x 8GB PC3-12800R)
LSI SAS 9207-8i
BPN-SAS2-846EL1 Backplane
Dual PWS-1K21P-1R 1200W 80 Plus Gold Power Supplies
2 x SanDisk SSD Plus 120GB SSD (boot pool)
8 x Seagate Constellation ES.3 3TB SAS RAIDZ2
6 x Seagate Constellation ES.3 3TB SAS + 2 x Hitachi DT01ACA300 3TB RAIDZ2
5 x Seagate Constellation ES.3 SAS + 3 x Seagate IronWolf 4TB RAIDZ2

When I installed FreeNAS I put the two SSDs in a mirrored configuration. Everything went fine, and then a day (maybe less) later I got an alert that the boot pool was degraded because of an error on the /dev/ada1 SSD. SanDisk doesn't have a diagnostic for BSD, so I pulled the two drives, installed them in a Windows box, and ran the SanDisk test utility. I also ran a variety of Windows-based tools to try to stress test the drives. Neither the utilities nor SMART show any errors. I put the drives back in the FreeNAS box, reinstalled FreeNAS, and within a day it dropped /dev/ada1 again. I have tried multiple SATA cables with no change. Here is the smartctl -x output:

root@freenas[~]# smartctl -x /dev/ada1
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.2-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model: SanDisk SSD PLUS 120GB
Serial Number: 1839E6805015
LU WWN Device Id: 5 001b44 8b91babbe
Firmware Version: UE4500RL
User Capacity: 120,040,980,480 bytes [120 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-2 T13/2015-D revision 3
SATA Version is: SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Mon Mar 25 01:44:57 2019 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is: Unavailable
APM feature is: Disabled
Rd look-ahead is: Enabled
Write cache is: Enabled
DSN feature is: Unavailable
ATA Security is: Disabled, frozen [SEC2]
Wt Cache Reorder: Unavailable

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 120) seconds.
Offline data collection
capabilities: (0x15) SMART execute Offline immediate.
No Auto Offline data collection support.
Abort Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
No Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 21) minutes.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
5 Reallocated_Sector_Ct -O--CK 100 100 000 - 0
9 Power_On_Hours -O--CK 100 100 000 - 691
12 Power_Cycle_Count -O--CK 100 100 000 - 20
165 Unknown_Attribute -O--CK 100 100 000 - 5
166 Unknown_Attribute -O--CK 100 100 --- - 0
167 Unknown_Attribute -O--CK 100 100 --- - 0
168 Unknown_Attribute -O--CK 100 100 --- - 2
169 Unknown_Attribute -O--CK 100 100 --- - 80
170 Unknown_Attribute -O--CK 100 100 --- - 0
171 Unknown_Attribute -O--CK 100 100 000 - 0
172 Unknown_Attribute -O--CK 100 100 000 - 0
173 Unknown_Attribute -O--CK 100 100 000 - 0
174 Unknown_Attribute -O--CK 100 100 000 - 14
184 End-to-End_Error -O--CK 100 100 --- - 0
187 Reported_Uncorrect -O--CK 100 100 000 - 0
188 Command_Timeout -O--CK 100 100 --- - 0
194 Temperature_Celsius -O---K 068 041 000 - 32 (Min/Max 16/41)
199 UDMA_CRC_Error_Count -O--CK 100 100 --- - 0
230 Unknown_SSD_Attribute -O--CK 100 100 000 - 4294967297
232 Available_Reservd_Space PO--CK 100 100 005 - 100
233 Media_Wearout_Indicator -O--CK 100 100 --- - 0
234 Unknown_Attribute -O--CK 100 100 000 - 30
241 Total_LBAs_Written ----CK 100 100 000 - 9
242 Total_LBAs_Read ----CK 100 100 000 - 2
244 Unknown_Attribute -O--CK 000 100 --- - 0
||||||_ K auto-keep
|||||__ C event count
||||___ R error rate
|||____ S speed/performance
||_____ O updated online
|______ P prefailure warning

General Purpose Log Directory Version 1
SMART Log Directory Version 1 [multi-sector log support]
Address Access R/W Size Description
0x00 GPL,SL R/O 1 Log Directory
0x01 SL R/O 1 Summary SMART error log
0x02 SL R/O 1 Comprehensive SMART error log
0x03 GPL R/O 16 Ext. Comprehensive SMART error log
0x04 GPL,SL R/O 8 Device Statistics log
0x06 SL R/O 1 SMART self-test log
0x07 GPL R/O 1 Extended self-test log
0x10 GPL R/O 1 NCQ Command Error log
0x11 GPL R/O 1 SATA Phy Event Counters log
0x30 GPL,SL R/O 9 IDENTIFY DEVICE data log
0x80-0x9f GPL,SL R/W 16 Host vendor specific log
0xa1 GPL,SL VS 1 Device vendor specific log
0xa2 GPL,SL VS 2 Device vendor specific log
0xa3 GPL,SL VS 1 Device vendor specific log
0xa7 GPL,SL VS 1 Device vendor specific log
0xa9 GPL,SL VS 3 Device vendor specific log

Warning! SMART Extended Comprehensive Error Log Structure error: invalid SMART checksum.
SMART Extended Comprehensive Error Log Version: 1 (16 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 691 -
# 2 Short offline Completed without error 00% 92 -

Selective Self-tests/Logging not supported

SCT Commands not supported

Device Statistics (GP Log 0x04)
Page Offset Size Value Flags Description
0x01 ===== = = === == General Statistics (rev 1) ==
0x01 0x008 4 20 --- Lifetime Power-On Resets
0x01 0x010 4 691 --- Power-on Hours
0x01 0x018 6 19207147 --- Logical Sectors Written
0x01 0x028 6 5788465 --- Logical Sectors Read
0x01 0x038 6 691 --- Date and Time TimeStamp
0x05 ===== = = === == Temperature Statistics (rev 1) ==
0x05 0x008 1 32 --- Current Temperature
0x05 0x010 1 32 --- Average Short Term Temperature
0x05 0x018 1 - --- Average Long Term Temperature
0x05 0x020 1 41 --- Highest Temperature
0x05 0x028 1 30 --- Lowest Temperature
0x05 0x030 1 34 --- Highest Average Short Term Temperature
0x05 0x038 1 34 --- Lowest Average Short Term Temperature
0x05 0x040 1 - --- Highest Average Long Term Temperature
0x05 0x048 1 - --- Lowest Average Long Term Temperature
0x05 0x050 4 0 --- Time in Over-Temperature
0x05 0x058 1 95 --- Specified Maximum Operating Temperature
0x05 0x060 4 0 --- Time in Under-Temperature
0x05 0x068 1 0 --- Specified Minimum Operating Temperature
0x07 ===== = = === == Solid State Device Statistics (rev 1) ==
0x07 0x008 1 0 N-- Percentage Used Endurance Indicator
|||_ C monitored condition met
||__ D supports DSN
|___ N normalized value

Pending Defects log (GP Log 0x0c) not supported

SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x0003 2 0 R_ERR response for device-to-host data FIS
0x0004 2 0 R_ERR response for host-to-device data FIS
0x0006 2 0 R_ERR response for device-to-host non-data FIS
0x0007 2 0 R_ERR response for host-to-device non-data FIS
0x0009 2 5 Transition from drive PhyRdy to drive PhyNRdy
0x000a 2 6 Device-to-host register FISes sent due to a COMRESET
0x000f 2 0 R_ERR response for host-to-device data FIS, CRC
0x0012 2 0 R_ERR response for host-to-device non-data FIS, CRC
0x0001 2 0 Command failed due to ICRC error

Any ideas what's wrong and why this drive keeps going offline? The system is running fine on /dev/ada0, which is an identical drive.
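
For anyone comparing the two drives, the only nonzero link-level numbers in the output above are in the SATA Phy Event Counters table. Here is a rough Python sketch that pulls the nonzero rows out of saved `smartctl -x` text so ada0 and ada1 can be compared side by side. The function name `nonzero_phy_counters` and the `SAMPLE` string are made up for illustration, and the parsing assumes the table layout shown in this post. Nonzero values here only show that the link dropped or was reset; they do not by themselves say why.

```python
import re

# One row of the "SATA Phy Event Counters" table in `smartctl -x` output:
# 4-digit hex ID, size in bytes, counter value, description.
ROW = re.compile(r"^(0x[0-9a-fA-F]{4})\s+(\d+)\s+(\d+)\s+(.+?)\s*$")


def nonzero_phy_counters(text):
    """Return {id: (value, description)} for Phy event counters that are not zero.

    Hypothetical helper for illustration; it only looks at rows that appear
    after the "SATA Phy Event Counters" heading.
    """
    found = {}
    in_section = False
    for line in text.splitlines():
        # Text copied from a web page can carry zero-width spaces; drop them.
        line = line.replace("\u200b", "").strip()
        if line.startswith("SATA Phy Event Counters"):
            in_section = True
            continue
        if not in_section:
            continue
        match = ROW.match(line)
        if match and int(match.group(3)) != 0:
            found[match.group(1)] = (int(match.group(3)), match.group(4))
    return found


# A few rows copied from the output in this post.
SAMPLE = """\
SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x0003 2 0 R_ERR response for device-to-host data FIS
0x0009 2 5 Transition from drive PhyRdy to drive PhyNRdy
0x000a 2 6 Device-to-host register FISes sent due to a COMRESET
0x0001 2 0 Command failed due to ICRC error
"""

for counter_id, (value, description) in nonzero_phy_counters(SAMPLE).items():
    print(counter_id, value, description)
```

Saving the output for each drive to a file and passing the file contents to the function gives two small dictionaries to compare; a large gap between the two drives on the same controller would be a reason to look at the port, cable, or drive firmware rather than the flash itself.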
 

Redcoat (MVP; joined Feb 18, 2014; 2,925 messages)
Search the forum for "SanDisk SSD PLUS" and you'll find at least two threads covering the travails of others with those drives.
 