Andrew Ostrom
I recently built my first FreeNAS system. I started with a Supermicro chassis/server and went from there. I am having two issues, and since they are completely unrelated I will post them as two separate threads.
First, no matter what I do, I keep getting this message:
The boot volume state is DEGRADED: One or more devices has experienced an error resulting in data corruption. Applications may be affected.
Here is my configuration:
Supermicro SuperChassis 846E1-R1200B
X9DRi-F Motherboard
Dual Intel Xeon E5-2620 15M 2GHz Six-Core Processors
128GB ECC Ram (16x 8GB PC3-12800R)
LSI SAS-9207-8i
BPN-SAS2-846EL1 Backplane
Dual PWS-1K21P-1R 1200W 80 Plus Gold Power Supplies
2 x Sandisk SSD Plus 120GB SSD (boot pool)
8 x Seagate Constellation ES.3 3TB SAS RaidZ2
6 x Seagate Constellation ES.3 3TB SAS + 2 x Hitachi DT01ACA300 3TB RaidZ2
5 x Seagate Constellation ES.3 SAS + 3 x Seagate IronWolf 4TB RaidZ2
When I installed FreeNAS I used the two SSDs in a mirrored configuration. Everything went fine, and then a day (maybe less) later I got an alert that the boot pool was degraded because of an error on the /dev/ada1 SSD. SanDisk doesn't have a diagnostic for BSD, so I pulled the two drives, installed them in a Windows box, and ran the SanDisk test utility. I also ran a variety of Windows-based tools to try to stress-test the drives. Neither the utilities nor SMART showed any errors. I reinstalled the drives in the Unix box, reinstalled FreeNAS, and within a day it dropped /dev/ada1 again. I have tried multiple SATA cables with no change. Here is the smartctl -x output:
root@freenas[~]# smartctl -x /dev/ada1
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.2-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model: SanDisk SSD PLUS 120GB
Serial Number: 1839E6805015
LU WWN Device Id: 5 001b44 8b91babbe
Firmware Version: UE4500RL
User Capacity: 120,040,980,480 bytes [120 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-2 T13/2015-D revision 3
SATA Version is: SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Mon Mar 25 01:44:57 2019 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is: Unavailable
APM feature is: Disabled
Rd look-ahead is: Enabled
Write cache is: Enabled
DSN feature is: Unavailable
ATA Security is: Disabled, frozen [SEC2]
Wt Cache Reorder: Unavailable
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 120) seconds.
Offline data collection
capabilities: (0x15) SMART execute Offline immediate.
No Auto Offline data collection support.
Abort Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
No Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 21) minutes.
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
5 Reallocated_Sector_Ct -O--CK 100 100 000 - 0
9 Power_On_Hours -O--CK 100 100 000 - 691
12 Power_Cycle_Count -O--CK 100 100 000 - 20
165 Unknown_Attribute -O--CK 100 100 000 - 5
166 Unknown_Attribute -O--CK 100 100 --- - 0
167 Unknown_Attribute -O--CK 100 100 --- - 0
168 Unknown_Attribute -O--CK 100 100 --- - 2
169 Unknown_Attribute -O--CK 100 100 --- - 80
170 Unknown_Attribute -O--CK 100 100 --- - 0
171 Unknown_Attribute -O--CK 100 100 000 - 0
172 Unknown_Attribute -O--CK 100 100 000 - 0
173 Unknown_Attribute -O--CK 100 100 000 - 0
174 Unknown_Attribute -O--CK 100 100 000 - 14
184 End-to-End_Error -O--CK 100 100 --- - 0
187 Reported_Uncorrect -O--CK 100 100 000 - 0
188 Command_Timeout -O--CK 100 100 --- - 0
194 Temperature_Celsius -O---K 068 041 000 - 32 (Min/Max 16/41)
199 UDMA_CRC_Error_Count -O--CK 100 100 --- - 0
230 Unknown_SSD_Attribute -O--CK 100 100 000 - 4294967297
232 Available_Reservd_Space PO--CK 100 100 005 - 100
233 Media_Wearout_Indicator -O--CK 100 100 --- - 0
234 Unknown_Attribute -O--CK 100 100 000 - 30
241 Total_LBAs_Written ----CK 100 100 000 - 9
242 Total_LBAs_Read ----CK 100 100 000 - 2
244 Unknown_Attribute -O--CK 000 100 --- - 0
||||||_ K auto-keep
|||||__ C event count
||||___ R error rate
|||____ S speed/performance
||_____ O updated online
|______ P prefailure warning
General Purpose Log Directory Version 1
SMART Log Directory Version 1 [multi-sector log support]
Address Access R/W Size Description
0x00 GPL,SL R/O 1 Log Directory
0x01 SL R/O 1 Summary SMART error log
0x02 SL R/O 1 Comprehensive SMART error log
0x03 GPL R/O 16 Ext. Comprehensive SMART error log
0x04 GPL,SL R/O 8 Device Statistics log
0x06 SL R/O 1 SMART self-test log
0x07 GPL R/O 1 Extended self-test log
0x10 GPL R/O 1 NCQ Command Error log
0x11 GPL R/O 1 SATA Phy Event Counters log
0x30 GPL,SL R/O 9 IDENTIFY DEVICE data log
0x80-0x9f GPL,SL R/W 16 Host vendor specific log
0xa1 GPL,SL VS 1 Device vendor specific log
0xa2 GPL,SL VS 2 Device vendor specific log
0xa3 GPL,SL VS 1 Device vendor specific log
0xa7 GPL,SL VS 1 Device vendor specific log
0xa9 GPL,SL VS 3 Device vendor specific log
Warning! SMART Extended Comprehensive Error Log Structure error: invalid SMART checksum.
SMART Extended Comprehensive Error Log Version: 1 (16 sectors)
No Errors Logged
SMART Extended Self-test Log Version: 1 (1 sectors)
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 691 -
# 2 Short offline Completed without error 00% 92 -
Selective Self-tests/Logging not supported
SCT Commands not supported
Device Statistics (GP Log 0x04)
Page Offset Size Value Flags Description
0x01 ===== = = === == General Statistics (rev 1) ==
0x01 0x008 4 20 --- Lifetime Power-On Resets
0x01 0x010 4 691 --- Power-on Hours
0x01 0x018 6 19207147 --- Logical Sectors Written
0x01 0x028 6 5788465 --- Logical Sectors Read
0x01 0x038 6 691 --- Date and Time TimeStamp
0x05 ===== = = === == Temperature Statistics (rev 1) ==
0x05 0x008 1 32 --- Current Temperature
0x05 0x010 1 32 --- Average Short Term Temperature
0x05 0x018 1 - --- Average Long Term Temperature
0x05 0x020 1 41 --- Highest Temperature
0x05 0x028 1 30 --- Lowest Temperature
0x05 0x030 1 34 --- Highest Average Short Term Temperature
0x05 0x038 1 34 --- Lowest Average Short Term Temperature
0x05 0x040 1 - --- Highest Average Long Term Temperature
0x05 0x048 1 - --- Lowest Average Long Term Temperature
0x05 0x050 4 0 --- Time in Over-Temperature
0x05 0x058 1 95 --- Specified Maximum Operating Temperature
0x05 0x060 4 0 --- Time in Under-Temperature
0x05 0x068 1 0 --- Specified Minimum Operating Temperature
0x07 ===== = = === == Solid State Device Statistics (rev 1) ==
0x07 0x008 1 0 N-- Percentage Used Endurance Indicator
|||_ C monitored condition met
||__ D supports DSN
|___ N normalized value
Pending Defects log (GP Log 0x0c) not supported
SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x0003 2 0 R_ERR response for device-to-host data FIS
0x0004 2 0 R_ERR response for host-to-device data FIS
0x0006 2 0 R_ERR response for device-to-host non-data FIS
0x0007 2 0 R_ERR response for host-to-device non-data FIS
0x0009 2 5 Transition from drive PhyRdy to drive PhyNRdy
0x000a 2 6 Device-to-host register FISes sent due to a COMRESET
0x000f 2 0 R_ERR response for host-to-device data FIS, CRC
0x0012 2 0 R_ERR response for host-to-device non-data FIS, CRC
0x0001 2 0 Command failed due to ICRC error
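To make the Phy event table above easier to scan, here is a minimal Python sketch that lists only the counters with a non-zero value. The rows are copied from the output above; the names `PHY_LOG`, `ROW` and `nonzero_phy_counters` are made up for this illustration and are not part of smartctl or FreeNAS. It only filters the table and does not diagnose anything by itself.

```python
import re

# Rows copied from the "SATA Phy Event Counters (GP Log 0x11)" table above.
PHY_LOG = """\
0x0003 2 0 R_ERR response for device-to-host data FIS
0x0004 2 0 R_ERR response for host-to-device data FIS
0x0006 2 0 R_ERR response for device-to-host non-data FIS
0x0007 2 0 R_ERR response for host-to-device non-data FIS
0x0009 2 5 Transition from drive PhyRdy to drive PhyNRdy
0x000a 2 6 Device-to-host register FISes sent due to a COMRESET
0x000f 2 0 R_ERR response for host-to-device data FIS, CRC
0x0012 2 0 R_ERR response for host-to-device non-data FIS, CRC
0x0001 2 0 Command failed due to ICRC error
"""

# Columns: ID, Size, Value, Description. A trailing "+" on the value
# (smartctl's overflow marker) is tolerated but not captured.
ROW = re.compile(r"^(0x[0-9a-fA-F]{4})\s+(\d+)\s+(\d+)\+?\s+(.+)$")


def nonzero_phy_counters(text):
    """Return (id, value, description) for every counter whose value is not 0."""
    hits = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match and int(match.group(3)) != 0:
            hits.append((match.group(1), int(match.group(3)), match.group(4)))
    return hits


for counter_id, value, description in nonzero_phy_counters(PHY_LOG):
    print(f"{counter_id}  {value:>3}  {description}")
```

For this drive the only non-zero rows are 0x0009 (5 PhyRdy to PhyNRdy transitions) and 0x000a (6 register FISes sent due to a COMRESET); every R_ERR and CRC counter is 0.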
Any ideas what's wrong and why this drive keeps going offline? The system is running fine on /dev/ada0, which is an identical drive.