I've been running this DAS box for years, and only after upgrading to FreeNAS 11 have I started having problems (coincidence?).
I have since detached the volume on da11-da21 and also removed another volume of (I think 8) 8TB disks that have also been having issues (I think with staying connected, or even with all the disks being seen by the NAS; they worked fine until around the same time, after writing ~8TB to the volume).
The main reason I suspect hardware is that I added a few new 8TB drives after fully testing them, and they started having issues around the same time the older disks started to show errors. (Both volumes were on the same DAS. Later today I will plug in the 8TB disks and update the thread.)
The disks in the NAS itself are acting perfectly fine; I'm only having issues with my DAS.
FreeNAS-11-STABLE
DAS
SUPERMICRO 4U 846
Power board for JBOD
BPN-SAS2-846EL1 (I don't really have $300-400 at the moment to just throw it at another BPN to test/swap)
NAS
Supermicro X8DTH
2x E5620
96GB ECC
LSI SAS2008
DAS is connected to NAS via Dell PERC H200E (already swapped the card and cables; could the FW need flashing?)
Here is a snippet of the SMART status
Code:
+------+------------------+----+-----+-----+-----+-------+-------+--------+------+----------+------+-------+----+
|Device|Serial            |Temp|Power|Start|Spin |ReAlloc|Current|Offline |Seek  |Total     |High  |Command|Last|
|      |Number            |    |On   |Stop |Retry|Sectors|Pending|Uncorrec|Errors|Seeks     |Fly   |Timeout|Test|
|      |                  |    |Hours|Count|Count|       |Sectors|Sectors |      |          |Writes|Count  |Age |
+------+------------------+----+-----+-----+-----+-------+-------+--------+------+----------+------+-------+----+
|da11 ?|YHGZW4JA          | 38 |45186| 2183|    0|      0|      0|       0|     0|       247|   N/A|    N/A|   2|
|da12 ?|YHGZW4RA          | 39 |44330| 1953|    0|      0|      0|       0|     0|    196774|   N/A|    N/A|   2|
|da13 ?|YHH26Z0A          | 34 |44175| 2570|    0|      0|      0|       0|     0|   7405654|   N/A|    N/A|   6|
|da14 ?|YVH40NGA          | 35 |  229| 1146|    0|      0|      0|       0|     0|        16|   N/A|    N/A|  10|
|da15 ?|YHH05LPA          | 39 |45784| 2194|    0|      0|      0|       0|     0|    917702|   N/A|    N/A|   2|
|da16 ?|YHH0BH7A          | 39 |44363| 2434|    0|      0|      0|       0|     0|   2162897|   N/A|    N/A|   2|
|da17 ?|YVHPGDTA          | 38 |37175| 1919|    0|      0|      0|       0|     0|       112|   N/A|    N/A|   2|
|da18 ?|YHH02BLA          | 34 |44013| 1648|    0|      0|      0|       0|     0|  10420296|   N/A|    N/A|   2|
|da19 ?|P8GJU02P          | 35 | 9116| 1338|1638410|      0|      0|       0|     0|   4784132|   N/A|    N/A|   3|
|da20 ?|YHH0A23A          | 35 |44208| 2847|    0|      0|      0|       0|     0|   1114267|   N/A|    N/A|   8|
|da21 ?|YHGZE4UA          | 34 |44229| 3584|    0|      0|      0|       0|     0|   4128961|   N/A|    N/A|   5|

########## SMART status report for da18 drive (HITACHI HUA723030ALA640 : YHH02BLA) ##########
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
See vendor-specific Attribute list for failed Attributes.

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   100   100   054    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0007   253   253   024    Pre-fail  Always       -       121 (Average 305)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       1648
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   062   062   067    Pre-fail  Always   FAILING_NOW 10420296
  8 Seek_Time_Performance   0x0005   100   100   020    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0012   094   094   000    Old_age   Always       -       44013
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       68
192 Power-Off_Retract_Count 0x0032   098   098   000    Old_age   Always       -       3089
193 Load_Cycle_Count        0x0012   098   098   000    Old_age   Always       -       3089
194 Temperature_Celsius     0x0002   176   176   000    Old_age   Always       -       34 (Min/Max 16/60)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

ATA Error Count: 34 (device log contains only the most recent five errors)
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 34 occurred at disk power-on lifetime: 44013 hours (1833 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 00 9e 50 0d  Error: UNC at LBA = 0x0d509e00 = 223387136

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 80 ff 3f 40 00      00:26:58.152  READ FPDMA QUEUED
  60 00 08 80 01 40 40 00      00:26:58.151  READ FPDMA QUEUED
  60 00 08 80 fe 3f 40 00      00:26:58.150  READ FPDMA QUEUED
  60 00 08 80 fd 3f 40 00      00:26:58.147  READ FPDMA QUEUED
  60 00 10 80 01 40 40 00      00:26:58.141  READ FPDMA QUEUED

Error 33 occurred at disk power-on lifetime: 44013 hours (1833 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 80 00 40 00  Error: UNC at LBA = 0x00400080 = 4194432

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 00 80 03 00 40 00      00:23:40.786  READ FPDMA QUEUED
  60 00 00 00 03 00 40 00      00:23:40.786  READ FPDMA QUEUED
  60 00 00 80 02 00 40 00      00:23:40.786  READ FPDMA QUEUED
  60 00 00 00 02 00 40 00      00:23:40.785  READ FPDMA QUEUED
  60 00 00 80 01 00 40 00      00:23:40.785  READ FPDMA QUEUED

Error 32 occurred at disk power-on lifetime: 44012 hours (1833 days + 20 hours)
  When the command that caused the error occurred, the device was in standby mode.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 40 40 01 00 00  Error: UNC at LBA = 0x00000140 = 320

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 18 80 00 40 40 00      00:09:54.305  READ FPDMA QUEUED
  60 00 10 80 00 40 40 00      00:09:51.586  READ FPDMA QUEUED
  60 00 08 80 00 00 40 00      00:09:51.492  READ FPDMA QUEUED
  60 00 00 00 00 00 40 00      00:09:51.476  READ FPDMA QUEUED
  60 01 00 87 a3 50 40 00      00:09:50.741  READ FPDMA QUEUED

Error 31 occurred at disk power-on lifetime: 44012 hours (1833 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 01 80 00 00 00  Error: UNC at LBA = 0x00000080 = 128

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 01 00 80 00 00 40 00      00:47:32.617  READ FPDMA QUEUED
  ec 00 00 00 00 00 00 00      00:47:32.615  IDENTIFY DEVICE
  ec 00 00 00 00 00 00 00      00:47:32.610  IDENTIFY DEVICE
  60 01 00 80 00 00 40 00      00:47:32.599  READ FPDMA QUEUED
  b0 d5 01 01 4f c2 00 00      00:47:20.086  SMART READ LOG

Error 30 occurred at disk power-on lifetime: 44012 hours (1833 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 01 7f 00 40 00  Error: UNC at LBA = 0x0040007f = 4194431

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 01 00 7f 00 40 40 00      00:46:02.178  READ FPDMA QUEUED
  ec 00 00 00 00 00 00 00      00:46:02.176  IDENTIFY DEVICE
  60 01 00 7f 00 40 40 00      00:46:02.161  READ FPDMA QUEUED
  ec 00 00 00 00 00 00 00      00:46:02.159  IDENTIFY DEVICE
  2f 00 01 10 00 00 00 00      00:46:02.076  READ LOG EXT

Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
Short offline       Completed without error       00%     43954         -

########## SMART status report for da21 drive (HITACHI HUA723030ALA640 : YHGZE4UA) ##########
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
See vendor-specific Attribute list for failed Attributes.

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       1
  2 Throughput_Performance  0x0005   100   100   054    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0007   253   253   024    Pre-fail  Always       -       359 (Average 108)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       3584
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   061   061   067    Pre-fail  Always   FAILING_NOW 4128961
  8 Seek_Time_Performance   0x0005   100   100   020    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0012   094   094   000    Old_age   Always       -       44229
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       74
192 Power-Off_Retract_Count 0x0032   096   096   000    Old_age   Always       -       4971
193 Load_Cycle_Count        0x0012   096   096   000    Old_age   Always       -       4971
194 Temperature_Celsius     0x0002   176   176   000    Old_age   Always       -       34 (Min/Max 16/58)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

ATA Error Count: 72 (device log contains only the most recent five errors)
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 72 occurred at disk power-on lifetime: 44220 hours (1842 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 e0 a0 9e 50 0d  Error: UNC at LBA = 0x0d509ea0 = 223387296

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 e0 18 a0 a0 50 40 00   1d+21:18:05.013  READ FPDMA QUEUED
  60 e0 10 a0 9e 50 40 00   1d+21:18:05.013  READ FPDMA QUEUED
  60 e0 08 a0 02 40 40 00   1d+21:18:05.012  READ FPDMA QUEUED
  60 e0 00 a0 00 40 40 00   1d+21:18:05.012  READ FPDMA QUEUED
  60 10 20 90 a0 50 40 00   1d+21:16:45.641  READ FPDMA QUEUED

Error 71 occurred at disk power-on lifetime: 44220 hours (1842 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 80 00 04 40 00  Error: UNC at LBA = 0x00400400 = 4195328

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 10 80 a1 50 40 00   1d+21:16:33.156  READ FPDMA QUEUED
  60 00 08 80 9f 50 40 00   1d+21:16:33.156  READ FPDMA QUEUED
  60 00 00 80 03 40 40 00   1d+21:16:33.156  READ FPDMA QUEUED
  60 b0 00 d0 01 40 40 00   1d+21:16:33.155  READ FPDMA QUEUED
  60 08 48 c8 01 40 40 00   1d+21:16:33.141  READ FPDMA QUEUED

Error 70 occurred at disk power-on lifetime: 44220 hours (1842 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 40 c0 00 00 00  Error: UNC at LBA = 0x000000c0 = 192

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 28 80 02 40 40 00   1d+21:11:55.992  READ FPDMA QUEUED
  60 00 20 80 01 00 40 00   1d+21:11:55.992  READ FPDMA QUEUED
  60 00 18 80 01 00 40 00   1d+21:11:55.992  READ FPDMA QUEUED
  60 00 10 80 00 40 40 00   1d+21:11:55.992  READ FPDMA QUEUED
  60 00 08 80 00 00 40 00   1d+21:11:55.992  READ FPDMA QUEUED

Error 69 occurred at disk power-on lifetime: 44220 hours (1842 days + 12 hours)
  When the command that caused the error occurred, the device was in standby mode.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 80 00 40 00  Error: UNC at LBA = 0x00400080 = 4194432

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 28 80 01 00 40 00   1d+21:11:47.657  READ FPDMA QUEUED
  60 00 08 80 01 00 40 00   1d+21:11:47.656  READ FPDMA QUEUED
  60 00 00 00 00 00 40 00   1d+21:11:47.655  READ FPDMA QUEUED
  60 00 20 80 00 00 40 00   1d+21:11:47.651  READ FPDMA QUEUED
  60 00 18 80 00 40 40 00   1d+21:11:47.651  READ FPDMA QUEUED

Error 68 occurred at disk power-on lifetime: 44180 hours (1840 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 01 af a3 50 0d  Error: UNC at LBA = 0x0d50a3af = 223388591

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 01 00 af a3 50 40 00      05:09:01.855  READ FPDMA QUEUED
  ec 00 00 00 00 00 00 00      05:08:41.550  IDENTIFY DEVICE
  ef 10 02 00 00 00 00 00      05:08:41.302  SET FEATURES [Enable SATA feature]
  ef 02 00 00 00 00 00 00      05:08:41.302  SET FEATURES [Enable write cache]
  ef aa 00 00 00 00 00 00      05:08:41.302  SET FEATURES [Enable read look-ahead]

Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
Short offline       Completed without error       00%     44103         -
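For anyone skimming the reports, here is a minimal Python sketch (my own, not part of FreeNAS or smartctl) that picks out which attributes are actually tripping the FAILED status. It assumes the usual convention that an attribute counts as failing when its normalized VALUE is at or below a nonzero THRESH. The name `failing_attributes` and the `SAMPLE` text are just for illustration; the sample rows are copied from the da18 report above.

```python
import re

# A few attribute rows copied from the da18 report (not the full table).
SAMPLE = """\
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   062   062   067    Pre-fail  Always   FAILING_NOW 10420296
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0
"""

# ID, name, flag, VALUE, WORST, THRESH, TYPE (remaining columns are ignored).
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]{4}\s+(\d+)\s+(\d+)\s+(\d+)\s+(\S+)"
)


def failing_attributes(text):
    """Return (id, name, value, thresh, type) for rows with VALUE <= THRESH.

    Assumption: a THRESH of 000 means the attribute has no failure threshold,
    so such rows are never reported.
    """
    failing = []
    for line in text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # skip headers, blank lines, and anything unparseable
        attr_id, name, value, _worst, thresh, kind = match.groups()
        value, thresh = int(value), int(thresh)
        if thresh > 0 and value <= thresh:
            failing.append((int(attr_id), name, value, thresh, kind))
    return failing


if __name__ == "__main__":
    for attr in failing_attributes(SAMPLE):
        print(attr)
```

On both da18 and da21 the only attribute below threshold is Seek_Error_Rate (062 and 061 against a threshold of 067); reallocated, pending, and CRC error counts are all zero.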