mloiterman
- motherboard make and model
  - SuperMicro X11SSH-CTF
    - Firmware Revision: 01.48
    - Firmware Build Time: 06/22/2018
    - BIOS Version: 2.2
    - BIOS Build Time: 05/23/2018
    - Redfish Version: 1.0.1
- CPU make and model
  - CPU: Intel(R) Xeon(R) CPU E3-1275 v5 @ 3.60GHz (3600.18-MHz K8-class CPU)
    - Origin="GenuineIntel" Id=0x506e3 Family=0x6 Model=0x5e Stepping=3
- RAM quantity
  - 32 GiB
    - Crucial 2 x 16GB DDR4-2400 EUDIMM 1.2V CL17
- boot drive
  - Intel SSD 600p Series SSDPEKKW128G7X1 (128 GB, M.2 80mm PCIe NVMe 3.0 x4, 3D1, TLC)
- hard drives, quantity, model numbers, and RAID configuration
  - 8 x ST4000LM024
  - RAIDZ2 (two 4-disk RAIDZ2 vdevs)
- hard disk controllers
  - Avago Technologies (LSI) SAS3008
  - Code:
    Avago Technologies SAS3 Flash Utility
    Version 16.00.00.00 (2017.05.02)
    Copyright 2008-2017 Avago Technologies. All rights reserved.
    Adapter Selected is a Avago SAS: SAS3008(C0)
    Controller Number           : 0
    Controller                  : SAS3008(C0)
    PCI Address                 : 00:01:00:00
    SAS Address                 : 5003048-0-1e04-6000
    NVDATA Version (Default)    : 0e.00.20.00
    NVDATA Version (Persistent) : 0e.00.20.00
    Firmware Product ID         : 0x2221 (IT)
    Firmware Version            : 15.00.03.00
    NVDATA Vendor               : LSI
    NVDATA Product ID           : LSI3008-IT
    BIOS Version                : 08.35.00.00
    UEFI BSD Version            : 17.00.00.00
    FCODE Version               : N/A
    Board Name                  : LSI3008-IT
    Board Assembly              : N/A
    Board Tracer Number         : N/A
- network cards
  - ix0: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 3.2.12-k> mem 0xd0200000-0xd03fffff,0xd0404000-0xd0407fff irq 16 at device 0.0 on pci4
  - ix1: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 3.2.12-k> mem 0xd0000000-0xd01fffff,0xd0400000-0xd0403fff irq 17 at device 0.1 on pci4
- FreeNAS-11.2-RELEASE (Build Date: Dec 5, 2018 21:28)
On 12/26/18 I received this error:
Code:
New alerts:
* The volume tank state is DEGRADED: One or more devices are faulted in response to persistent errors. Sufficient replicas exist for the pool to continue functioning in a degraded state.
* Device: /dev/da7 [SAT], failed to read SMART Attribute Data
This was in the kernel log:
Code:
(da7:mpr0:0:7:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 776 Aborting command 0xfffffe0001094b80
mpr0: Sending reset from mprsas_send_abort for target ID 7
(pass7:mpr0:0:7:0): ATA COMMAND PASS THROUGH(16). CDB: 85 08 0e 00 d0 00 01 00 00 00 4f 00 c2 00 b0 00 length 512 SMID 626 terminated ioc 804b loginfo 31130000 scsi 0 state c xfer 0
mpr0: Unfreezing devq for target ID 7
(da7:mpr0:0:7:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
(da7:mpr0:0:7:0): CAM status: Command timeout
(da7:mpr0:0:7:0): Retrying command
(da7:mpr0:0:7:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
(da7:mpr0:0:7:0): CAM status: SCSI Status Error
(da7:mpr0:0:7:0): SCSI status: Check Condition
(da7:mpr0:0:7:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da7:mpr0:0:7:0): Error 6, Retries exhausted
(da7:mpr0:0:7:0): Invalidating pack
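The ATA COMMAND PASS THROUGH(16) entry in this log shows what was in flight when mpr0 sent the reset. The identical CDB appears again for pass6 in the da6 log. Below is a minimal sketch for decoding it. It assumes the standard T10 SAT byte layout for the 16-byte pass-through CDB. `decode_ata_pass_through_16` is a hypothetical helper written for illustration, not part of FreeNAS or smartmontools.

```python
# Decode an ATA PASS-THROUGH(16) CDB (SCSI opcode 0x85).
# ASSUMPTION: standard T10 SAT field layout; only a few fields are decoded.

def decode_ata_pass_through_16(cdb_hex: str) -> dict:
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 16 or cdb[0] != 0x85:
        raise ValueError("not an ATA PASS-THROUGH(16) CDB")
    return {
        "protocol": (cdb[1] >> 1) & 0x0F,   # 4 = PIO Data-In
        "extend": cdb[1] & 0x01,            # 1 = 48-bit ATA command
        "t_dir_in": (cdb[2] >> 3) & 0x01,   # 1 = data flows from the device
        "features": (cdb[3] << 8) | cdb[4],
        "sector_count": (cdb[5] << 8) | cdb[6],
        "lba_mid": (cdb[9] << 8) | cdb[10],
        "lba_high": (cdb[11] << 8) | cdb[12],
        "command": cdb[14],
    }


# CDB copied from the pass7 (and later pass6) log entries.
fields = decode_ata_pass_through_16("85 08 0e 00 d0 00 01 00 00 00 4f 00 c2 00 b0 00")

# ATA command 0xB0 is SMART; feature 0xD0 selects READ DATA, and
# LBA mid/high = 0x4F/0xC2 is the fixed SMART signature.
is_smart_read_data = (
    fields["command"] == 0xB0
    and fields["features"] == 0xD0
    and fields["lba_mid"] == 0x4F
    and fields["lba_high"] == 0xC2
)
print(fields)
print("SMART READ DATA:", is_smart_read_data)
```

Under that layout the CDB is ATA SMART READ DATA: one 512-byte sector, transferred as PIO Data-In. That would be consistent with the "failed to read SMART Attribute Data" alert arriving alongside the reset.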
I replaced the drive and resilvered without any errors. The resilver completed last night and I thought all was OK.
This morning, I received this error:
Code:
New alerts:
* Device: /dev/da6 [SAT], failed to read SMART Attribute Data
This was in the kernel log:
Code:
pid 2232 (syslog-ng), uid 0: exited on signal 6 (core dumped)
(da6:mpr0:0:6:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 449 Aborting command 0xfffffe0001077570
mpr0: Sending reset from mprsas_send_abort for target ID 6
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 52 b0 00 00 00 80 00 00 length 65536 SMID 978 terminated ioc 804b loginfo 31130000 scsi 0 state c xfer 0
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 51 a8 00 00 00 80 00 00 length 65536 SMID 316 terminated ioc 804b loginfo 31130000 scsi 0 state c xfer 0
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 52 b0 00 00 00 80 00 00
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 52 28 00 00 00 80 00 00 length 65536 SMID 729 terminated ioc 804b l(da6:mpr0:0:6:0): CAM status: CCB request completed with an error
oginfo 31130000 scsi 0 state c xfer 0
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 51 20 00 00 00 80 00 00 length 65536 SMID 930 terminated ioc 804b l(da6:mpr0:0:6:0): Retrying command
oginfo 31130000 scsi 0 state c xfer 0
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 51 a8 00 00 00 80 00 00
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 50 a0 00 00 00 80 00 00 length 65536 SMID 886 terminated ioc 804b l(da6:mpr0:0:6:0): CAM status: CCB request completed with an error
oginfo 31130000 scsi 0 state c xfer 0
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 50 18 00 00 00 80 00 00 length 65536 SMID 802 terminated ioc 804b l(da6:mpr0:0:6:0): Retrying command
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 52 28 00 00 00 80 00 00
oginfo 31130000 scsi 0 state c xfer 0
(da6:mpr0:0:6:0): CAM status: CCB request completed with an error
(da6:mpr0:0:6:0): Retrying command
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 51 20 00 00 00 80 00 00
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 4f 98 00 00 00 80 00 00 length 65536 SMID 558 terminated ioc 804b l(da6:mpr0:0:6:0): CAM status: CCB request completed with an error
oginfo 31130000 scsi 0 state c xfer 0
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 4f 10 00 00 00 80 00 00 length 65536 SMID 588 terminated ioc 804b l(da6:mpr0:0:6:0): Retrying command
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 50 a0 00 00 00 80 00 00
oginfo 31130000 scsi 0 state c xfer 0
(da6:mpr0:0:6:0): CAM status: CCB request completed with an error
(da6:mpr0:0:6:0): Retrying command
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 50 18 00 00 00 80 00 00
(pass6:mpr0:0:6:0): ATA COMMAND PASS THROUGH(16). CDB: 85 08 0e 00 d0 00 01 00 00 00 4f 00 c2 00 b0 00 length 512 SMID 480 te(da6:mpr0:0:6:0): CAM status: CCB request completed with an error
(da6:mpr0:0:6:0): Retrying command
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 4f 98 00 00 00 80 00 00
rminated ioc 804b loginfo 31130000 scsi 0 state c xfer 0
(da6:mpr0:0:6:0): CAM status: CCB request completed with an error
(da6:mpr0:0:6:0): Retrying command
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 4f 10 00 00 00 80 00 00
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 4f 08 00 00 00 08 00 00 length 4096 SMID 778 terminated ioc 804b lo(da6:mpr0:0:6:0): CAM status: CCB request completed with an error
ginfo 31130000 scsi 0 state c xfer 0
(da6:mpr0:0:6:0): Retrying command
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 4e 08 00 00 01 00 00 00 length 131072 SMID 765 terminated ioc 804b (da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 4f 08 00 00 00 08 00 00
loginfo 31130000 scsi 0 state c xfer 0
(da6:mpr0:0:6:0): CAM status: CCB request completed with an error
(da6:mpr0:0:6:0): Retrying command
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 4d 48 00 00 00 c0 00 00 length 98304 SMID 867 terminated ioc 804b l(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 4e 08 00 00 01 00 00 00
oginfo 31130000 scsi 0 state c xfer 0
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 4c 48 00 00 01 00 00 00 length 131072 SMID 905 terminated ioc 804b (da6:mpr0:0:6:0): CAM status: CCB request completed with an error
loginfo 31130000 scsi 0 state c xfer 0
(da6:mpr0:0: (da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 4b 48 00 00 01 00 00 00 length 131072 SMID 671 terminated ioc 804b 6:loginfo 31130000 scsi 0 state c xfer 0
0): Retrying command
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 4a 48 00 00 01 00 00 00 length 131072 SMID 596 terminated ioc 804b (da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 4d 48 00 00 00 c0 00 00
loginfo 31130000 scsi 0 state c xfer 0
(da6:mpr0:0:6:0): CAM status: CCB request completed with an error
(da6:mpr0:0:6:0): Retrying command
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 4c 48 00 00 01 00 00 00
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 49 48 00 00 01 00 00 00 length 131072 SMID 731 terminated ioc 804b (da6:mpr0:0:6:0): CAM status: CCB request completed with an error
loginfo 31130000 scsi 0 state c xfer 0
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 48 48 00 00 01 00 00 00 length 131072 SMID 293 terminated ioc 804b (da6:loginfo 31130000 scsi 0 state c xfer 0
mpr0:0:6:0): Retrying command
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 47 48 00 00 01 00 00 00 length 131072 SMID 805 terminated ioc 804b (da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 4b 48 00 00 01 00 00 00
loginfo 31130000 scsi 0 state c xfer 0
(da6:mpr0:0:6:0): CAM status: CCB request completed with an error
(da6:mpr0:0:6:0): Retrying command
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 4a 48 00 00 01 00 00 00
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 46 48 00 00 01 00 00 00 length 131072 SMID 186 terminated ioc 804b (da6:mpr0:0:6:0): CAM status: CCB request completed with an error
loginfo 31130000 scsi 0 state c xfer 0
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 58 ac 00 70 00 00 00 f8 00 00 length 126976 SMID 983 terminated ioc 804b (da6:loginfo 31130000 scsi 0 state c xfer 0
mpr0: Unfreezing devq for target ID 6
mpr0:0:6:0): Retrying command
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 49 48 00 00 01 00 00 00
(da6:mpr0:0:6:0): CAM status: CCB request completed with an error
(da6:mpr0:0:6:0): Retrying command
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 48 48 00 00 01 00 00 00
(da6:mpr0:0:6:0): CAM status: CCB request completed with an error
(da6:mpr0:0:6:0): Retrying command
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 47 48 00 00 01 00 00 00
(da6:mpr0:0:6:0): CAM status: CCB request completed with an error
(da6:mpr0:0:6:0): Retrying command
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 46 48 00 00 01 00 00 00
(da6:mpr0:0:6:0): CAM status: CCB request completed with an error
(da6:mpr0:0:6:0): Retrying command
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 58 ac 00 70 00 00 00 f8 00 00
(da6:mpr0:0:6:0): CAM status: CCB request completed with an error
(da6:mpr0:0:6:0): Retrying command
(da6:mpr0:0:6:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
(da6:mpr0:0:6:0): CAM status: Command timeout
(da6:mpr0:0:6:0): Retrying command
(da6:mpr0:0:6:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
(da6:mpr0:0:6:0): CAM status: SCSI Status Error
(da6:mpr0:0:6:0): SCSI status: Check Condition
(da6:mpr0:0:6:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da6:mpr0:0:6:0): Error 6, Retries exhausted
(da6:mpr0:0:6:0): Invalidating pack
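Most of the da6 entries are WRITE(16) commands that the controller terminated and CAM then retried. Decoding a few of these CDBs is a quick sanity check on the log. This sketch assumes the standard SBC WRITE(16) layout: a big-endian LBA in bytes 2-9 and a transfer length in logical blocks in bytes 10-13. It also assumes the 512-byte logical sector size that smartctl reports for this drive. `decode_write16` and `LOGICAL_SECTOR` are hypothetical names for illustration.

```python
# Decode SCSI WRITE(16) CDBs (opcode 0x8A) taken from the CAM log.
# ASSUMPTION: SBC layout, LBA = bytes 2-9, transfer length = bytes 10-13.

LOGICAL_SECTOR = 512  # "Sector Sizes: 512 bytes logical" per smartctl


def decode_write16(cdb_hex: str) -> tuple[int, int]:
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 16 or cdb[0] != 0x8A:
        raise ValueError("not a WRITE(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")
    blocks = int.from_bytes(cdb[10:14], "big")
    return lba, blocks


samples = [
    "8a 00 00 00 00 01 1c da 52 b0 00 00 00 80 00 00",  # logged length 65536
    "8a 00 00 00 00 01 1c da 4d 48 00 00 00 c0 00 00",  # logged length 98304
    "8a 00 00 00 00 01 58 ac 00 70 00 00 00 f8 00 00",  # logged length 126976
]
for s in samples:
    lba, blocks = decode_write16(s)
    print(f"LBA {lba:#x}  blocks {blocks}  bytes {blocks * LOGICAL_SECTOR}")
```

If those assumptions hold, each decoded block count times 512 bytes matches the `length` logged next to the same CDB (65536, 98304 and 126976 bytes). The commands themselves therefore look well-formed.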
I've already ordered a replacement drive, which will be here on Thursday, just in case. But what is going on here? Is /dev/da6 dead due to the stress of the resilver? Is it just a coincidence? Something else?
For reference:
Code:
root@marshall:~ # zpool status
  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:00:11 with 0 errors on Wed Dec 26 03:45:11 2018
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          nvd0p2      ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
        repaired.
  scan: resilvered 1.53T in 3 days 03:11:16 with 0 errors on Sun Dec 30 18:05:28 2018
config:

        NAME                                            STATE     READ WRITE CKSUM
        tank                                            DEGRADED     0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/30805cfa-0044-11e7-95e6-0cc47ac56608  ONLINE       0     0     0
            gptid/317926db-0044-11e7-95e6-0cc47ac56608  ONLINE       0     0     0
            gptid/32708da7-0044-11e7-95e6-0cc47ac56608  ONLINE       0     0     0
            gptid/8892b717-1998-11e7-96f0-0cc47ac56608  ONLINE       0     0     0
          raidz2-1                                      DEGRADED     0     0     0
            gptid/3478907c-0044-11e7-95e6-0cc47ac56608  ONLINE       0     0     0
            gptid/358a11bf-0044-11e7-95e6-0cc47ac56608  ONLINE       0     0     0
            gptid/febc4ffd-2f92-11e7-8a04-0cc47ac56608  FAULTED      6   118     0  too many errors
            gptid/8d602633-0a19-11e9-be92-0cc47ac56608  ONLINE       0     0     0

errors: No known data errors
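The `scan:` line for `tank` shows the resilver onto the replacement disk ran for just over three days. It finished the evening before da6 dropped out. Below is rough arithmetic for the average rate. It assumes zpool's `T` suffix is a binary prefix (1 T = 2^40 bytes), and the variable names are illustrative only.

```python
# Average resilver rate from the zpool status "scan:" line.
# ASSUMPTION: zpool prints sizes with binary prefixes, so "1.53T" = 1.53 * 2**40 bytes.

resilvered_bytes = 1.53 * 2**40
days, hours, minutes, seconds = 3, 3, 11, 16  # "3 days 03:11:16"
elapsed_s = ((days * 24 + hours) * 60 + minutes) * 60 + seconds

rate_mib_s = resilvered_bytes / elapsed_s / 2**20
print(f"{elapsed_s} s elapsed, about {rate_mib_s:.1f} MiB/s on average")
```

This is an average over the whole scan, not a peak figure. The instantaneous rate depends on pool fragmentation and on any other I/O hitting the pool during the resilver.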
Code:
root@marshall:~ # smartctl -a /dev/da6
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.2-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda 2.5 5400
Device Model: ST4000LM024-2AN17V
Serial Number: WCK0K9GW
LU WWN Device Id: 5 000c50 0a8fcf1dc
Firmware Version: 0001
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5526 rpm
Form Factor: 2.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-3 T13/2161-D revision 5
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Mon Dec 31 11:23:17 2018 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status:   (0x82) Offline data collection activity
                                         was completed without error.
                                         Auto Offline Data Collection: Enabled.
Self-test execution status:         ( 0) The previous self-test routine completed
                                         without error or no self-test has ever
                                         been run.
Total time to complete Offline
data collection:                    ( 0) seconds.
Offline data collection
capabilities:                     (0x7b) SMART execute Offline immediate.
                                         Auto Offline data collection on/off support.
                                         Suspend Offline collection upon new
                                         command.
                                         Offline surface scan supported.
                                         Self-test supported.
                                         Conveyance Self-test supported.
                                         Selective Self-test supported.
SMART capabilities:             (0x0003) Saves SMART data before entering
                                         power-saving mode.
                                         Supports SMART auto save timer.
Error logging capability:         (0x01) Error logging supported.
                                         General Purpose Logging supported.
Short self-test routine
recommended polling time:           ( 1) minutes.
Extended self-test routine
recommended polling time:         ( 652) minutes.
Conveyance self-test routine
recommended polling time:           ( 2) minutes.
SCT capabilities:               (0x30a5) SCT Status supported.
                                         SCT Data Table supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   082   064   006    Pre-fail  Always       -       152689043
  3 Spin_Up_Time            0x0003   097   097   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       27
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   092   060   045    Pre-fail  Always       -       1540777129
  9 Power_On_Hours          0x0032   084   084   000    Old_age   Always       -       14572 (48 233 0)
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       27
183 SATA_Downshift_Count    0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   076   070   040    Old_age   Always       -       24 (Min/Max 22/28)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       52
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       143
194 Temperature_Celsius     0x0022   024   040   000    Old_age   Always       -       24 (0 21 0 0 0)
195 Hardware_ECC_Recovered  0x001a   082   064   000    Old_age   Always       -       152689043
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       14571 (180 15 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       15977107520
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       128477063110
254 Free_Fall_Sensor        0x0032   100   100   000    Old_age   Always       -       0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     14569         -
# 2  Short offline       Completed without error       00%     14564         -
# 3  Short offline       Completed without error       00%      8156         -
# 4  Extended offline    Completed without error       00%      8002         -
# 5  Short offline       Completed without error       00%      7988         -
# 6  Short offline       Completed without error       00%      7916         -
# 7  Short offline       Completed without error       00%      7748         -
# 8  Short offline       Completed without error       00%      7580         -
# 9  Short offline       Completed without error       00%      7413         -
#10  Extended offline    Completed without error       00%      7258         -
#11  Short offline       Completed without error       00%      7245         -
#12  Short offline       Completed without error       00%      7077         -
#13  Short offline       Completed without error       00%      6909         -
#14  Short offline       Completed without error       00%      6742         -
#15  Extended offline    Completed without error       00%      6587         -
#16  Short offline       Completed without error       00%      6574         -
#17  Short offline       Completed without error       00%      6502         -
#18  Short offline       Completed without error       00%      6334         -
#19  Short offline       Completed without error       00%      6166         -
#20  Short offline       Completed without error       00%      5998         -
#21  Extended offline    Completed without error       00%      5843         -
SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
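The large raw values for Raw_Read_Error_Rate (152689043) and Seek_Error_Rate (1540777129) are easy to mistake for error counts. A widely used, community-derived reading of Seagate's 48-bit raw field splits it in two. The upper 16 bits are an error count and the lower 32 bits are a count of operations. Seagate does not document this format, so treat it as an assumption. Here is a minimal sketch, with `split_seagate_raw48` as a hypothetical helper:

```python
# Split a 48-bit Seagate SMART raw value into (errors, operations).
# ASSUMPTION: upper 16 bits = error count, lower 32 bits = operation count.
# This is a community interpretation, not a Seagate-documented format.

def split_seagate_raw48(raw: int) -> tuple[int, int]:
    if not 0 <= raw < 2**48:
        raise ValueError("raw value does not fit in 48 bits")
    errors = raw >> 32
    operations = raw & 0xFFFFFFFF
    return errors, operations


# Raw values copied from the smartctl attribute table for da6.
attributes = {
    "Raw_Read_Error_Rate": 152689043,
    "Seek_Error_Rate": 1540777129,
}
for name, raw in attributes.items():
    errors, ops = split_seagate_raw48(raw)
    print(f"{name}: {errors} errors in {ops} operations")
```

Under that assumption both attributes decode to zero errors. That fits the rest of the report: there are no reallocated, pending or CRC errors, and every logged self-test completed without error.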