Hi,
I think I know what the problem is.
I initially had a raidz1 of 5x 2 TB disks. One of my disks is a WD, different from the other 4. I took it out because it was showing errors.
Today, when I was checking my disks, one of the Seagates (from the remaining 4) was missing. Strange! FreeNAS was still running, with 2 disks missing from a raidz1.
After checking the motherboard and the cables, it seems that particular SATA port on the motherboard is dead. Fortunately the disk itself works fine.
I reattached the faulty WD drive, and after several reboots my volume was re-imported and seems healthy.
Code:
Ratchet# zpool status Storage -v
cannot open '-v': name must begin with a letter
pool: Storage
state: ONLINE
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Thu Dec 28 18:05:40 2017
281G scanned at 359M/s, 186G issued at 238M/s, 4.46T total
37.0G resilvered, 4.06% done, 0 days 05:14:41 to go
config:
NAME STATE READ WRITE CKSUM
Storage ONLINE 0 0 0
raidz1-0 ONLINE 0 0 0
gptid/c29cc428-d6a7-11e7-b817-7085c2504b54 ONLINE 0 0 1 (resilvering)
gptid/c3d4928f-d6a7-11e7-b817-7085c2504b54 ONLINE 0 0 0
gptid/c5027f97-d6a7-11e7-b817-7085c2504b54 ONLINE 0 0 0
gptid/c63fc1ff-d6a7-11e7-b817-7085c2504b54 ONLINE 0 0 0
gptid/c747f87f-d6a7-11e7-b817-7085c2504b54 ONLINE 0 0 0
It is resilvering now.
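As a rough sanity check of the resilver figures in the status output, the percentage and time remaining can be recomputed from the reported numbers. This is only a sketch: it assumes G and T are binary units (GiB, TiB), that M/s means MiB/s, and that both values are derived from the "issued" amount, which is what matches the numbers here. The function names are made up for illustration.

```python
# Rough check of the resilver figures reported by `zpool status`.
# Assumptions: G/T are GiB/TiB, M/s is MiB/s, and progress is based
# on the "issued" amount rather than the "scanned" amount.

GIB_PER_TIB = 1024
MIB_PER_GIB = 1024

total_gib = 4.46 * GIB_PER_TIB   # "4.46T total"
issued_gib = 186.0               # "186G issued"
issue_rate_mib_s = 238.0         # "at 238M/s"


def percent_done(issued, total):
    """Share of the total that has been issued so far, in percent."""
    return 100.0 * issued / total


def seconds_remaining(issued, total, rate_mib_s):
    """Time needed for the rest of the data at the current issue rate."""
    remaining_mib = (total - issued) * MIB_PER_GIB
    return remaining_mib / rate_mib_s


def format_hms(seconds):
    """Format a duration in seconds as HH:MM:SS."""
    seconds = int(round(seconds))
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


pct = percent_done(issued_gib, total_gib)
eta = seconds_remaining(issued_gib, total_gib, issue_rate_mib_s)
print(f"{pct:.2f}% done, about {format_hms(eta)} to go")
```

The result lands close to the reported 4.06% and 05:14:41; the small difference is expected because the inputs shown by zpool are already rounded.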
Code:
Ratchet# camcontrol devlist
<TS64GMTS600 O0918B> at scbus0 target 0 lun 0 (ada0,pass0)
<ST2000DL003-9VT166 CC3C> at scbus1 target 0 lun 0 (ada1,pass1)
<ST2000DL003-9VT166 CC3C> at scbus2 target 0 lun 0 (ada2,pass2)
<ST2000DL003-9VT166 CC3C> at scbus3 target 0 lun 0 (ada3,pass3)
<WDC WD2001FASS-00W2B0 05.01D05> at scbus4 target 0 lun 0 (ada4,pass4)
<ST2000DL003-9VT166 CC3C> at scbus5 target 0 lun 0 (ada5,pass5)
<Kingston DT microDuo 3.0 PMAP> at scbus7 target 0 lun 0 (pass6,da0)
Ratchet# gpart show
=> 40 30277552 da0 GPT (14G)
40 8184 - free - (4.0M)
8224 196608 1 efi (96M)
204832 30072752 2 freebsd-zfs (14G)
30277584 8 - free - (4.0K)
=> 40 125045344 ada0 GPT (60G)
40 88 - free - (44K)
128 4194304 1 freebsd-swap (2.0G)
4194432 120850944 2 freebsd-zfs (58G)
125045376 8 - free - (4.0K)
=> 40 3907029088 ada1 GPT (1.8T)
40 88 - free - (44K)
128 4194304 1 freebsd-swap (2.0G)
4194432 3902834688 2 freebsd-zfs (1.8T)
3907029120 8 - free - (4.0K)
=> 40 3907029088 ada2 GPT (1.8T)
40 88 - free - (44K)
128 4194304 1 freebsd-swap (2.0G)
4194432 3902834688 2 freebsd-zfs (1.8T)
3907029120 8 - free - (4.0K)
=> 40 3907029088 ada3 GPT (1.8T)
40 88 - free - (44K)
128 4194304 1 freebsd-swap (2.0G)
4194432 3902834688 2 freebsd-zfs (1.8T)
3907029120 8 - free - (4.0K)
=> 40 3907029088 ada4 GPT (1.8T)
40 88 - free - (44K)
128 4194304 1 freebsd-swap (2.0G)
4194432 3902834688 2 freebsd-zfs (1.8T)
3907029120 8 - free - (4.0K)
=> 40 3907029088 ada5 GPT (1.8T)
40 88 - free - (44K)
128 4194304 1 freebsd-swap (2.0G)
4194432 3902834688 2 freebsd-zfs (1.8T)
3907029120 8 - free - (4.0K)
It seems these 2 disks (ada4 and ada5) are the faulty ones:
Code:
smartctl -a /dev/ada4
smartctl 6.5 2016-05-07 r4318 [FreeBSD 11.1-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Caviar Black
Device Model: WDC WD2001FASS-00W2B0
Serial Number: WD-WMAY00564299
LU WWN Device Id: 5 0014ee 00282d3db
Firmware Version: 05.01D05
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Sector Size: 512 bytes logical/physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS (minor revision not indicated)
SATA Version is: SATA 2.6, 3.0 Gb/s
Local Time is: Thu Dec 28 19:04:22 2017 EET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x84) Offline data collection activity
was suspended by an interrupting command from host.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (29460) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 300) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x3037) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 5
3 Spin_Up_Time 0x0027 202 165 021 Pre-fail Always - 11875
4 Start_Stop_Count 0x0032 094 094 000 Old_age Always - 6098
5 Reallocated_Sector_Ct 0x0033 199 199 140 Pre-fail Always - 1
7 Seek_Error_Rate 0x002e 100 253 000 Old_age Always - 0
9 Power_On_Hours 0x0032 096 096 000 Old_age Always - 3148
10 Spin_Retry_Count 0x0032 100 100 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 100 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 444
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 289
193 Load_Cycle_Count 0x0032 190 190 000 Old_age Always - 30520
194 Temperature_Celsius 0x0022 111 087 000 Old_age Always - 41
196 Reallocated_Event_Count 0x0032 199 199 000 Old_age Always - 1
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 39
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 43
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 3136 -
# 2 Short offline Completed without error 00% 3124 -
# 3 Short offline Completed without error 00% 3113 -
# 4 Short offline Completed without error 00% 3101 -
# 5 Short offline Completed without error 00% 3092 -
# 6 Short offline Completed without error 00% 3077 -
# 7 Short offline Completed without error 00% 3066 -
# 8 Short offline Completed without error 00% 3054 -
# 9 Short offline Completed without error 00% 3042 -
#10 Short offline Completed without error 00% 3031 -
#11 Short offline Completed without error 00% 3019 -
#12 Short offline Completed without error 00% 3007 -
#13 Short offline Completed without error 00% 2998 -
#14 Short offline Completed without error 00% 2984 -
#15 Short offline Completed without error 00% 2973 -
#16 Short offline Completed without error 00% 2961 -
#17 Short offline Completed without error 00% 2950 -
#18 Short offline Completed without error 00% 2940 -
#19 Short offline Completed without error 00% 2926 -
#20 Short offline Completed without error 00% 2915 -
#21 Short offline Completed without error 00% 2902 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
-------------------------------
Code:
smartctl -a /dev/ada5
smartctl 6.5 2016-05-07 r4318 [FreeBSD 11.1-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda Green (AF)
Device Model: ST2000DL003-9VT166
Serial Number: 5YD6M1JE
LU WWN Device Id: 5 000c50 04644f111
Firmware Version: CC3C
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5900 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Thu Dec 28 19:03:12 2017 EET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 602) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 326) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x30b7) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 100 099 006 Pre-fail Always - 2957232
3 Spin_Up_Time 0x0003 093 092 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 846
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 080 060 030 Pre-fail Always - 107047584
9 Power_On_Hours 0x0032 083 083 000 Old_age Always - 15335
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 822
183 Runtime_Bad_Block 0x0032 100 100 000 Old_age Always - 0
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Command_Timeout 0x0032 100 100 000 Old_age Always - 0
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 064 047 045 Old_age Always - 36 (Min/Max 36/36)
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 726
193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 837
194 Temperature_Celsius 0x0022 036 053 000 Old_age Always - 36 (0 19 0 0 0)
195 Hardware_ECC_Recovered 0x001a 004 003 000 Old_age Always - 2957232
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 8
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 8
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 14858 (200 40 0)
241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 561319630
242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 3428426054
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
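To make the comparison between the two reports easier, here is a small sketch that picks out the attributes people usually look at first for media problems (5 Reallocated_Sector_Ct, 197 Current_Pending_Sector, 198 Offline_Uncorrectable) and lists the ones with a nonzero raw value. The watch list and the function name are my own choice for illustration, not something smartctl defines, and the rows are copied from the output above.

```python
# Flag watched SMART attributes whose raw value is nonzero.
# The watch list is an assumption (a common rule of thumb), not a smartctl rule.
WATCHED_IDS = {5, 197, 198}

# Attribute rows copied from the two smartctl reports.
ADA4_ROWS = """\
5 Reallocated_Sector_Ct 0x0033 199 199 140 Pre-fail Always - 1
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 39
"""
ADA5_ROWS = """\
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 8
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 8
"""


def flagged_attributes(rows):
    """Return {name: raw_value} for watched attributes with a nonzero raw value."""
    flagged = {}
    for line in rows.splitlines():
        fields = line.split()
        # A data row has at least 10 columns and starts with a numeric ID.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id, name, raw = int(fields[0]), fields[1], fields[9]
        if attr_id in WATCHED_IDS and raw.isdigit() and int(raw) != 0:
            flagged[name] = int(raw)
    return flagged


for disk, rows in (("ada4", ADA4_ROWS), ("ada5", ADA5_ROWS)):
    print(disk, flagged_attributes(rows))
```

With these rows, ada4 is flagged for 1 reallocated sector and 39 offline uncorrectable sectors, and ada5 for 8 pending and 8 offline uncorrectable sectors, even though both drives report an overall result of PASSED.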