CRITICAL: The volume Main_server (ZFS) status is DEGRADED


lyle

Contributor
Joined
Jul 30, 2013
Messages
123
Help! I am getting this Alert although there are no other discernible symptoms. The drives and system are working normally except for this Alert.

I have four drives in total. The two 2TB drives (Main_server) are mirror-0. The other two are on their own, i.e., no RAID. The other two drives/pools are fine: no alerts, operating normally.

Zpool status:

pool: Main_server
state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
see: http://illumos.org/msg/ZFS-8000-2Q
scan: scrub repaired 0 in 4h4m with 0 errors on Sat Oct 24 21:38:39 2015
config:

NAME STATE READ WRITE CKSUM
Main_server DEGRADED 0 0 0
mirror-0 DEGRADED 0 0 0
gptid/5fe364f5-fdf0-11e2-b9a1-94de80710fb0 ONLINE 0 0 0
15124687640628945214 UNAVAIL 0 0 0 was /dev/gptid/607cbca5-fdf0-11e2-b9a1-94de80710fb0

errors: No known data errors

camcontrol devlist:

<ST2000DL003-9VT166 CC3C> at scbus1 target 0 lun 0 (ada1,pass1)
<ST1000DM003-1CH162 CC46> at scbus2 target 0 lun 0 (ada2,pass2)
<ST31000520AS CC32> at scbus3 target 0 lun 0 (ada3,pass3)

gpart show:

Segmentation fault: 11

I would love some suggestions that don't involve wiping the drive and starting over. I tried a scrub, but that did not solve the problem.

Thank you for your consideration.
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
You have a monster problem here. It appears you've lost ada0, completely, from the pool. My guess is that your system has been hanging by a thread for a long time, and you have not been properly maintaining it. You are now operating on only one drive of the mirror pair. It could blow up at any moment.

Depending on what is wrong, you have somewhere between 1 hour and 1 year to replace the missing drive on the "Main_server" pool, or you're going to lose the pool with no chance of recovering the files.
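
The missing mirror member can be picked out of the `zpool status` text mechanically. Here is a minimal sketch, assuming the status output is piped in on stdin; the function name `unavailable_members` is made up for this example and is not a FreeNAS or ZFS tool.

```shell
# List pool members that ZFS reports as missing or failed.
# Reads `zpool status` output on stdin and prints "<name> <state>" for
# every line whose STATE column is UNAVAIL, FAULTED, REMOVED or OFFLINE.
# DEGRADED is skipped on purpose: it marks the pool and the mirror vdev,
# not the individual member that went away.
unavailable_members() {
    awk '$2 ~ /^(UNAVAIL|FAULTED|REMOVED|OFFLINE)$/ { print $1, $2 }'
}
```

Run as `zpool status Main_server | unavailable_members`. For the status output posted in the first message this prints `15124687640628945214 UNAVAIL`, which is the member that has to be replaced.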

Also, show me the output of this, IN ITS ENTIRETY:
Code:
smartctl -x -qnoserial /dev/ada1
so I can assess how much of an emergency you have there. You either have a code yellow, or a code crimson pigeonsblood red. In fact, while we're at it, let me see your other drives too:
Code:
smartctl -x -qnoserial /dev/ada2
smartctl -x -qnoserial /dev/ada3



Also, your entire configuration is highly suspect. Your motherboard is an extremely bad choice for FreeNAS, and the 1-drive pools are extremely dubious decisions.
 

DrKK

Also, the "segmentation fault" on the "gpart show" is another very bad sign. I suspect system file corruption there. Go to system->update->verify install. You should get no error message, or possibly just one message on "resolv.conf". If you get anything else, tell me what you got.
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
Well, it's the third gpart segfault I've seen on this forum in less than a year, so I guess gpart is a bit fragile... and sloppily coded :D

He has run a scrub with no errors, so unless errors have appeared since the scrub, there's no corruption. I guess the first thing to do is replace this drive and then analyze why it was dropped from the pool ;)
 

lyle

Yes, I'm afraid you are dealing with a supreme noob. The alert is flashing a bleeding, Reservoir Dogs, Kill Bill red.

Trying to remember how to copy the full report out of the Shell....arrgggg.

For the "system->update->verify install", I assume this is from the GUI? I don't have an "update" selection under "system".
 

DrKK

You should be able to manage some kind of "cut and paste" from the shell. Take the contents and paste it to "pastebin.com", and then share the pastebin link with us.

Also, I can't remember how it was in 9.2.1.5, since we've been on 9.3 for quite some time, sir ;) But the "VERIFY INSTALL" is there, somewhere, in the System tab in the GUI. Maybe it's under "advanced" or something.
 

Bidule0hm

Redirect the output of the command to a file on one of your shares. For example: your command > /mnt/your_pool_name/your_shared_dataset_name/output.txt Then you can access it directly from a client.
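
The redirect can be wrapped in a small helper so that error messages land in the file as well. This is only a sketch: `save_output` is a hypothetical name, and the path in the usage line reuses the placeholders from the example above.

```shell
# save_output FILE COMMAND [ARGS...]
# Runs COMMAND and writes everything it prints (stdout and stderr) to FILE,
# so the full report can be opened from a client through the share.
save_output() {
    out_file=$1
    shift
    "$@" > "$out_file" 2>&1
}
```

For example: `save_output /mnt/your_pool_name/your_shared_dataset_name/smart_ada1.txt smartctl -x -qnoserial /dev/ada1`. Capturing stderr matters here because the CAM timeout messages are part of what needs to be read.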

Ah, you are on 9.2, not 9.3, I don't think the verify install existed in the 9.2 version.
 

DrKK

Sir I am almost dead-balls positive that "verify install" existed in the 9.2 series. I remember the whole "resolv.conf" thing didn't "break" until near the end of the 9.2.1.8? series or something. Monkey_ is checking on it now (he's downloading the ISO now and will report back).
 

lyle

OK, I rebooted and it appears that the drives have been renumbered: ada1 is now ada0, ada2 is now ada1, etc.

Here is ada0 (ex ada1):

[root@freenas] ~# smartctl -x -qnoserial /dev/ada0
smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE-p4 amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda Green (AF)
Device Model: ST2000DL003-9VT166
Firmware Version: CC3C
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5900 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Mon Oct 26 22:07:37 2015 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is: Disabled
APM feature is: Unavailable
Rd look-ahead is: Enabled
Write cache is: Enabled
ATA Security is: Disabled, frozen [SEC2]
(pass0:ata3:0:0:0): SMART. ACB: b0 d6 e0 4f c2 40 00 00 00 00 01 00
(pass0:ata3:0:0:0): CAM status: Command timeout
Write SCT (Get) XXX Error Recovery Control Command failed: Input/output error
Wt Cache Reorder: N/A

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 623) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 345) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x30b7) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
1 Raw_Read_Error_Rate POSR-- 105 099 006 - 9391952
3 Spin_Up_Time PO---- 093 092 000 - 0
4 Start_Stop_Count -O--CK 100 100 020 - 909
5 Reallocated_Sector_Ct PO--CK 100 100 036 - 0
7 Seek_Error_Rate POSR-- 080 060 030 - 4407283860
9 Power_On_Hours -O--CK 074 074 000 - 23091
10 Spin_Retry_Count PO--C- 100 100 097 - 0
12 Power_Cycle_Count -O--CK 100 100 020 - 181
183 Runtime_Bad_Block -O--CK 100 100 000 - 0
184 End-to-End_Error -O--CK 100 100 099 - 0
187 Reported_Uncorrect -O--CK 098 098 000 - 2
188 Command_Timeout -O--CK 099 098 000 - 8590065673
189 High_Fly_Writes -O-RCK 100 100 000 - 0
190 Airflow_Temperature_Cel -O---K 066 044 045 Past 34 (0 34 34 33 0)
191 G-Sense_Error_Rate -O--CK 100 100 000 - 0
192 Power-Off_Retract_Count -O--CK 100 100 000 - 869
193 Load_Cycle_Count -O--CK 100 100 000 - 919
194 Temperature_Celsius -O---K 034 056 000 - 34 (0 14 0 0 0)
195 Hardware_ECC_Recovered -O-RC- 029 004 000 - 9391952
197 Current_Pending_Sector -O--C- 100 100 000 - 0
198 Offline_Uncorrectable ----C- 100 100 000 - 0
199 UDMA_CRC_Error_Count -OSRCK 200 200 000 - 0
240 Head_Flying_Hours ------ 100 253 000 - 280955285690927
241 Total_LBAs_Written ------ 100 253 000 - 2125462006
242 Total_LBAs_Read ------ 100 253 000 - 622532094
||||||_ K auto-keep
|||||__ C event count
||||___ R error rate
|||____ S speed/performance
||_____ O updated online
|______ P prefailure warning

ATA_READ_LOG_EXT (addr=0x00:0x00, page=0, n=1) failed: 48-bit ATA commands not implemented for legacy controllers
Read GP Log Directory failed

SMART Log Directory Version 1 [multi-sector log support]
Address Access R/W Size Description
0x00 SL R/O 1 Log Directory
0x01 SL R/O 1 Summary SMART error log
0x02 SL R/O 5 Comprehensive SMART error log
0x06 SL R/O 1 SMART self-test log
0x09 SL R/W 1 Selective self-test log
0x80-0x9f SL R/W 16 Host vendor specific log
0xa1 SL VS 20 Device vendor specific log
0xa8 SL VS 20 Device vendor specific log
0xa9 SL VS 1 Device vendor specific log
0xc0 SL VS 1 Device vendor specific log
0xe0 SL R/W 1 SCT Command/Status
0xe1 SL R/W 1 SCT Data Transfer

SMART Extended Comprehensive Error Log (GP Log 0x03) not supported

SMART Error Log Version: 1
No Errors Logged

SMART Extended Self-test Log (GP Log 0x07) not supported

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

(pass0:ata3:0:0:0): SMART. ACB: b0 d6 e0 4f c2 40 00 00 00 00 01 00
(pass0:ata3:0:0:0): CAM status: Command timeout
Write SCT Data Table failed: Input/output error
Read SCT Temperature History failed

SCT Error Recovery Control command not supported

Device Statistics (GP Log 0x04) not supported

ATA_READ_LOG_EXT (addr=0x11:0x00, page=0, n=1) failed: 48-bit ATA commands not implemented for legacy controllers
Read SATA Phy Event Counters failed
 

lyle

Should I do ada1 & ada2?

Continuing the search for Verify Install. No joy so far.
 

DrKK

No SMART tests have ever been run on this drive, some of the SMART reads failed, and the motherboard is struggling to access these drives.

But it doesn't look like it's about to explode. I would suggest, sir, that at a minimum you IMMEDIATELY replace the failed/missing drive according to the instructions in the FreeNAS manual. And, over the longer term, replace this woefully insufficient hardware, which is an accident waiting to happen, with something more suitable for a world-class NAS.
 

DrKK

Yes, do ada1 and ada2.

Also, we can't find "Verify install" in 9.2.1.5, it must have made its appearance later. Sorry.
 

DaveF81

Explorer
Joined
Jan 28, 2014
Messages
56
For S&G I installed 9.2.1.5 in a VM. Ahh, the memories! Confirmed there is no verify install button. I recall it being in the 9.2.1.x series, but I don't think it was there until later.
 

lyle

[root@freenas] ~# smartctl -x -qnoserial /dev/ada1
smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE-p4 amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda 7200.14 (AF)
Device Model: ST1000DM003-1CH162
Firmware Version: CC46
User Capacity: 1,000,204,886,016 bytes [1.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 7200 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Mon Oct 26 22:17:54 2015 EDT

==> WARNING: A firmware update for this drive is available,
see the following Seagate web pages:
http://knowledge.seagate.com/articles/en_US/FAQ/207931en
http://knowledge.seagate.com/articles/en_US/FAQ/223651en

SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is: Unavailable
APM feature is: Disabled
Rd look-ahead is: Enabled
Write cache is: Enabled
ATA Security is: Disabled, frozen [SEC2]
(pass1:ata4:0:0:0): SMART. ACB: b0 d6 e0 4f c2 40 00 00 00 00 01 00
(pass1:ata4:0:0:0): CAM status: ATA Status Error
(pass1:ata4:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 04 (ABRT )
(pass1:ata4:0:0:0): RES: 51 04 00 00 00 00 00 00 00 10 00
Write SCT (Get) XXX Error Recovery Control Command failed: Input/output error
Wt Cache Reorder: N/A

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 575) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 108) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x3085) SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
1 Raw_Read_Error_Rate POSR-- 113 100 006 - 55133984
3 Spin_Up_Time PO---- 097 097 000 - 0
4 Start_Stop_Count -O--CK 100 100 020 - 76
5 Reallocated_Sector_Ct PO--CK 100 100 010 - 0
7 Seek_Error_Rate POSR-- 100 253 030 - 411296
9 Power_On_Hours -O--CK 079 079 000 - 18543
10 Spin_Retry_Count PO--C- 100 100 097 - 0
12 Power_Cycle_Count -O--CK 100 100 020 - 76
183 Runtime_Bad_Block -O--CK 100 100 000 - 0
184 End-to-End_Error -O--CK 100 100 099 - 0
187 Reported_Uncorrect -O--CK 100 100 000 - 0
188 Command_Timeout -O--CK 100 100 000 - 0 0 0
189 High_Fly_Writes -O-RCK 100 100 000 - 0
190 Airflow_Temperature_Cel -O---K 067 047 045 - 33 (Min/Max 31/33)
191 G-Sense_Error_Rate -O--CK 100 100 000 - 0
192 Power-Off_Retract_Count -O--CK 100 100 000 - 43
193 Load_Cycle_Count -O--CK 100 100 000 - 259
194 Temperature_Celsius -O---K 033 053 000 - 33 (0 15 0 0 0)
197 Current_Pending_Sector -O--C- 100 100 000 - 0
198 Offline_Uncorrectable ----C- 100 100 000 - 0
199 UDMA_CRC_Error_Count -OSRCK 200 200 000 - 0
240 Head_Flying_Hours ------ 100 253 000 - 18544h+24m+45.727s
241 Total_LBAs_Written ------ 100 253 000 - 839354236
242 Total_LBAs_Read ------ 100 253 000 - 2023400511
||||||_ K auto-keep
|||||__ C event count
||||___ R error rate
|||____ S speed/performance
||_____ O updated online
|______ P prefailure warning

ATA_READ_LOG_EXT (addr=0x00:0x00, page=0, n=1) failed: 48-bit ATA commands not implemented for legacy controllers
Read GP Log Directory failed

SMART Log Directory Version 1 [multi-sector log support]
Address Access R/W Size Description
0x00 SL R/O 1 Log Directory
0x01 SL R/O 1 Summary SMART error log
0x02 SL R/O 5 Comprehensive SMART error log
0x06 SL R/O 1 SMART self-test log
0x09 SL R/W 1 Selective self-test log
0x80-0x9f SL R/W 16 Host vendor specific log
0xa1 SL VS 20 Device vendor specific log
0xa8 SL VS 129 Device vendor specific log
0xa9 SL VS 1 Device vendor specific log
0xc0 SL VS 1 Device vendor specific log
0xc1 SL VS 10 Device vendor specific log
0xc4 SL VS 5 Device vendor specific log
0xe0 SL R/W 1 SCT Command/Status
0xe1 SL R/W 1 SCT Data Transfer

SMART Extended Comprehensive Error Log (GP Log 0x03) not supported

SMART Error Log Version: 1
No Errors Logged

SMART Extended Self-test Log (GP Log 0x07) not supported

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Data Table command not supported

SCT Error Recovery Control command not supported

Device Statistics (GP Log 0x04) not supported

ATA_READ_LOG_EXT (addr=0x11:0x00, page=0, n=1) failed: 48-bit ATA commands not implemented for legacy controllers
Read SATA Phy Event Counters failed
 

lyle

[root@freenas] ~# smartctl -x -qnoserial /dev/ada2
smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE-p4 amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda LP
Device Model: ST31000520AS
Firmware Version: CC32
User Capacity: 1,000,204,886,016 bytes [1.00 TB]
Sector Size: 512 bytes logical/physical
Rotation Rate: 5900 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 2.6, 3.0 Gb/s
Local Time is: Mon Oct 26 22:19:03 2015 EDT

==> WARNING: A firmware update for this drive may be available,
see the following Seagate web pages:
http://knowledge.seagate.com/articles/en_US/FAQ/207931en
http://knowledge.seagate.com/articles/en_US/FAQ/213915en

SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is: Disabled
APM feature is: Disabled
Rd look-ahead is: Enabled
Write cache is: Enabled
ATA Security is: Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Unknown

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 623) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 222) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x103f) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
1 Raw_Read_Error_Rate POSR-- 117 087 006 - 113902
3 Spin_Up_Time PO---- 098 095 000 - 0
4 Start_Stop_Count -O--CK 100 100 020 - 1
5 Reallocated_Sector_Ct PO--CK 070 070 036 - 1235
7 Seek_Error_Rate POSR-- 075 060 030 - 0
9 Power_On_Hours -O--CK 059 059 000 - 36038
10 Spin_Retry_Count PO--C- 100 100 097 - 0
12 Power_Cycle_Count -O--CK 100 100 020 - 1003
183 Runtime_Bad_Block -O--CK 100 100 000 - 0
184 End-to-End_Error -O--CK 100 100 099 - 0
187 Reported_Uncorrect -O--CK 001 001 000 - 5331
188 Command_Timeout -O--CK 100 099 000 - 47245361176
189 High_Fly_Writes -O-RCK 100 100 000 - 0
190 Airflow_Temperature_Cel -O---K 064 047 045 - 36 (Min/Max 34/36)
194 Temperature_Celsius -O---K 036 053 000 - 36 (0 18 0 0 0)
195 Hardware_ECC_Recovered -O-RC- 039 006 000 - 113902
197 Current_Pending_Sector -O--C- 098 096 000 - 90
198 Offline_Uncorrectable ----C- 098 096 000 - 90
199 UDMA_CRC_Error_Count -OSRCK 200 200 000 - 0
240 Head_Flying_Hours ------ 100 253 000 - 214666760452725
241 Total_LBAs_Written ------ 100 253 000 - 71144732
242 Total_LBAs_Read ------ 100 253 000 - 2334131254
||||||_ K auto-keep
|||||__ C event count
||||___ R error rate
|||____ S speed/performance
||_____ O updated online
|______ P prefailure warning

ATA_READ_LOG_EXT (addr=0x00:0x00, page=0, n=1) failed: 48-bit ATA commands not implemented for legacy controllers
Read GP Log Directory failed

SMART Log Directory Version 1 [multi-sector log support]
Address Access R/W Size Description
0x00 SL R/O 1 Log Directory
0x01 SL R/O 1 Summary SMART error log
0x02 SL R/O 5 Comprehensive SMART error log
0x03 SL R/O 5 Ext. Comprehensive SMART error log
0x06 SL R/O 1 SMART self-test log
0x07 SL R/O 1 Extended self-test log
0x09 SL R/W 1 Selective self-test log
0x10 SL R/O 1 NCQ Command Error log
0x11 SL R/O 1 SATA Phy Event Counters
0x21 SL R/O 1 Write stream error log
0x22 SL R/O 1 Read stream error log
0x80-0x9f SL R/W 16 Host vendor specific log
0xa1 SL VS 20 Device vendor specific log
0xa8 SL VS 129 Device vendor specific log
0xa9 SL VS 1 Device vendor specific log
0xbd SL VS 252 Device vendor specific log
0xc0 SL VS 1 Device vendor specific log
0xe0 SL R/W 1 SCT Command/Status
0xe1 SL R/W 1 SCT Data Transfer

SMART Extended Comprehensive Error Log (GP Log 0x03) not supported

SMART Error Log Version: 1
ATA Error Count: 4785 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 4785 occurred at disk power-on lifetime: 36037 hours (1501 days + 13 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 10 ff ff ff 4f 00 00:04:42.774 READ DMA EXT
25 00 10 ff ff ff 4f 00 00:04:39.011 READ DMA EXT
25 00 10 ff ff ff 4f 00 00:04:35.328 READ DMA EXT
25 00 10 ff ff ff 4f 00 00:04:31.615 READ DMA EXT
25 00 10 ff ff ff 4f 00 00:04:27.939 READ DMA EXT

Error 4784 occurred at disk power-on lifetime: 36037 hours (1501 days + 13 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 10 ff ff ff 4f 00 00:04:39.011 READ DMA EXT
25 00 10 ff ff ff 4f 00 00:04:35.328 READ DMA EXT
25 00 10 ff ff ff 4f 00 00:04:31.615 READ DMA EXT
25 00 10 ff ff ff 4f 00 00:04:27.939 READ DMA EXT
25 00 10 ff ff ff 4f 00 00:04:27.932 READ DMA EXT

Error 4783 occurred at disk power-on lifetime: 36037 hours (1501 days + 13 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 10 ff ff ff 4f 00 00:04:35.328 READ DMA EXT
25 00 10 ff ff ff 4f 00 00:04:31.615 READ DMA EXT
25 00 10 ff ff ff 4f 00 00:04:27.939 READ DMA EXT
25 00 10 ff ff ff 4f 00 00:04:27.932 READ DMA EXT
c8 00 10 90 02 40 e0 00 00:04:27.932 READ DMA

Error 4782 occurred at disk power-on lifetime: 36037 hours (1501 days + 13 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 10 ff ff ff 4f 00 00:04:31.615 READ DMA EXT
25 00 10 ff ff ff 4f 00 00:04:27.939 READ DMA EXT
25 00 10 ff ff ff 4f 00 00:04:27.932 READ DMA EXT
c8 00 10 90 02 40 e0 00 00:04:27.932 READ DMA
c8 00 00 80 01 40 e0 00 00:04:27.931 READ DMA

Error 4781 occurred at disk power-on lifetime: 36037 hours (1501 days + 13 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 10 ff ff ff 4f 00 00:04:27.939 READ DMA EXT
25 00 10 ff ff ff 4f 00 00:04:27.932 READ DMA EXT
c8 00 10 90 02 40 e0 00 00:04:27.932 READ DMA
c8 00 00 80 01 40 e0 00 00:04:27.931 READ DMA
c8 00 00 80 00 40 e0 00 00:04:27.930 READ DMA

SMART Extended Self-test Log (GP Log 0x07) not supported

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version: 3
SCT Version (vendor specific): 522 (0x020a)
SCT Support Level: 1
Device State: Active (0)
Current Temperature: 36 Celsius
Power Cycle Min/Max Temperature: 33/36 Celsius
Lifetime Min/Max Temperature: 17/53 Celsius
Under/Over Temperature Limit Count: 0/0
SCT Temperature History Version: 2
Temperature Sampling Period: 10 minutes
Temperature Logging Interval: 59 minutes
Min/Max recommended Temperature: 14/55 Celsius
Min/Max Temperature Limit: 10/60 Celsius
Temperature History Size (Index): 128 (32)

Index Estimated Time Temperature Celsius
33 2015-10-21 16:38 33 **************
... ..( 6 skipped). .. **************
40 2015-10-21 23:31 33 **************
41 2015-10-22 00:30 32 *************
... ..( 9 skipped). .. *************
51 2015-10-22 10:20 32 *************
52 2015-10-22 11:19 33 **************
53 2015-10-22 12:18 33 **************
54 2015-10-22 13:17 32 *************
... ..( 3 skipped). .. *************
58 2015-10-22 17:13 32 *************
59 2015-10-22 18:12 31 ************
... ..( 2 skipped). .. ************
62 2015-10-22 21:09 31 ************
63 2015-10-22 22:08 32 *************
... ..( 8 skipped). .. *************
72 2015-10-23 06:59 32 *************
73 2015-10-23 07:58 33 **************
74 2015-10-23 08:57 ? -
75 2015-10-23 09:56 33 **************
76 2015-10-23 10:55 ? -
77 2015-10-23 11:54 33 **************
78 2015-10-23 12:53 33 **************
79 2015-10-23 13:52 34 ***************
... ..( 3 skipped). .. ***************
83 2015-10-23 17:48 34 ***************
84 2015-10-23 18:47 33 **************
... ..( 11 skipped). .. **************
96 2015-10-24 06:35 33 **************
97 2015-10-24 07:34 ? -
98 2015-10-24 08:33 32 *************
99 2015-10-24 09:32 ? -
100 2015-10-24 10:31 32 *************
101 2015-10-24 11:30 32 *************
102 2015-10-24 12:29 33 **************
103 2015-10-24 13:28 33 **************
104 2015-10-24 14:27 33 **************
105 2015-10-24 15:26 34 ***************
... ..( 4 skipped). .. ***************
110 2015-10-24 20:21 34 ***************
111 2015-10-24 21:20 36 *****************
112 2015-10-24 22:19 34 ***************
113 2015-10-24 23:18 34 ***************
114 2015-10-25 00:17 33 **************
... ..( 5 skipped). .. **************
120 2015-10-25 06:11 33 **************
121 2015-10-25 07:10 34 ***************
122 2015-10-25 08:09 34 ***************
123 2015-10-25 09:08 33 **************
124 2015-10-25 10:07 33 **************
125 2015-10-25 11:06 34 ***************
126 2015-10-25 12:05 34 ***************
127 2015-10-25 13:04 33 **************
0 2015-10-25 14:03 33 **************
1 2015-10-25 15:02 33 **************
2 2015-10-25 16:01 34 ***************
3 2015-10-25 17:00 35 ****************
... ..( 3 skipped). .. ****************
7 2015-10-25 20:56 35 ****************
8 2015-10-25 21:55 34 ***************
9 2015-10-25 22:54 33 **************
... ..( 2 skipped). .. **************
12 2015-10-26 01:51 33 **************
13 2015-10-26 02:50 32 *************
... ..( 4 skipped). .. *************
18 2015-10-26 07:45 32 *************
19 2015-10-26 08:44 33 **************
... ..( 4 skipped). .. **************
24 2015-10-26 13:39 33 **************
25 2015-10-26 14:38 32 *************
26 2015-10-26 15:37 33 **************
27 2015-10-26 16:36 33 **************
28 2015-10-26 17:35 ? -
29 2015-10-26 18:34 33 **************
30 2015-10-26 19:33 ? -
31 2015-10-26 20:32 33 **************
32 2015-10-26 21:31 33 **************

SCT Error Recovery Control:
Read: Disabled
Write: Disabled

Device Statistics (GP Log 0x04) not supported

ATA_READ_LOG_EXT (addr=0x11:0x00, page=0, n=1) failed: 48-bit ATA commands not implemented for legacy controllers
Read SATA Phy Event Counters failed
 

Bidule0hm

Ouch, ada2 is in bad shape, but it has more than 36 khours (4.1 years) on it, so not too bad I guess :)
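
To see at a glance why a drive counts as being in bad shape, the sector-health counters can be filtered out of the long report. The sketch below rests on two assumptions: the choice of attributes 5, 187, 197 and 198 as the ones to watch is this example's own selection, and the column positions are those of the `smartctl -x` attribute table posted in this thread. `smart_red_flags` is a hypothetical name.

```shell
# Print "<attribute> <raw value>" for the sector-health attributes whose raw
# count is above zero. Reads `smartctl -x` output on stdin and assumes the
# 8-column attribute layout shown in this thread:
#   ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
smart_red_flags() {
    awk 'NF >= 8 && $2 ~ /^[A-Z]/ &&
         ($1 == 5 || $1 == 187 || $1 == 197 || $1 == 198) &&
         $8 + 0 > 0 { print $2, $8 }'
}
```

Fed the ada2 report above, this prints `Reallocated_Sector_Ct 1235`, `Reported_Uncorrect 5331`, `Current_Pending_Sector 90` and `Offline_Uncorrectable 90`. The ada0 report yields only `Reported_Uncorrect 2`, and the ada1 report yields nothing.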

But yeah, not SMART test, ever, that's a very bad thing. You should setup SMART tests.

And please, for my eyes, the next time use the code tags, thanks.

Edit: and all three drives have seen more than 50 °C at least once in their lifetime, very bad thing too. They should never be allowed to go beyond 40 °C.
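
For anyone who wants to check a drive's SCT temperature history against that 40 °C figure, here is a minimal Python sketch. It assumes rows shaped like the ones pasted above (index, date, time, temperature, bar of asterisks). The names `ROW`, `readings` and `over_limit` are made up for illustration, and the 40 °C default is simply the limit suggested in this post, not a value taken from smartctl.

```python
import re

# Matches rows such as "111 2015-10-24 21:20 36 *****************".
# Rows with "?" (no reading) and "... ( n skipped)" rows do not match.
ROW = re.compile(r"^\s*(\d+)\s+(\d{4}-\d{2}-\d{2} \d{2}:\d{2})\s+(\d+)(?:\s|$)")


def readings(lines):
    """Yield (timestamp, celsius) for each row that carries a temperature."""
    for line in lines:
        m = ROW.match(line)
        if m:
            yield m.group(2), int(m.group(3))


def over_limit(lines, limit_c=40):
    """Return the readings strictly above limit_c (hypothetical threshold)."""
    return [(ts, t) for ts, t in readings(lines) if t > limit_c]


sample = [
    "111 2015-10-24 21:20 36 *****************",
    "... ..( 5 skipped). .. **************",
    "28 2015-10-26 17:35 ? -",
    "13 2015-10-26 02:50 32 *************",
]

temps = [t for _, t in readings(sample)]
print("max:", max(temps), "over limit:", over_limit(sample))
```

Note that the history log only covers the most recent samples, so the lifetime maximum still has to be read from the drive's SMART attributes rather than from this table.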
 

lyle

Contributor
Joined
Jul 30, 2013
Messages
123
Thanks for the great advice guys. I will get a new drive this weekend and also invest in a new motherboard. Can you make any mobo suggestions? Any chance I can port over my CPU or am I looking at having to upgrade that as well?

I will set up SMART tests. And I will figure out how to use the "code tags" next time.

Again, many thanks!
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
To use code tags, just type [ code ] (without the spaces) before the SMART (or whatever) output, and [ /code ] after it. That will preserve formatting, which is pretty important for SMART output, and even more important for zpool status and similar output. The result will look like this:
Code:
[root@freenas2] ~# zpool status
  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Fri Oct 23 03:45:45 2015
config:

    NAME                                            STATE     READ WRITE CKSUM
    freenas-boot                                    ONLINE       0     0     0
     mirror-0                                      ONLINE       0     0     0
       gptid/1b6fb23e-bec6-11e4-8407-0cc47a01304d  ONLINE       0     0     0
       gptid/1b7f00c5-bec6-11e4-8407-0cc47a01304d  ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: ONLINE
  scan: scrub repaired 0 in 30h6m with 0 errors on Mon Oct 19 06:06:50 2015
config:

    NAME                                            STATE     READ WRITE CKSUM
    tank                                            ONLINE       0     0     0
     raidz2-0                                      ONLINE       0     0     0
       gptid/9a85d15f-8d5c-11e4-8732-0cc47a01304d  ONLINE       0     0     0
       gptid/9afa89ae-8d5c-11e4-8732-0cc47a01304d  ONLINE       0     0     0
       gptid/9b6cc00b-8d5c-11e4-8732-0cc47a01304d  ONLINE       0     0     0
       gptid/9c501d57-8d5c-11e4-8732-0cc47a01304d  ONLINE       0     0     0
       gptid/9cc41939-8d5c-11e4-8732-0cc47a01304d  ONLINE       0     0     0
       gptid/9d39e31d-8d5c-11e4-8732-0cc47a01304d  ONLINE       0     0     0
     raidz2-1                                      ONLINE       0     0     0
       gptid/f5b737a6-8e41-11e4-8732-0cc47a01304d  ONLINE       0     0     0
       gptid/f6284bf9-8e41-11e4-8732-0cc47a01304d  ONLINE       0     0     0
       gptid/f68f4fa9-8e41-11e4-8732-0cc47a01304d  ONLINE       0     0     0
       gptid/f722e509-8e41-11e4-8732-0cc47a01304d  ONLINE       0     0     0
       gptid/f7d115c2-8e41-11e4-8732-0cc47a01304d  ONLINE       0     0     0
       gptid/f84821c1-8e41-11e4-8732-0cc47a01304d  ONLINE       0     0     0

errors: No known data errors
[root@freenas2] ~# 


Note that the "0"s in the status are in neat columns, and that the individual devices in my RAIDZ2 vdevs are indented under the raidz2-0 and raidz2-1 headings.

For the motherboard, the SuperMicro X9SCL and X9SCM are very good choices, and should work with your existing CPU. They also support ECC RAM, which I'd recommend you upgrade to, though that can wait a bit if needed.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Sir I am almost dead-balls positive that "verify install" existed in the 9.2 series. I remember the whole "resolv.conf" thing didn't "break" until near the end of the 9.2.1.8? series or something. Monkey_ is checking on it now (he's downloading the ISO now and will report back).

"Verify install" didn't exist in 9.2. That feature was added with 9.3. ;)
 

lyle

Contributor
Joined
Jul 30, 2013
Messages
123
Thanks, guys. Replaced my drive and all is good!

I've ordered the SuperMicro X9SCL, should have it next week. Not cheap. What can I expect to see improve with this upgrade? Speed is not too important to me, but data integrity is.
 