A little help please,
I just got these error messages and am not sure what the reports are telling me. Here are the errors I was emailed:
Code:
SMART error (CurrentPendingSector) detected on host
The following warning/error was logged by the smartd daemon:
Device: /dev/da3 [SAT], 8 Currently unreadable (pending) sectors
Code:
SMART error (OfflineUncorrectableSector) detected on host
The following warning/error was logged by the smartd daemon:
Device: /dev/da3 [SAT], 8 Offline uncorrectable sectors
So I SSH into my NAS and run zpool status:
Code:
[root@trinity] ~# zpool status
  pool: TRINITY_RAID-01
 state: ONLINE
  scan: scrub repaired 0 in 9h12m with 0 errors on Thu May 1 11:12:10 2014
config:

        NAME                                            STATE     READ WRITE CKSUM
        TRINITY_RAID-01                                 ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/934f3544-e66c-11e2-aea7-002590ab7843  ONLINE       0     0     0
            gptid/93cb8c02-e66c-11e2-aea7-002590ab7843  ONLINE       0     0     0
            gptid/944e85b1-e66c-11e2-aea7-002590ab7843  ONLINE       0     0     0
            gptid/94c96165-e66c-11e2-aea7-002590ab7843  ONLINE       0     0     0
            gptid/95474b4f-e66c-11e2-aea7-002590ab7843  ONLINE       0     0     0
            gptid/95c35cfe-e66c-11e2-aea7-002590ab7843  ONLINE       0     0     0
            gptid/9644ecaf-e66c-11e2-aea7-002590ab7843  ONLINE       0     0     0
        spares
          gptid/96e8da5f-e66c-11e2-aea7-002590ab7843    AVAIL

errors: No known data errors

  pool: TRINITY_RAID-02
 state: ONLINE
  scan: scrub repaired 0 in 48h29m with 0 errors on Sun May 4 02:29:30 2014
config:

        NAME                                            STATE     READ WRITE CKSUM
        TRINITY_RAID-02                                 ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/78a36f95-e66d-11e2-aea7-002590ab7843  ONLINE       0     0     0
            gptid/791e1c0e-e66d-11e2-aea7-002590ab7843  ONLINE       0     0     0
            gptid/79a1317d-e66d-11e2-aea7-002590ab7843  ONLINE       0     0     0
            gptid/7a25ab00-e66d-11e2-aea7-002590ab7843  ONLINE       0     0     0
            gptid/d2369baa-fbca-11e2-9a7e-002590ab7843  ONLINE       0     0     0
            gptid/7b33f2c1-e66d-11e2-aea7-002590ab7843  ONLINE       0     0     0
            gptid/7c377b71-e66d-11e2-aea7-002590ab7843  ONLINE       0     0     0
        spares
          gptid/7cdcdaeb-e66d-11e2-aea7-002590ab7843    AVAIL

errors: No known data errors

  pool: TRINITY_RAID-03
 state: ONLINE
  scan: scrub repaired 0 in 2h40m with 0 errors on Thu Apr 3 04:40:03 2014
config:

        NAME                                            STATE     READ WRITE CKSUM
        TRINITY_RAID-03                                 ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/fa342357-e66d-11e2-aea7-002590ab7843  ONLINE       0     0     0
            gptid/fab1cd5a-e66d-11e2-aea7-002590ab7843  ONLINE       0     0     0
            gptid/fb33787b-e66d-11e2-aea7-002590ab7843  ONLINE       0     0     0
            gptid/fbbd4176-e66d-11e2-aea7-002590ab7843  ONLINE       0     0     0
            gptid/fc427d2b-e66d-11e2-aea7-002590ab7843  ONLINE       0     0     0
            gptid/fcc87b9c-e66d-11e2-aea7-002590ab7843  ONLINE       0     0     0
            gptid/fd4eb108-e66d-11e2-aea7-002590ab7843  ONLINE       0     0     0
        spares
          gptid/fdf85527-e66d-11e2-aea7-002590ab7843    AVAIL

errors: No known data errors
Everything looks good so far, but I still need to check the drive in question, so I run smartctl -q noserial -a /dev/da3:
Code:
smartctl 5.43 2012-06-30 r3573 [FreeBSD 8.3-RELEASE-p7 amd64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda (SATA 3Gb/s, 4K Sectors)
Device Model:     ST3000DM001-1CH166
Firmware Version: CC24
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Thu May 8 22:32:02 2014 MDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (  584) seconds.
Offline data collection
capabilities:                    (0x73) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 330) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x3085) SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   119   099   006    Pre-fail  Always       -       222417224
  3 Spin_Up_Time            0x0003   094   094   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       86
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   074   060   030    Pre-fail  Always       -       27743685
  9 Power_On_Hours          0x0032   093   093   000    Old_age   Always       -       6633
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       86
183 Runtime_Bad_Block       0x0032   099   099   000    Old_age   Always       -       1
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   099   099   000    Old_age   Always       -       1
190 Airflow_Temperature_Cel 0x0022   068   061   045    Old_age   Always       -       32 (Min/Max 24/33)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       85
193 Load_Cycle_Count        0x0032   057   057   000    Old_age   Always       -       87402
194 Temperature_Celsius     0x0022   032   040   000    Old_age   Always       -       32 (0 23 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       8
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       131477538868441
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       42983460975
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       152411166233

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      6618         -
# 2  Short offline       Completed without error       00%      6515         -
# 3  Short offline       Completed without error       00%      6404         -
# 4  Short offline       Completed without error       00%      6286         -
# 5  Extended offline    Completed without error       00%      6268         -
# 6  Short offline       Completed without error       00%      6167         -
# 7  Short offline       Completed without error       00%      6071         -
# 8  Extended offline    Completed without error       00%      6053         -
# 9  Short offline       Completed without error       00%      5951         -
#10  Short offline       Completed without error       00%      5783         -
#11  Short offline       Completed without error       00%      5668         -
#12  Extended offline    Completed without error       00%      5650         -
#13  Short offline       Completed without error       00%      5595         -
#14  Short offline       Completed without error       00%      5475         -
#15  Short offline       Completed without error       00%      5381         -
#16  Extended offline    Interrupted (host reset)      00%      5358         -
#17  Short offline       Completed without error       00%      5261         -
#18  Short offline       Completed without error       00%      5165         -
#19  Short offline       Completed without error       00%      5045         -
#20  Extended offline    Completed without error       00%      5027         -
#21  Short offline       Completed without error       00%      4821         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
So, should I shut down the server, find the drive in question, and swap it out?
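For keeping an eye on the counters that triggered the emails, here is a minimal Python sketch that pulls the raw values of attributes 5, 197, and 198 out of a smartctl attribute table. It is only an illustration: the function name watched_raw_values, the choice of watched IDs, and the regular expression are my own assumptions, and the sample rows are copied from the smartctl output in this post rather than read from a live drive.

```python
import re

# Attribute IDs to watch (an assumed selection): reallocated,
# currently pending, and offline-uncorrectable sectors.
WATCHED_IDS = {5, 197, 198}

# One row of the "Vendor Specific SMART Attributes" table:
# ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+"   # ID, name, flag
    r"\s+\d+\s+\d+\s+\d+"                   # value, worst, thresh
    r"\s+\S+\s+\S+\s+\S+"                   # type, updated, when_failed
    r"\s+(\d+)"                             # leading integer of the raw value
)


def watched_raw_values(smartctl_text):
    """Return {attribute_name: raw_value} for the watched attribute IDs."""
    result = {}
    for line in smartctl_text.splitlines():
        match = ROW.match(line)
        if match and int(match.group(1)) in WATCHED_IDS:
            result[match.group(2)] = int(match.group(3))
    return result


# Rows copied from the smartctl output for /dev/da3 in this post.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       8
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
"""

if __name__ == "__main__":
    for name, raw in watched_raw_values(SAMPLE).items():
        status = "nonzero, worth watching" if raw > 0 else "zero"
        print(f"{name}: {raw} ({status})")
```

In practice the text would come from saving the output of smartctl -A /dev/da3 to a file and reading it in; comparing the values between runs shows whether the pending count is growing.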