I am running FreeNAS 9.1.0, I have 6 * 3tb hard drives, using RAIDZ2. The system was built a little over a month ago, all new components and hard drives.
When I log into the GUI front end in my web browser the main log (a preview is shown at the bottom of the screen) is reporting the following:
Code:
Oct 6 06:32:01 freenas smartd[2337]: Device: /dev/ada1, FAILED SMART self-check. BACK UP DATA NOW!
Oct 12 10:32:01 freenas smartd[2337]: Device: /dev/ada0, FAILED SMART self-check. BACK UP DATA NOW!
Oct 13 08:02:02 freenas smartd[2337]: Device: /dev/ada2, FAILED SMART self-check. BACK UP DATA NOW!
Oct 13 08:02:02 freenas smartd[2337]: Device: /dev/ada3, FAILED SMART self-check. BACK UP DATA NOW!
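For reference, here is a small sketch (not from the original post) of how the device names can be pulled out of smartd log lines like the ones above, e.g. to build a list of the "failing" drives; the sample log line is copied from the output above:

```shell
# Sketch: extract the device name from a smartd "FAILED SMART self-check" line.
# The $log value below is one of the real log lines quoted above.
log='Oct 6 06:32:01 freenas smartd[2337]: Device: /dev/ada1, FAILED SMART self-check. BACK UP DATA NOW!'
dev=$(echo "$log" | sed -n 's/.*Device: \(\/dev\/[a-z0-9]*\),.*/\1/p')
echo "$dev"
```

Run against the full log, the same `sed` expression would print one device path per failure line.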
Very clear, but upsetting, since that's 4 out of 6 hard drives. I have spent today doing extra backups of my NAS, for obvious reasons. Wanting some more details, I opened the console and ran the command:
smartctl -a /dev/ada0 | more
the interesting bits were:
Code:
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red (AF)
Device Model: WDC WD30EFRX-68AX9N0
Serial Number: WD-WCC1T1490741
LU WWN Device Id: 5 0014ee 2b38719e7
Firmware Version: 80.00A80
User Capacity: 3,000,592,982,016 bytes [3.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sun Oct 13 18:53:39 2013 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
...
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 179 175 021 Pre-fail Always - 6008
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 18
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 099 099 000 Old_age Always - 1067
10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 18
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 8
193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 9
194 Temperature_Celsius 0x0022 122 115 000 Old_age Always - 28
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 17
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 1038 -
# 2 Extended offline Completed without error 00% 870 -
Unless I am reading this totally wrong, this is saying there are zero errors, and everything is looking good. I am getting the same results for the other 5 drives in the system. I can see no useful difference between the "failing" drives and the healthy drives.
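In case it helps, this is roughly how I compared the drives: a sketch (not part of the original console session) that pulls out the raw values of the attributes most often tied to real failures, using the same attribute names as the table above; the sample lines below are copied from the `smartctl` output:

```shell
# Sketch: given saved `smartctl -a` output, print the raw values of the
# attributes that usually matter most. The sample below is copied from
# the real output above (note UDMA_CRC_Error_Count is 17, the rest are 0).
smart_out='  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       17'
echo "$smart_out" | awk '/Reallocated_Sector_Ct|Current_Pending_Sector|UDMA_CRC_Error_Count/ {print $2, $NF}'
```

The same `awk` filter run over `smartctl -a /dev/adaN` for each drive gives a quick side-by-side comparison.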
I then ran
zpool status Main-Storage
which returned:
Code:
pool: Main-Storage
state: ONLINE
scan: scrub repaired 0 in 4h50m with 0 errors on Sat Oct 5 04:50:21 2013
config:
NAME STATE READ WRITE CKSUM
Main-Storage ONLINE 0 0 0
raidz2-0 ONLINE 0 0 0
gptid/ef2bd5c3-1188-11e3-8467-c83a35d12cd7 ONLINE 0 0 0
gptid/efa5f830-1188-11e3-8467-c83a35d12cd7 ONLINE 0 0 0
gptid/f0222234-1188-11e3-8467-c83a35d12cd7 ONLINE 0 0 0
gptid/f09a8dc3-1188-11e3-8467-c83a35d12cd7 ONLINE 0 0 0
gptid/f1108fc6-1188-11e3-8467-c83a35d12cd7 ONLINE 0 0 0
gptid/f186fb53-1188-11e3-8467-c83a35d12cd7 ONLINE 0 0 0
errors: No known data errors
Can anyone offer any thoughts or suggestions? Am I just misunderstanding this? The log messages are very clear, but seem to contradict the SMART test results.
In theory a scrub runs every Saturday, as recommended here: http://doc.freenas.org/index.php/ZFS_Scrubs
but my initial settings were not quite right, so it seems only one scrub has run so far.
If I have not provided enough details, which is likely, what information will help?