Does this mean I need a New Hard Drive?

Status
Not open for further replies.

Grantp

Contributor
Joined
Feb 26, 2013
Messages
111
Hi, I recently noticed this error when I plugged a monitor into my FreeNAS box.
ipmi.jpg
Does this mean I need to replace the drive?

Also, would this error cause my FreeNAS to grind to an almost complete stop while writing to the drives? It is taking 30+ minutes to copy a 400 MB file to my FreeNAS from my Windows 7 PC (CIFS share), and while the copy is taking place my whole PC hangs.

I ought to add that my FreeNAS has been running perfectly for many months. After a reboot of both FreeNAS and the Windows 7 PC, the first 20-30 GB copies as normal, then it grinds to a halt.

My system specs are below.

Many thanks Grant
 

Mlovelace

Guru
Joined
Aug 19, 2014
Messages
1,111
Yep, those are SMART errors; time to replace the drive. Do you have email notifications set up? FreeNAS would have notified you of the bad drive if you did.
 

Grantp

Contributor
Joined
Feb 26, 2013
Messages
111
Thanks for the reply. Yes, I do have email notifications set up, but I haven't received any mail.

As an aside, I went for WD Red drives as I thought they would be good, but this is the 4th one to go KAPUT inside 12 months.

Would that cause my system to grind to a halt? It is virtually unusable at the moment!!
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
You should post your SMART info for that drive. If you have had several drives fail, I suspect something else is wrong with your system.
 

gpsguy

Active Member
Joined
Jan 22, 2012
Messages
4,472
Rather than starting new threads, please post the results of: smartctl -a /dev/da0 using code tags.

Doing so will preserve the formatting, making it easier for us to read.
 

Grantp

Contributor
Joined
Feb 26, 2013
Messages
111
Here's the output from the smartctl command you suggested. I realised it's one of my old Seagate drives that is erroring, not my WD Reds. Sorry for the double post; I thought it might have been 2 different problems.

smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE-p12 amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda 7200.14 (AF)
Device Model: ST3000DM001-1CH166
Serial Number: W1F1RHR2
LU WWN Device Id: 5 000c50 05cfb8607
Firmware Version: CC24
User Capacity: 3,000,592,982,016 bytes [3.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 7200 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sun Oct 26 22:07:55 2014 GMT

==> WARNING: A firmware update for this drive may be available,
see the following Seagate web pages:
http://knowledge.seagate.com/articles/en_US/FAQ/207931en
http://knowledge.seagate.com/articles/en_US/FAQ/223651en

SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 584) seconds.
Offline data collection
capabilities: (0x73) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 335) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x3085) SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 118 099 006 Pre-fail Always - 175873752
3 Spin_Up_Time 0x0003 094 093 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 157
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 079 060 030 Pre-fail Always - 8757862299
9 Power_On_Hours 0x0032 084 084 000 Old_age Always - 14526
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 159
183 Runtime_Bad_Block 0x0032 100 100 000 Old_age Always - 0
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 098 098 000 Old_age Always - 2
188 Command_Timeout 0x0032 100 100 000 Old_age Always - 1 1 1
189 High_Fly_Writes 0x003a 081 081 000 Old_age Always - 19
190 Airflow_Temperature_Cel 0x0022 058 047 045 Old_age Always - 42 (Min/Max 36/42)
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 156
193 Load_Cycle_Count 0x0032 096 096 000 Old_age Always - 8302
194 Temperature_Celsius 0x0022 042 053 000 Old_age Always - 42 (0 15 0 0 0)
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 8
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 8
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 14142h+45m+30.971s
241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 44586145952
242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 62871802821

SMART Error Log Version: 1
ATA Error Count: 2
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 2 occurred at disk power-on lifetime: 14017 hours (584 days + 1 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 30 ff ff ff 4f 00 38d+20:51:39.632 READ FPDMA QUEUED
60 00 28 ff ff ff 4f 00 38d+20:51:39.631 READ FPDMA QUEUED
60 00 30 ff ff ff 4f 00 38d+20:51:39.630 READ FPDMA QUEUED
60 00 28 ff ff ff 4f 00 38d+20:51:39.629 READ FPDMA QUEUED
60 00 30 ff ff ff 4f 00 38d+20:51:39.629 READ FPDMA QUEUED

Error 1 occurred at disk power-on lifetime: 13202 hours (550 days + 2 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 28 ff ff ff 4f 00 4d+21:57:46.741 READ FPDMA QUEUED
60 00 28 ff ff ff 4f 00 4d+21:57:46.740 READ FPDMA QUEUED
60 00 28 ff ff ff 4f 00 4d+21:57:46.716 READ FPDMA QUEUED
60 00 30 ff ff ff 4f 00 4d+21:57:46.700 READ FPDMA QUEUED
60 00 30 ff ff ff 4f 00 4d+21:57:46.700 READ FPDMA QUEUED

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 4315 -
# 2 Short offline Completed without error 00% 4291 -
# 3 Short offline Completed without error 00% 4267 -
# 4 Short offline Completed without error 00% 4244 -
# 5 Short offline Completed without error 00% 4220 -
# 6 Extended offline Completed without error 00% 4202 -
# 7 Short offline Completed without error 00% 4195 -
# 8 Short offline Completed without error 00% 4172 -
# 9 Short offline Completed without error 00% 4172 -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
 

Grantp

Contributor
Joined
Feb 26, 2013
Messages
111
Sorry for being so stupid, but the documentation on how to 'Replace a failed Drive' says to go to Storage → Volumes → View Volumes → Volume Status, find the failed drive, then click 'Offline'. I have replaced disks before, but this time I can't find the 'Offline' button when I click on da0.

Please ignore the above comment. I said I was being stupid: I've found the 'Offline' & 'Replace' buttons.
 

gpsguy

Active Member
Joined
Jan 22, 2012
Messages
4,472
Here's a snippet of the results from your SMART query.

Your hard disk has been powered on for 14526 hours. Attributes 197 (Current_Pending_Sector) and 198 (Offline_Uncorrectable) should be zero; both show 8. These are the errors you are seeing in the screenshot you provided.
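
If you'd rather not eyeball the output each time, the check on 197 and 198 could be scripted. This is only a rough Python sketch: it assumes the attribute rows look like the ones in your smartctl -a paste, and the function name, the sample text, and the decision to also watch attribute 5 (Reallocated_Sector_Ct) are my own choices for illustration.

```python
import re

# Attribute IDs whose raw value should normally be zero.
# 197 and 198 come from the post above; watching 5 as well is an assumption.
WATCHED_IDS = {5, 197, 198}

# One row of the attribute table:
# ID, name, flag, value, worst, thresh, type, updated, when_failed, raw value.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]{4}\s+\d+\s+\d+\s+\d+"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def nonzero_attributes(smartctl_output):
    """Return {id: (name, raw)} for watched attributes with a non-zero raw value."""
    flagged = {}
    for line in smartctl_output.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # not an attribute row
        attr_id = int(match.group(1))
        name = match.group(2)
        raw = int(match.group(3))
        if attr_id in WATCHED_IDS and raw != 0:
            flagged[attr_id] = (name, raw)
    return flagged


# Rows copied from the output posted earlier in this thread.
SAMPLE = """\
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
9 Power_On_Hours 0x0032 084 084 000 Old_age Always - 14526
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 8
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 8
"""

if __name__ == "__main__":
    for attr_id, (name, raw) in sorted(nonzero_attributes(SAMPLE).items()):
        print(attr_id, name, raw)
```

On the rows above it reports 197 and 198, each with a raw value of 8. To use it on a real disk you would feed it the saved text of smartctl -a for that device.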

While you may have email notifications set up, you aren't running SMART tests on a regular basis. The last short test you ran was 10,211 power-on hours ago. Had you been running the short and long (extended) tests, you probably would have been notified of the issue.
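
For anyone wondering where the 10,211 comes from: it is the drive's current power-on hours minus the LifeTime(hours) value of the most recent entry in the self-test log. A tiny Python sketch of that arithmetic, with names I made up for illustration and the two numbers taken from the output you posted:

```python
def hours_since_last_test(power_on_hours, last_test_lifetime_hours):
    """Power-on hours elapsed since the most recent logged self-test."""
    return power_on_hours - last_test_lifetime_hours


POWER_ON_HOURS = 14526   # raw value of attribute 9 (Power_On_Hours)
LAST_SHORT_TEST = 4315   # LifeTime(hours) of self-test log entry # 1

elapsed = hours_since_last_test(POWER_ON_HOURS, LAST_SHORT_TEST)
print(elapsed)  # 10211
```

That is well over a year of powered-on time without a single self-test.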

Read the section of the manual that describes how to configure S.M.A.R.T. Tests. Here's an example of how cyberjock has configured the tests to run on his server - https://forums.freenas.org/index.php?threads/scrub-and-smart-testing-schedules.20108/

To run the short test on demand: smartctl -t short /dev/adaX, substituting your device for adaX. A short test should only take a couple of minutes. To run a long test, replace short with long in the command; it will take several hours to run. You should run these tests on your other drives, to get an idea of their health. How did you know which drives to replace in the past?

Lastly, when posting information like this in the future, please use the code tags. At the beginning of the code insert (code), and at the end use (/code). Replace the parentheses in my example with square brackets "[ ]".
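
If you have several drives, something along these lines could kick off the same test on each of them. Treat it as a sketch rather than a tested tool: the helper names and the device list are hypothetical, it only uses smartctl -t short and smartctl -t long as described above, and by default it just returns the commands instead of running them.

```python
import subprocess


def selftest_command(device, kind="short"):
    """Build the smartctl argument list for a short or long self-test."""
    if kind not in ("short", "long"):
        raise ValueError("kind must be 'short' or 'long'")
    return ["smartctl", "-t", kind, device]


def start_selftests(devices, kind="short", dry_run=True):
    """Start a self-test on each device; with dry_run, only return the commands."""
    commands = [selftest_command(dev, kind) for dev in devices]
    if not dry_run:
        for cmd in commands:
            # Needs root and smartmontools. smartctl's exit status is a
            # bitmask, so a non-zero code is not treated as fatal here.
            subprocess.run(cmd)
    return commands


if __name__ == "__main__":
    # Hypothetical device names; substitute your own.
    for cmd in start_selftests(["/dev/da0", "/dev/da1"]):
        print(" ".join(cmd))
```

Once a test has finished, smartctl -a on the same device shows the result in the self-test log.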

Code:
9 Power_On_Hours 0x0032 084 084 000 Old_age Always - 14526

197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 8
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 8

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 4315 -
...
# 6 Extended offline Completed without error 00% 4202 -
 

Grantp

Contributor
Joined
Feb 26, 2013
Messages
111
Thanks for your help gpsguy, I've replaced faulty disk and system is running again
 

Grantp

Contributor
Joined
Feb 26, 2013
Messages
111
Looks like I spoke too soon. Today I started to copy my data back onto my FreeNAS box. To see if everything was working OK, I started to copy over 656 GB from my Windows 7 PC. All looked good at first: the copy window told me it was copying at 117.0 MB/second and would take 1 hour. Within 2-3 minutes, after copying 7 or 8 GB, the speed started to drop slowly; after 5-6 minutes it was down to 17 MB/second, with an estimated 16 hours to complete the copy. As of right now the speed is still dropping by 0.1 MB/second about every 30 seconds. Last night, after leaving it running for hours, I came back to find the copy speed down to 17 KB/second. I am not sure where to start looking, as I have replaced my faulty WD 3TB Red and I'm still getting the original problem.
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421

Grantp

Contributor
Joined
Feb 26, 2013
Messages
111
SweetAndLow, thanks for the reply. My bad, I hadn't done that yet. I wanted to get all my data back online. I had copied everything off my FreeNAS box and rebuilt my volumes; maybe I didn't need to, but I decided a fresh start would be good. I also had a couple more brand new WD 3TB Red drives hanging around, so I added those. I will get that done now; thanks for the section references.
Just as an aside to what I wrote in my previous post: all the data copied off FreeNAS at expected speeds. The slowdown only happens when copying to the box from Windows 7, or while copying/moving files that already exist on the FreeNAS box.
 

Grantp

Contributor
Joined
Feb 26, 2013
Messages
111
Upon a reboot it mentions l2arc. I've never noticed this before; does it always boot up like this? I don't have an L2ARC set up.
l2arc.png
It's most likely normal but just thought I'd ask.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
That's just tunables and is inconsequential to the problem you are having.

You might want to check your new disk and make sure it doesn't have problems. ;)
 

Grantp

Contributor
Joined
Feb 26, 2013
Messages
111
cyberjock, I have run smartctl -a /dev/da? for each disk and I can't see anything that says I have errors.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
That and run short and long tests. Ideally you should have run badblocks on it before putting it in the server and using it, but it's a little late now. ;)
 