Does this mean I need a New Hard Drive?

Grantp · Oct 26, 2014

Hi, I recently noticed this error when I plugged a monitor into my FreeNAS box.

Does this mean I need to replace the drive?

Also, would this error cause my FreeNAS to grind to an almost complete stop while writing to the drives? It is taking 30+ minutes to copy a 400 MB file across to my FreeNAS from my Windows 7 PC (CIFS share), and while the copy is taking place my whole PC hangs.

I ought to add that my FreeNAS has been running perfectly for many months. After a reboot of both FreeNAS and the Windows 7 PC, the first 20-30 GB copies as normal, then it grinds to a halt.

My system specs are below.

Many thanks Grant

Mlovelace · Oct 26, 2014

Yep, those are SMART errors; time to replace the drive. Do you have email notifications set up? FreeNAS would have notified you of the bad drive if you did.

Grantp · Oct 26, 2014

Mlovelace said:
Yep, those are SMART errors; time to replace the drive. Do you have email notifications set up? FreeNAS would have notified you of the bad drive if you did.

Thanks for the reply. Yes, I do have email notifications set up, but I didn't get any mail.

As an aside, I went for WD Red drives as I thought they would be good; this is the 4th one to go KAPUT inside 12 months.

Would that cause my system to grind to a halt? It is virtually unusable at the moment!

SweetAndLow · Oct 26, 2014

You should post the SMART info for that drive. If you have had several drives fail, I suspect something else is wrong with your system.

gpsguy · Oct 26, 2014

Rather than starting new threads, please post the results of: smartctl -a /dev/da0 using code tags.

Doing so will preserve the formatting, making it easier for us to read.

Grantp · Oct 26, 2014

Here's the output from the smartctl command you suggested. I realised it's one of my old Seagate drives that is erroring, not my WD Reds. Sorry for the double post; I thought it might have been 2 different problems.

smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE-p12 amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda 7200.14 (AF)
Device Model: ST3000DM001-1CH166
Serial Number: W1F1RHR2
LU WWN Device Id: 5 000c50 05cfb8607
Firmware Version: CC24
User Capacity: 3,000,592,982,016 bytes [3.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 7200 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sun Oct 26 22:07:55 2014 GMT

==> WARNING: A firmware update for this drive may be available,
see the following Seagate web pages:
http://knowledge.seagate.com/articles/en_US/FAQ/207931en
http://knowledge.seagate.com/articles/en_US/FAQ/223651en

SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 584) seconds.
Offline data collection
capabilities: (0x73) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 335) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x3085) SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 118 099 006 Pre-fail Always - 175873752
3 Spin_Up_Time 0x0003 094 093 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 157
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 079 060 030 Pre-fail Always - 8757862299
9 Power_On_Hours 0x0032 084 084 000 Old_age Always - 14526
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 159
183 Runtime_Bad_Block 0x0032 100 100 000 Old_age Always - 0
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 098 098 000 Old_age Always - 2
188 Command_Timeout 0x0032 100 100 000 Old_age Always - 1 1 1
189 High_Fly_Writes 0x003a 081 081 000 Old_age Always - 19
190 Airflow_Temperature_Cel 0x0022 058 047 045 Old_age Always - 42 (Min/Max 36/42)
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 156
193 Load_Cycle_Count 0x0032 096 096 000 Old_age Always - 8302
194 Temperature_Celsius 0x0022 042 053 000 Old_age Always - 42 (0 15 0 0 0)
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 8
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 8
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 14142h+45m+30.971s
241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 44586145952
242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 62871802821

SMART Error Log Version: 1
ATA Error Count: 2
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 2 occurred at disk power-on lifetime: 14017 hours (584 days + 1 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 30 ff ff ff 4f 00 38d+20:51:39.632 READ FPDMA QUEUED
60 00 28 ff ff ff 4f 00 38d+20:51:39.631 READ FPDMA QUEUED
60 00 30 ff ff ff 4f 00 38d+20:51:39.630 READ FPDMA QUEUED
60 00 28 ff ff ff 4f 00 38d+20:51:39.629 READ FPDMA QUEUED
60 00 30 ff ff ff 4f 00 38d+20:51:39.629 READ FPDMA QUEUED

Error 1 occurred at disk power-on lifetime: 13202 hours (550 days + 2 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 28 ff ff ff 4f 00 4d+21:57:46.741 READ FPDMA QUEUED
60 00 28 ff ff ff 4f 00 4d+21:57:46.740 READ FPDMA QUEUED
60 00 28 ff ff ff 4f 00 4d+21:57:46.716 READ FPDMA QUEUED
60 00 30 ff ff ff 4f 00 4d+21:57:46.700 READ FPDMA QUEUED
60 00 30 ff ff ff 4f 00 4d+21:57:46.700 READ FPDMA QUEUED

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 4315 -
# 2 Short offline Completed without error 00% 4291 -
# 3 Short offline Completed without error 00% 4267 -
# 4 Short offline Completed without error 00% 4244 -
# 5 Short offline Completed without error 00% 4220 -
# 6 Extended offline Completed without error 00% 4202 -
# 7 Short offline Completed without error 00% 4195 -
# 8 Short offline Completed without error 00% 4172 -
# 9 Short offline Completed without error 00% 4172 -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Grantp · Oct 26, 2014

Sorry for being so stupid, but the documentation on how to 'Replace a failed Drive' says to go to Storage → Volumes → View Volumes → Volume Status, find the failed drive, then click 'Offline'. I have replaced disks before, but this time I can't find the 'Offline' button when I click on da0.

Please ignore the above comment. I said I was being stupid: I've found the 'Offline' and 'Replace' buttons.

gpsguy · Oct 26, 2014

Here's a snippet of the results from your SMART query.

Your hard disk has been powered on for 14526 hours. The raw values of attributes 197 (Current_Pending_Sector) and 198 (Offline_Uncorrectable) should be zero; yours are both 8. These are the errors you are seeing in the screen print you provided.
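As a rough illustration of that check, here is a minimal Python sketch that scans the attribute rows of `smartctl -a` output and reports attributes 197 and 198 when their raw value is not zero. The function name and the sample text are made up for this example, and it assumes the ten-column attribute layout shown in the output earlier in this thread.

```python
# Attribute IDs named above whose raw value should be zero.
WATCHED_IDS = {197, 198}

def nonzero_watched(smart_output):
    """Return {attribute_name: raw_value} for watched attributes with a non-zero raw value."""
    flagged = {}
    for line in smart_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns:
        # ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id = int(fields[0])
        raw = fields[9]
        if attr_id in WATCHED_IDS and raw.isdigit() and int(raw) != 0:
            flagged[fields[1]] = int(raw)
    return flagged

# A few rows copied from the output posted in this thread.
sample = """\
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
9 Power_On_Hours 0x0032 084 084 000 Old_age Always - 14526
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 8
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 8
"""

print(nonzero_watched(sample))
# {'Current_Pending_Sector': 8, 'Offline_Uncorrectable': 8}
```

An empty result means neither attribute is reporting pending or uncorrectable sectors.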

While you may have email notifications set up, you aren't running SMART tests on a regular basis. The last short test you ran was 10,211 hours ago (14526 power-on hours now, against 4315 at the most recent test in the log). Had you been running the short and long (extended) tests, you probably would have been notified of the issue.
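The arithmetic behind that gap can be sketched in a few lines of Python. The helper name is hypothetical; the numbers are the Power_On_Hours raw value and the LifeTime(hours) column from the self-test log posted above.

```python
def hours_since_last_test(power_on_hours, self_test_lifetimes):
    """Power-on hours elapsed since the most recent logged self-test."""
    if not self_test_lifetimes:
        return None  # no self-test has been logged at all
    return power_on_hours - max(self_test_lifetimes)

# Values taken from the smartctl output in this thread.
power_on_hours = 14526
logged_tests = [4315, 4291, 4267, 4244, 4220, 4202, 4195, 4172, 4172]

gap = hours_since_last_test(power_on_hours, logged_tests)
print(gap)        # 10211
print(gap // 24)  # 425 (whole days of powered-on time)
```

Note that these are hours of powered-on time, not calendar time, so the real-world gap is at least this long.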

Read the section of the manual that describes how to configure S.M.A.R.T. Tests. Here's an example of how cyberjock has configured the tests to run on his server - https://forums.freenas.org/index.php?threads/scrub-and-smart-testing-schedules.20108/

To run the short test on demand:

smartctl -t short /dev/adaX

Substitute your device for adaX. A short test should only take a couple of minutes. To run a long test, substitute "long" for "short" in the command; it will take several hours to run. You should run these tests on your other drives to get an idea of their health.

How did you know which drives to replace in the past?
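If you have several drives to test, the on-demand commands can be generated rather than typed one by one. The sketch below only builds and prints the command lines; the helper name and the device list are assumptions for illustration, so substitute the device names from your own system.

```python
def self_test_command(device, kind="short"):
    """Build the argument list for starting a SMART self-test on one device."""
    if kind not in ("short", "long"):
        raise ValueError("kind must be 'short' or 'long'")
    return ["smartctl", "-t", kind, device]

# Example device names only; da0 is the disk discussed in this thread.
devices = ["/dev/da0", "/dev/da1"]

for dev in devices:
    print(" ".join(self_test_command(dev, "short")))
```

To actually start the tests from Python, each list could be passed to `subprocess.run(cmd, check=True)` from an account allowed to run smartctl. The test then runs inside the drive itself, and its result shows up later in the self-test log of `smartctl -a`.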

Lastly, when posting information like this in the future, please use the code tags. At the beginning of the code insert (code), and at the end use (/code). Replace the parentheses in my example with square brackets "[ ]".

Code:

9 Power_On_Hours 0x0032 084 084 000 Old_age Always - 14526

197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 8
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 8

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 4315 -
...
# 6 Extended offline Completed without error 00% 4202 -

Grantp · Oct 27, 2014

Thanks for your help, gpsguy. I've replaced the faulty disk and the system is running again.

Grantp · Oct 30, 2014

Grantp said:
Thanks for your help, gpsguy. I've replaced the faulty disk and the system is running again.

Looks like I spoke too soon. Today I started to copy my data back onto my FreeNAS box. To see if everything was working OK, I started to copy over 656 GB from my Windows 7 PC. All looked good at first: the copy window told me it was copying at 117.0 MB/second and would take 1 hour. Within 2-3 minutes, after copying 7 or 8 GB, the speed started to drop slowly; after 5-6 minutes it was down to 17 MB/second and the estimate was 16 hours to complete the copy. As of right now the speed is still dropping by 0.1 MB/second about every 30 seconds. Last night, after leaving it running for hours, I came back to find the copy speed down to 17 KB/second. I am not sure where to start looking, as I have replaced my faulty WD 3TB Red and I'm still getting the original problem.

SweetAndLow · Oct 30, 2014

Did you ever configure SMART tests to run automatically? And did you get your emails working? You should start there to see if FreeNAS will just tell you what is wrong without you having to do any work.

See Sections 4.5 and 8.10 for S.M.A.R.T. and Section 4.6.3 for email:
http://web.freenas.org/images/resources/freenas9.2.1/freenas9.2.1_guide.pdf

Grantp · Oct 30, 2014

SweetAndLow, thanks for the reply. My bad, I hadn't done that yet. I wanted to get all my data back online. I had copied everything off my FreeNAS box and rebuilt my volumes; maybe I didn't need to, but I decided a fresh start would be good. I also had a couple more brand new WD 3TB Red drives hanging around, so I added those. I will get that done now; thanks for the section references.
Just as an aside to what I wrote in my previous post: all the data copied off FreeNAS at the expected speeds. The slowdown only happens when copying to the box from Windows 7, or while copying/moving files that already exist on the FreeNAS box.

Grantp · Oct 30, 2014

Upon a reboot it mentions L2ARC. I've never noticed this before; does it always boot up like this? I don't have an L2ARC set up.

It's most likely normal but just thought I'd ask.

cyberjock · Oct 30, 2014

That's just tunables and is inconsequential to the problem you are having.

You might want to check your new disk and make sure it doesn't have problems. ;)

Grantp · Oct 30, 2014

cyberjock, I have run smartctl -a /dev/da? for each disk and I can't see anything that says I have errors.

cyberjock · Oct 30, 2014

Grantp said:
cyberjock, I have run smartctl -a /dev/da? for each disk and I can't see anything that says I have errors.

Do that, and also run short and long tests. Ideally you should have run badblocks on it before putting it in the server and using it, but it's a little late now. ;)
