HDD error?

Status
Not open for further replies.

Lacika1981

Dabbler
Joined
Mar 18, 2017
Messages
31
Hi,

My question is not exactly related to FreeNAS.
I have a server with two HDDs, both 1 TB drives. A few times a day the HDD read/write LED goes solid and I cannot reach the server. This lasts for 30-60 seconds. After that I can hear the disk head moving and I can use the server normally again. Only one drive is affected; I have not had a problem with the other drive.

How could I check the disks?

Thank you
 

Lacika1981

Dabbler
Joined
Mar 18, 2017
Messages
31
Here is the SMART data:

HDD1

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   253   167   021    Pre-fail  Always       -       1025
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       207
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   077   077   000    Old_age   Always       -       16913
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       156
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       108
193 Load_Cycle_Count        0x0032   155   155   000    Old_age   Always       -       137973
194 Temperature_Celsius     0x0022   117   099   000    Old_age   Always       -       33
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0



HDD2

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   114   099   006    Pre-fail  Always       -       59466885
  3 Spin_Up_Time            0x0003   100   100   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       27
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   067   060   030    Pre-fail  Always       -       5424477
  9 Power_On_Hours          0x0032   082   082   000    Old_age   Always       -       15768
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       27
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   064   061   045    Old_age   Always       -       36 (Min/Max 28/39)
194 Temperature_Celsius     0x0022   036   040   000    Old_age   Always       -       36 (0 21 0 0 0)
195 Hardware_ECC_Recovered  0x001a   046   044   000    Old_age   Always       -       59466885
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
 

nojohnny101

Wizard
Joined
Dec 3, 2015
Messages
1,477
Please follow forum rules and post your full hardware specs. Are you using FreeNAS? If so, then please post your full version with build number.

Also, posting the output of your SMART tests in code tags makes it much easier to read.

Here is a great hard drive troubleshooting guide that should walk you through determining what is going on:
https://forums.freenas.org/index.ph...bleshooting-guide-all-versions-of-freenas.17/
 

Lacika1981

Dabbler
Joined
Mar 18, 2017
Messages
31
Hi,

Thank you for the link.
I apologise for the messy post.
I am using FreeNAS Corral 10.0.2.
My server is a DELL CS24-NV7, Dual Opteron 2373EE and 16GB RAM (4x4GB).
Two disks. Seagate and WD.

I have read through the document and I think one of my drives has a high seek error rate. Or am I wrong?

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  7 Seek_Error_Rate         0x000f   067   060   030    Pre-fail  Always       -       5424477
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
If that is the Seagate, it is probably normal. As it is a 'rate', the raw value actually encodes two numbers which together form a ratio.
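
A commonly reported interpretation (not confirmed anywhere in this thread, so treat it as an assumption) is that Seagate packs two counters into the 48-bit raw value: the upper 16 bits hold the error count and the lower 32 bits hold the total number of operations. Here is a minimal Python sketch under that assumption, using a hypothetical helper name `split_seagate_raw` and the Seek_Error_Rate raw value posted above:

```python
from typing import Tuple


def split_seagate_raw(raw: int) -> Tuple[int, int]:
    """Split a 48-bit raw value into (error_count, operation_count).

    ASSUMPTION: upper 16 bits = errors, lower 32 bits = operations.
    This layout is community lore, not something stated in the thread.
    """
    errors = (raw >> 32) & 0xFFFF      # upper 16 bits
    operations = raw & 0xFFFFFFFF      # lower 32 bits
    return errors, operations


seek_errors, seeks = split_seagate_raw(5424477)
print(seek_errors, seeks)  # prints: 0 5424477
```

If the assumption holds, the drive has logged zero seek errors across roughly 5.4 million seeks, which would fit the "probably normal" reading.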
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
Are you running regular SMART tests on them? If not, set up a schedule now and see if it reveals anything. There's nothing obviously wrong based on the limited information you posted so far.
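
As a rough illustration of what "obviously wrong" could look like in an attribute table, here is a small Python sketch. It assumes the text uses the ten-column layout shown in the SMART output earlier in the thread. The function name `find_problem_attributes` and the set of watched attributes are hypothetical choices for illustration, not an official health check.

```python
# ASSUMPTION: these three raw counters are the ones worth flagging when nonzero.
WATCHED_RAW = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}


def find_problem_attributes(smart_text):
    """Return (id, name, reason) tuples for attributes that look suspicious."""
    problems = []
    for line in smart_text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least ten columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id, name = int(fields[0]), fields[1]
        value, thresh = int(fields[3]), int(fields[5])
        when_failed, raw = fields[8], fields[9]
        if when_failed != "-":
            problems.append((attr_id, name, "failed: " + when_failed))
        elif thresh > 0 and value <= thresh:
            problems.append((attr_id, name, "normalized value at or below threshold"))
        elif name in WATCHED_RAW and raw.isdigit() and int(raw) > 0:
            problems.append((attr_id, name, "nonzero raw count"))
    return problems


# A few rows copied from the HDD2 table posted earlier in the thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 067 060 030 Pre-fail Always - 5424477
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
"""

print(find_problem_attributes(SAMPLE))  # prints: []
```

An empty list only means none of these simple checks fired. It says nothing about self-test results, which are reported separately.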
 

Lacika1981

Dabbler
Joined
Mar 18, 2017
Messages
31
I have done short and long tests on both drives.

On the Seagate I can see Raw_Read_Error_Rate and Hardware_ECC_Recovered increasing (both have the same value).
Six hours ago it was 59466885; now it is 60118406.
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
I have no idea if that's significant or not. Modern hard drives no longer store data in discrete bits, they have to reconstruct it from waveforms based on probabilistic calculations*, so error correction at the lowest level is normal.

[*] this is what I've gathered from what I've read, so someone please correct me if you truly understand this stuff ;)
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Yeah. The number SMART reports is also mostly meaningless without further interpretation.
 

Lacika1981

Dabbler
Joined
Mar 18, 2017
Messages
31
OK. I am not going to worry about the drive anymore :D

The drives are mirrored and I have a backup on a USB disk drive. I just wanted to know whether the drive is dying or has a different problem.

Thank you for your help
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,996
I'm not thrilled with the high error count for ID 195 on HDD2, but since you were not very liberal with your hard drive IDs and model numbers, I'm going to guess that HDD2 is the Seagate drive. If it is, then I wouldn't worry about it.

Unfortunately none of this addresses your original problem.
 
Joined
Dec 2, 2015
Messages
730
On Seagate drives, most of the SMART raw values are coded in a proprietary format, and the displayed numeric values are meaningless. More info from Seagate.
 

Lacika1981

Dabbler
Joined
Mar 18, 2017
Messages
31
Hi

Thanks, everyone, for helping me.

I updated to 10.0.4 and the problem seems to be gone.
I am not going to use Corral for long, just until 9.10.3 is released.
 