Opinion - new hard drive with SMART errors

tomf84 · Apr 15, 2013

Hi all

Today I received a new Seagate Barracuda 2TB drive - model ST2000DM001-1CH164 - to replace a bad drive in my zpool. (Here's my other thread.)

Thought I'd zero the drive to start with, firstly to force a write across the whole drive, secondly to sanity check the drive health. I wasn't expecting to find anything untoward, but these SMART values don't look right to me for a brand new drive.

Just after starting dd if=/dev/zero of=/dev/sda bs=1M:

Code:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   006    Pre-fail  Always       -       33408
  3 Spin_Up_Time            0x0003   099   099   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       2
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   100   253   030    Pre-fail  Always       -       67
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       0
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       2
183 Runtime_Bad_Block       0x0032   099   099   000    Old_age   Always       -       1
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   075   074   045    Old_age   Always       -       25 (Min/Max 22/25)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       2
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       2
194 Temperature_Celsius     0x0022   025   040   000    Old_age   Always       -       25 (0 22 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       4
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       252745940467712
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       135168
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       11443

After completion

Code:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   006    Pre-fail  Always       -       194400
  3 Spin_Up_Time            0x0003   096   096   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       7
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   100   253   030    Pre-fail  Always       -       32207
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       5
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       7
183 Runtime_Bad_Block       0x0032   099   099   000    Old_age   Always       -       1
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       4295032833
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   066   065   045    Old_age   Always       -       34 (Min/Max 31/35)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       3
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       7
194 Temperature_Celsius     0x0022   034   040   000    Old_age   Always       -       34 (0 22 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       4
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       23871428231173
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       4031831224
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       57737

There are no SMART errors logged, just ascending numbers (although I am running a long test overnight).

RMA the drive, or normal and fine to keep and put in service?

HolyK · Apr 15, 2013

RMA from my point of view

Code:

188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       4295032833

Or maybe it could be just a bad SATA cable ...

titan_rw · Apr 15, 2013

Nothing looks wrong to me.

Ideally, the UDMA_CRC_Error_Count would be zero, but since it was already non-zero before the drive wipe and didn't increase during it, I wouldn't worry about it.

Unless you know how the SMART attributes are reported, it's kind of hard to tell what they should be. For example, this is some information on seek error rate:

http://forums.seagate.com/t5/Barracuda-XT-Barracuda-Barracuda/Seagate-s-Seek-Error-Rate-Raw-Read-Error-Rate-and-Hardware-ECC/td-p/122382

The raw value will ALWAYS go up, because it includes the total number of seeks.

Taking the Seek_Error_Rate (attribute 7) values above, the math works out as follows:

Before zero pass:

Dec raw value: 67.
Hex raw value: 0x000000000043

The 8 least significant hex digits are the number of seeks.
The 4 most significant hex digits are the number of seek errors.

Therefore, this drive has had 0 seek errors out of 67 seeks.

After zero pass:

Dec raw value: 32207.
Hex raw value: 0x000000007DCF

Therefore, this drive has had 0 seek errors out of 32207 seeks.

Sounds good to me!

Also note that this number (the actual number of seek errors) is not considered statistically relevant until the drive has done at least 1 million seeks. For example, 2 seek errors out of 10,000 does not necessarily mean the seek error rate is 'high'.

Here's one of my older Seagate drives:

Power on hours: 27284
Seek error rate: 109044803

Hex: 67FE443

This value is 8 hex digits or fewer, so there have been 0 seek errors out of 109 million seeks.
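The split described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name decode_seagate_raw is made up for this example, the 16/32-bit layout is the one described in the linked Seagate forum post rather than anything from official documentation, and the sample values are taken from the tables in the first post and from my older drive.

```python
# Threshold quoted above: below this many seeks the error count says little.
MIN_SEEKS_FOR_RATE = 1_000_000


def decode_seagate_raw(raw):
    """Split a 48-bit raw value into (errors, operations).

    Assumed layout: upper 16 bits = error count,
    lower 32 bits = number of operations (seeks or reads).
    """
    errors = (raw >> 32) & 0xFFFF
    operations = raw & 0xFFFFFFFF
    return errors, operations


SAMPLES = [
    ("Seek_Error_Rate before", 67),
    ("Seek_Error_Rate after", 32207),
    ("Raw_Read_Error_Rate before", 33408),
    ("Raw_Read_Error_Rate after", 194400),
    ("Seek_Error_Rate, older drive", 109044803),
]

for label, raw in SAMPLES:
    errors, operations = decode_seagate_raw(raw)
    relevant = operations >= MIN_SEEKS_FOR_RATE
    print(f"{label}: 0x{raw:012X} -> {errors} errors / {operations} operations"
          f" (statistically relevant: {relevant})")
```

Every one of these values fits in the lower 8 hex digits, so the error field comes out as 0 in each case.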

The raw read error rate is also done similarly.

These are the kind of attributes where you can't really go by the raw value. Check the normalized value instead. If it starts going down, then investigate.
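A rough sketch of that check, assuming the smartctl attribute table layout shown in the first post (ID, name, flag, VALUE, WORST, THRESH, ...). The helper names are invented for this example, and the two rows are the Seek_Error_Rate lines copied from the before and after tables.

```python
def parse_attribute_row(line):
    """Parse one row of the attribute table into the fields used here."""
    fields = line.split(None, 9)  # RAW_VALUE may itself contain spaces
    return {
        "id": int(fields[0]),
        "name": fields[1],
        "value": int(fields[3]),
        "worst": int(fields[4]),
        "thresh": int(fields[5]),
        "raw": fields[9],
    }


def normalized_report(before_line, after_line):
    """Compare the normalized VALUE of one attribute across two snapshots."""
    before = parse_attribute_row(before_line)
    after = parse_attribute_row(after_line)
    return {
        "name": after["name"],
        "change": after["value"] - before["value"],  # negative = going down
        "margin": after["value"] - after["thresh"],  # distance above THRESH
    }


BEFORE = "  7 Seek_Error_Rate         0x000f   100   253   030    Pre-fail  Always       -       67"
AFTER = "  7 Seek_Error_Rate         0x000f   100   253   030    Pre-fail  Always       -       32207"

report = normalized_report(BEFORE, AFTER)
print(report)
```

For these two rows the raw value climbs from 67 to 32207 while the normalized value stays at 100, which is the point: the raw number moving is not, by itself, a sign of trouble.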

In any case, I'd say your drive is good to go. For a sanity check, if it were me, I'd probably run a drive wipe using random data instead of zeros. It takes longer, as /dev/urandom is CPU limited, but then you're not just storing zeros either. After the random write, do a dd read pass, then possibly another zero write pass. Then put it into production.
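The three passes can be sketched as below. This is a Python stand-in for the dd commands, not something I have run against a real disk: the function names are invented for this example, it has only been thought through for an ordinary file, and on a raw device it would need root and the size detection may behave differently depending on the OS. It destroys whatever is on the target, so treat it purely as an outline of the sequence.

```python
import os

CHUNK = 1024 * 1024  # 1 MiB, mirroring bs=1M in the dd command


def target_size(path):
    """Return the size in bytes by seeking to the end of the target."""
    with open(path, "rb") as f:
        return f.seek(0, os.SEEK_END)


def write_pass(path, make_chunk):
    """Overwrite the whole target with data produced by make_chunk(n)."""
    remaining = target_size(path)
    with open(path, "r+b") as f:
        while remaining > 0:
            n = min(CHUNK, remaining)
            f.write(make_chunk(n))
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # push the data out of the OS cache


def read_pass(path):
    """Read the whole target back; an I/O error surfaces as OSError."""
    total = 0
    with open(path, "rb") as f:
        while True:
            data = f.read(CHUNK)
            if not data:
                break
            total += len(data)
    return total


def burn_in(path):
    """Random write pass, full read pass, then a zero write pass."""
    write_pass(path, os.urandom)          # 1. random data
    bytes_read = read_pass(path)          # 2. read everything back
    write_pass(path, lambda n: bytes(n))  # 3. zeros
    return bytes_read
```

Checking the SMART attributes again after each pass, the same way as in the first post, is what actually tells you whether anything changed.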

HolyK · Apr 16, 2013

Removed confusing text from my previous post. I overlooked that you have a Seagate. Read and seek error rates are vendor-specific raw values. Check titan_rw's post or Seagate for their true meaning.

Anyway, "Command Timeout" should be "zero" ...

The count of aborted operations due to HDD timeout. Normally this attribute value should be equal to zero and if the value is far above zero, then most likely there will be some serious problems with power supply or an oxidized data cable.
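Before reading 4295032833 as four billion timeouts, it may be worth looking at its bit pattern. The sketch below assumes the 48-bit raw value is really three 16-bit words, which is how smartmontools' raw16 print format would display it; whether this drive actually packs the attribute that way, and what each word would count, is an assumption and not something confirmed in this thread. The function name is made up for the example.

```python
def split_raw16(raw):
    """Split a 48-bit raw value into three 16-bit words, highest first."""
    return (
        (raw >> 32) & 0xFFFF,
        (raw >> 16) & 0xFFFF,
        raw & 0xFFFF,
    )


COMMAND_TIMEOUT_RAW = 4295032833  # attribute 188 after the zero pass

print(f"0x{COMMAND_TIMEOUT_RAW:012X}")
print(split_raw16(COMMAND_TIMEOUT_RAW))
```

Under that assumption the value decodes to three small counters rather than one huge one, which would still be non-zero and still worth watching.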

EDIT: Is this some kind of hybrid device with a small SSD inside? You have a nonzero "Runtime Bad Block". For an SSD it's OK at small values, but for an HDD ... I don't know, but keep an eye on it...

Runtime bad blocks are the result of an unexpected voltage shift of a cell during a read, write or erase operation.
