Faulty hard drive?

sYndax · Feb 2, 2015

Hey,

This morning the FreeNAS server emailed me the following message:

> (da1:mps0:0:1:0): READ(10). CDB: 28 00 73 20 f4 70 00 01 00 00 length 131072 SMID 99 terminated ioc 804b scsi 0 state 0 xfer 0
> (da1:mps0:0:1:0): READ(10). CDB: 28 00 b2 eb 07 78 00 00 08 00
> (da1:mps0:0:1:0): CAM status: SCSI Status Error
> (da1:mps0:0:1:0): SCSI status: Check Condition
> (da1:mps0:0:1:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
> (da1:mps0:0:1:0): Info: 0xb2eb0778
> (da1:mps0:0:1:0): Error 5, Unretryable error
> (da1:mps0:0:1:0): READ(10). CDB: 28 00 b2 ed b2 60 00 00 08 00 length 4096 SMID 360 terminated ioc 804b scsi 0 state 0 xfer 0
> (da1:mps0:0:1:0): READ(10). CDB: 28 00 b2 ed b1 c0 00 00 08 00
> (da1:mps0:0:1:0): CAM status: SCSI Status Error
> (da1:mps0:0:1:0): SCSI status: Check Condition
> (da1:mps0:0:1:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
> (da1:mps0:0:1:0): Info: 0xb2edb1c0
> (da1:mps0:0:1:0): Error 5, Unretryable error
> (da1:mps0:0:1:0): READ(10). CDB: 28 00 b2 ee 28 80 00 00 08 00
> (da1:mps0:0:1:0): CAM status: SCSI Status Error
> (da1:mps0:0:1:0): SCSI status: Check Condition
> (da1:mps0:0:1:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
> (da1:mps0:0:1:0): Info: 0xb2ee2880
> (da1:mps0:0:1:0): Error 5, Unretryable error
> (da1:mps0:0:1:0): READ(10). CDB: 28 00 b2 ed 05 08 00 00 08 00
> (da1:mps0:0:1:0): CAM status: SCSI Status Error
> (da1:mps0:0:1:0): SCSI status: Check Condition
> (da1:mps0:0:1:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
> (da1:mps0:0:1:0): Info: 0xb2ed0508
> (da1:mps0:0:1:0): Error 5, Unretryable error
> (da1:mps0:0:1:0): READ(10). CDB: 28 00 88 90 45 d8 00 01 00 00 length 131072 SMID 560 terminated ioc 804b scsi 0 state 0 xfer 0
> (da1:mps0:0:1:0): READ(10). CDB: 28 00 88 90 44 d8 00 01 00 00
> (da1:mps0:0:1:0): CAM status: SCSI Status Error
> (da1:mps0:0:1:0): SCSI status: Check Condition
> (da1:mps0:0:1:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
> (da1:mps0:0:1:0): Info: 0x88904558
> (da1:mps0:0:1:0): Error 5, Unretryable error
> (da1:mps0:0:1:0): READ(10). CDB: 28 00 88 f0 72 a8 00 01 00 00 length 131072 SMID 407 terminated ioc 804b scsi 0 state 0 xfer 0
> (da1:mps0:0:1:0): READ(10). CDB: 28 00 88 f0 71 a8 00 01 00 00
> (da1:mps0:0:1:0): CAM status: SCSI Status Error
> (da1:mps0:0:1:0): SCSI status: Check Condition
> (da1:mps0:0:1:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
> (da1:mps0:0:1:0): Info: 0x88f07270
> (da1:mps0:0:1:0): Error 5, Unretryable error
> (da1:mps0:0:1:0): READ(10). CDB: 28 00 8b 83 89 e8 00 00 28 00 length 20480 SMID 232 terminated ioc 804b scsi 0 state 0 xfer 0
> (da1:mps0:0:1:0): READ(10). CDB: 28 00 8b 7a 2e 30 00 00 08 00 length 4096 SMID 934 terminated ioc 804b scsi 0 state 0 xfer 0
> (da1:mps0:0:1:0): READ(10). CDB: 28 00 8b 83 88 e8 00 01 00 00 length 131072 SMID 919 terminated ioc 804b scsi 0 state 0 xfer 0
> (da1:mps0:0:1:0): READ(10). CDB: 28 00 8b 76 00 18 00 01 00 00
> (da1:mps0:0:1:0): CAM status: SCSI Status Error
> (da1:mps0:0:1:0): SCSI status: Check Condition
> (da1:mps0:0:1:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
> (da1:mps0:0:1:0): Info: 0x8b7600f8
> (da1:mps0:0:1:0): Error 5, Unretryable error
> (da1:mps0:0:1:0): READ(10). CDB: 28 00 8b 77 ab 90 00 01 00 00
> (da1:mps0:0:1:0): CAM status: SCSI Status Error
> (da1:mps0:0:1:0): SCSI status: Check Condition
> (da1:mps0:0:1:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
> (da1:mps0:0:1:0): Info: 0x8b77abf0
> (da1:mps0:0:1:0): Error 5, Unretryable error
> (da1:mps0:0:1:0): READ(10). CDB: 28 00 8d d4 e5 b8 00 01 00 00 length 131072 SMID 266 terminated ioc 804b scsi 0 state 0 xfer 0
> (da1:mps0:0:1:0): READ(10). CDB: 28 00 8d d4 e4 b8 00 01 00 00
> (da1:mps0:0:1:0): CAM status: SCSI Status Error
> (da1:mps0:0:1:0): SCSI status: Check Condition
> (da1:mps0:0:1:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
> (da1:mps0:0:1:0): Info: 0x8dd4e588
> (da1:mps0:0:1:0): Error 5, Unretryable error
> (da1:mps0:0:1:0): READ(10). CDB: 28 00 91 78 3d a0 00 01 00 00 length 131072 SMID 901 terminated ioc 804b scsi 0 state 0 xfer 0
> (da1:mps0:0:1:0): READ(10). CDB: 28 00 91 78 3c a0 00 01 00 00
> (da1:mps0:0:1:0): CAM status: SCSI Status Error
> (da1:mps0:0:1:0): SCSI status: Check Condition
> (da1:mps0:0:1:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
> (da1:mps0:0:1:0): Info: 0x91783cd8
> (da1:mps0:0:1:0): Error 5, Unretryable error

I don't quite understand the data above... is da1 (the 3TB HDD) faulty?
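A note on reading the log: each READ(10) line carries the 10-byte SCSI command (the CDB), which holds the starting block address and the number of blocks requested, and the "Info:" line gives the block where the read actually failed. The sketch below decodes one CDB from the message above. The function name `decode_read10` is made up for illustration, and the 512-byte sector size is an assumption, although it agrees with the "length 4096" and "length 131072" values in the log.

```python
def decode_read10(cdb_hex: str, sector_size: int = 512) -> dict:
    """Decode a READ(10) CDB given as space-separated hex bytes.

    sector_size=512 is an assumption; it matches the 'length' values in the log.
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")      # bytes 2-5: starting logical block address
    blocks = int.from_bytes(cdb[7:9], "big")   # bytes 7-8: transfer length in blocks
    return {"lba": lba, "blocks": blocks, "bytes": blocks * sector_size}


info = decode_read10("28 00 b2 eb 07 78 00 00 08 00")
print(hex(info["lba"]), info["blocks"], info["bytes"])
# prints: 0xb2eb0778 8 4096
```

For the larger 256-block reads, the "Info:" address is not always the starting address; it falls somewhere inside the requested range, which is where the drive gave up.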

dlavigne · Feb 2, 2015

Also post the output of zpool status within code tags.

sYndax · Feb 2, 2015

dlavigne said:
Also post the output of zpool status within code tags.

Code:

[root@Aquanox ~]# zpool status Aquanox
  pool: Aquanox
 state: ONLINE
  scan: scrub repaired 784K in 5h53m with 0 errors on Sun Feb 1 05:53:48 2015
config:

        NAME                                            STATE     READ WRITE CKSUM
        Aquanox                                         ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/9123a697-884a-11e4-a61f-002590474abd  ONLINE       0     0     0
            gptid/9222a2b9-884a-11e4-a61f-002590474abd  ONLINE       0     0     0
            gptid/93395d80-884a-11e4-a61f-002590474abd  ONLINE       0     0     0
            gptid/944a9053-884a-11e4-a61f-002590474abd  ONLINE       0     0     0
            gptid/94d79366-884a-11e4-a61f-002590474abd  ONLINE       0     0     0
            gptid/953bfa5c-884a-11e4-a61f-002590474abd  ONLINE       0     0     0

errors: No known data errors
[root@Aquanox ~]#

Code:

[root@Aquanox ~]# smartctl -A /dev/da1
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p5 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       2
  3 Spin_Up_Time            0x0027   249   242   021    Pre-fail  Always       -       9508
  4 Start_Stop_Count        0x0032   097   097   000    Old_age   Always       -       3716
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   068   068   000    Old_age   Always       -       23498
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       243
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       141
193 Load_Cycle_Count        0x0032   097   097   000    Old_age   Always       -       311912
194 Temperature_Celsius     0x0022   112   103   000    Old_age   Always       -       40
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       2
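For anyone who wants to check a table like this automatically, here is a rough sketch that pulls the raw values out of `smartctl -A` text and flags the sector-health attributes when they are non-zero. The helper name `parse_smart_attributes` and the choice of watched IDs (5, 197, 198) are assumptions made for illustration, not anything smartctl defines, and the sample text is just a few rows copied from the output above.

```python
# Assumption: these three IDs are the ones worth alerting on for bad sectors.
WATCHED = (5, 197, 198)


def parse_smart_attributes(text: str) -> dict:
    """Map attribute ID -> (name, raw value) from `smartctl -A` style text."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns and start with a numeric ID.
        if len(fields) >= 10 and fields[0].isdigit():
            try:
                raw = int(fields[9])
            except ValueError:
                continue  # skip raw values that are not plain integers
            attrs[int(fields[0])] = (fields[1], raw)
    return attrs


SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
193 Load_Cycle_Count        0x0032   097   097   000    Old_age   Always       -       311912
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
"""

attrs = parse_smart_attributes(SAMPLE)
flagged = {i: attrs[i] for i in WATCHED if i in attrs and attrs[i][1] > 0}
print(flagged)
# prints: {197: ('Current_Pending_Sector', 1)}
```

On a live system the text would come from running smartctl against the disk rather than from a pasted string.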

marbus90 · Feb 2, 2015

Looking at the LCC, your disks are dead. In case they're WD Reds, you can invoke an advanced RMA, so they'll ship new disks first, you can resilver the pool and then send the old disks back.

Ericloewe · Feb 2, 2015

marbus90 said:
Looking at the LCC, your disks are dead. In case they're WD Reds, you can invoke an advanced RMA, so they'll ship new disks first, you can resilver the pool and then send the old disks back.

The load cycle count is high, but most likely that's not really related to what's going on.

The pending sector says the drive is beginning to crap out. It's not an urgent replacement, though, so acquire a spare, burn it in and then replace the current drive.

marbus90 · Feb 2, 2015

The disks are still considered dead due to the high LCC. It doesn't really matter that 1 drive shows a pending sector, _all_ drives will crap out soon.

Ericloewe · Feb 2, 2015

marbus90 said:
The disks are still considered dead due to the high LCC. It doesn't really matter that 1 drive shows a pending sector, _all_ drives will crap out soon.

Past their load cycle rating, not dead.
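To put a number on that, here is a back-of-the-envelope sketch using the Load_Cycle_Count and Power_On_Hours raw values posted earlier in the thread. The 300,000-cycle rating is an assumption (it is the figure commonly quoted for Green drives, so check the datasheet for the actual model), and `load_cycle_stats` is just an illustrative helper name.

```python
def load_cycle_stats(load_cycles: int, power_on_hours: int, rated_cycles: int) -> dict:
    """Average head load/unload rate and the share of the rating already used."""
    per_hour = load_cycles / power_on_hours
    return {
        "per_hour": per_hour,
        "seconds_between": 3600 / per_hour,
        "fraction_of_rating": load_cycles / rated_cycles,
    }


# Raw values of attributes 193 and 9 from the smartctl output in this thread.
# rated_cycles=300_000 is an assumed rating, not something the drive reports.
stats = load_cycle_stats(load_cycles=311912, power_on_hours=23498, rated_cycles=300_000)
print(f"{stats['per_hour']:.1f} cycles/hour, "
      f"one every {stats['seconds_between']:.0f} s on average")
print(f"{stats['fraction_of_rating']:.0%} of the assumed rating")
```

That average is spread over the whole power-on time, so the real parking rate during idle periods could be quite different.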

marbus90 · Feb 2, 2015

Dead enough to warrant an RMA for WD Reds.

Ericloewe · Feb 2, 2015

marbus90 said:
Dead enough to warrant an RMA for WD Reds.

I doubt it's a Red. The LCC is too high for a Red that wasn't built around December 2013 (that's when they shipped with the 8s idle timer).
If you RMA a Green because the LCC is too high they'll most likely tell you to take a hike and buy Reds in the future. The Reds aren't supposed to get that high, so a case can be made with them that the drive is defective - but it only applies to that nasty batch from a year ago.

cyberjock · Feb 2, 2015

That disk is probably a Green.

LCC does NOT make a disk dead. LCC means that the landing zone might wear down sufficiently to where it may cause the drive to fail (which would show up on other SMART parameters too since you'd have read and/or write errors).

But attributes 1 and 200 are non-zero, so something is not quite right with the disk.

Current_Pending_Sector (attribute 197) is also non-zero, so the disk is basically at the edge of starting to fail. If it's covered by warranty you should be able to do a SMART long test and see it fail. Once it fails it qualifies for an RMA.

BUT, the LCC does *not* qualify the disks for an RMA. In fact, your RMA can be denied BECAUSE you clearly used the disks in something like a server when they are sold as desktop drives.

I'm not sure why everyone is so confused about the LCC, WDidle, spindown, etc. This stuff is clearly discussed in my wdidle thread. :P

sYndax · Feb 2, 2015

Well, it's a 3TB Green drive... the warranty ended on 04/14 so no RMA there...

I was wondering whether I should replace it with a new 3TB drive or get ready for a future pool enlargement.

If so, should I get a new 4TB or 6TB (Red)?

cyberjock · Feb 3, 2015

I'd definitely replace the disk. You can go with a Green if you are ready to use wdidle. It takes 5 minutes to do, and if money is tight, Greens have worked fine for me. On the other hand, if you want that longer warranty and such, go for the Reds.

marbus90 · Feb 3, 2015

They wouldn't accept an RMA on the Green anyway.

Red is definitely the way to go. The disk size only depends on your storage requirements.

SirMaster · Feb 3, 2015

I often see people say that certain things would "deny" your RMA.

Is WD just more strict about this or something?

I just RMA'd a Seagate desktop-class disk whose temperature had gone over the threshold (the disk had hit 66C at some point) due to some failed cooling, and they replaced the disk via RMA just fine. The disk was failing SMART testing, though, which was my reason for the RMA in the first place.

cyberjock · Feb 3, 2015

From what I've seen, it's just a matter of random chance whether they catch it or not. I wouldn't be surprised if they all suddenly clamped down big time on this sort of thing, called it fraudulent RMAs, and then denied your RMA as a "cost saving measure". Let's be honest, if you are operating a hard drive outside its design criteria, should you qualify for an RMA?

In your case, SirMaster, I've seen 3 disks that overheated (all were Seagates.. lol) and they all failed SMART tests after cooking at 67C for about an hour. The drives never recovered, but they were crappy 320GB disks so it wasn't a big loss. So it's very possible the SMART tests failed because of the high temperature, unless your disk was failing before it overheated.

SirMaster · Feb 3, 2015

Yeah, I certainly didn't rule out the extreme temperature as what perhaps contributed to the disk failing earlier than normal. I just figured I'd try an RMA because what else did I really have to lose? And I guess they either didn't check it or didn't care.

I just wonder about situations where people actually got denied and how that went, because I haven't really heard of this happening and I can't find much info on it from searching the web in general. It's not something people talk about much, so there just isn't much info available from what I can see.
