Replaced drive removed during resilver, not sure what to do next.

Status
Not open for further replies.
Joined
Oct 10, 2016
Messages
21
This was a replacement RMA drive from Western Digital. Just came in the mail today. I replaced ada5 with it, and it was in the resilver process for about 2-3 hours. I received the following via email alert:
Code:
Device: /dev/ada5, unable to open device
Device: /dev/ada8, 3 Currently unreadable (pending) sectors
The volume Pool1 state is DEGRADED: One or more devices is currently being resilvered. The pool will continue to function, possibly in a degraded state.


When I log into the FreeNAS gui it shows ada5 as removed now. Here's a small snippet from /var/log/messages:
Code:
Jan 10 20:36:02 freenas (ada5:ahcich12:0:0:0): WRITE_DMA48. ACB: 35 00 d0 96 5d 40 1c 01 00 00 08 00
Jan 10 20:36:02 freenas (ada5:ahcich12:0:0:0): CAM status: ATA Status Error
Jan 10 20:36:02 freenas (ada5:ahcich12:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 10 (IDNF )
Jan 10 20:36:02 freenas (ada5:ahcich12:0:0:0): RES: 51 10 d0 96 5d 40 1c 01 00 08 00
Jan 10 20:36:02 freenas (ada5:ahcich12:0:0:0): Error 5, Retries exhausted
Jan 10 20:36:09 freenas (ada5:ahcich12:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 08 f8 e1 41 40 1c 01 00 00 00 00
Jan 10 20:36:09 freenas (ada5:ahcich12:0:0:0): CAM status: ATA Status Error
Jan 10 20:36:09 freenas (ada5:ahcich12:0:0:0): ATA status: 41 (DRDY ERR), error: 10 (IDNF )
Jan 10 20:36:09 freenas (ada5:ahcich12:0:0:0): RES: 41 10 f8 e1 41 40 1c 01 00 00 00
Jan 10 20:36:09 freenas (ada5:ahcich12:0:0:0): Retrying command
Jan 10 20:36:15 freenas daemon[3641]:	 2018/01/10 20:36:15 [WARN] agent: Check 'service:nas-health' is now warning
Jan 10 20:36:16 freenas (ada5:ahcich12:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 08 f8 e1 41 40 1c 01 00 00 00 00
Jan 10 20:36:16 freenas (ada5:ahcich12:0:0:0): CAM status: ATA Status Error
Jan 10 20:36:16 freenas (ada5:ahcich12:0:0:0): ATA status: 41 (DRDY ERR), error: 10 (IDNF )
Jan 10 20:36:16 freenas (ada5:ahcich12:0:0:0): RES: 41 10 f8 e1 41 40 1c 01 00 00 00
Jan 10 20:36:16 freenas (ada5:ahcich12:0:0:0): Retrying command
Jan 10 20:36:23 freenas (ada5:ahcich12:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 08 f8 e1 41 40 1c 01 00 00 00 00
Jan 10 20:36:23 freenas (ada5:ahcich12:0:0:0): CAM status: ATA Status Error
Jan 10 20:36:23 freenas (ada5:ahcich12:0:0:0): ATA status: 41 (DRDY ERR), error: 10 (IDNF )
Jan 10 20:36:23 freenas (ada5:ahcich12:0:0:0): RES: 41 10 f8 e1 41 40 1c 01 00 00 00
Jan 10 20:36:23 freenas (ada5:ahcich12:0:0:0): Retrying command
Jan 10 20:36:30 freenas (ada5:ahcich12:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 08 f8 e1 41 40 1c 01 00 00 00 00
Jan 10 20:36:30 freenas (ada5:ahcich12:0:0:0): CAM status: ATA Status Error
Jan 10 20:36:30 freenas (ada5:ahcich12:0:0:0): ATA status: 41 (DRDY ERR), error: 10 (IDNF )
Jan 10 20:36:30 freenas (ada5:ahcich12:0:0:0): RES: 41 10 f8 e1 41 40 1c 01 00 00 00
Jan 10 20:36:30 freenas (ada5:ahcich12:0:0:0): Retrying command
Jan 10 20:36:37 freenas (ada5:ahcich12:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 08 f8 e1 41 40 1c 01 00 00 00 00
Jan 10 20:36:37 freenas (ada5:ahcich12:0:0:0): CAM status: ATA Status Error
Jan 10 20:36:37 freenas (ada5:ahcich12:0:0:0): ATA status: 41 (DRDY ERR), error: 10 (IDNF )
Jan 10 20:36:37 freenas (ada5:ahcich12:0:0:0): RES: 41 10 f8 e1 41 40 1c 01 00 00 00
Jan 10 20:36:37 freenas (ada5:ahcich12:0:0:0): Error 5, Retries exhausted
Jan 10 20:37:09 freenas ahcich12: Timeout on slot 9 port 0
Jan 10 20:37:09 freenas ahcich12: is 00000000 cs 00000200 ss 00000000 rs 00000200 tfd c0 serr 00000000 cmd 0004c917
Jan 10 20:37:09 freenas (ada5:ahcich12:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
Jan 10 20:37:09 freenas (ada5:ahcich12:0:0:0): CAM status: Command timeout
Jan 10 20:37:09 freenas (ada5:ahcich12:0:0:0): Retrying command
Jan 10 20:37:41 freenas ahcich12: AHCI reset: device not ready after 31000ms (tfd = 00000080)
Jan 10 20:38:04 freenas ada5 at ahcich12 bus 0 scbus12 target 0 lun 0
Jan 10 20:38:04 freenas ada5: <WDC WD40EFRX-68WT0N0 82.00A82> s/n WD-WCC4E2LHA684 detached
Jan 10 20:38:04 freenas ZFS: vdev state changed, pool_guid=221229000499233486 vdev_guid=7158956947314190400
Jan 10 20:38:04 freenas GEOM_MIRROR: Device swap3: provider ada5p1 disconnected.
Jan 10 20:38:04 freenas (ada5:ahcich12:0:0:0): Periph destroyed
Jan 10 20:38:05 freenas ZFS: vdev state changed, pool_guid=221229000499233486 vdev_guid=7158956947314190400
Jan 10 20:38:09 freenas (aprobe0:ahcich12:0:0:0): NOP FLUSHQUEUE. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
Jan 10 20:38:09 freenas (aprobe0:ahcich12:0:0:0): CAM status: ATA Status Error
Jan 10 20:38:09 freenas (aprobe0:ahcich12:0:0:0): ATA status: d1 (BSY DRDY SERV ERR), error: 04 (ABRT )
Jan 10 20:38:09 freenas (aprobe0:ahcich12:0:0:0): RES: d1 04 ff ff ff ff ff ff ff ff ff
Jan 10 20:38:09 freenas (aprobe0:ahcich12:0:0:0): Error 5, Retries exhausted
Jan 10 20:38:09 freenas (aprobe0:ahcich12:0:0:0): NOP FLUSHQUEUE. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
Jan 10 20:38:09 freenas (aprobe0:ahcich12:0:0:0): CAM status: ATA Status Error
Jan 10 20:38:09 freenas (aprobe0:ahcich12:0:0:0): ATA status: d1 (BSY DRDY SERV ERR), error: 04 (ABRT )
Jan 10 20:38:09 freenas (aprobe0:ahcich12:0:0:0): RES: d1 04 ff ff ff ff ff ff ff ff ff
Jan 10 20:38:09 freenas (aprobe0:ahcich12:0:0:0): Error 5, Retries exhausted
Jan 10 20:38:33 freenas daemon[3641]:	 2018/01/10 20:38:33 [WARN] agent: Check 'service:nas-health' is now warning
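
A note on reading the log above: the two hex digits after "ATA status:" are a bit field, and the names in parentheses are simply the bits that are set. The sketch below decodes such a byte. The function name `decode_ata_status` is made up for illustration, and the labels for bits that do not appear in this log (DF, DRQ, CORR, IDX) are an assumption based on the classic ATA status register layout.

```shell
#!/bin/sh
# Hypothetical helper: decode an ATA status byte given as two hex digits.
# Bit labels follow the names seen in the log (BSY, DRDY, SERV, ERR); the
# remaining labels are assumed from the classic ATA status register.
decode_ata_status() {
    val=$((0x$1))
    out=""
    for pair in 128:BSY 64:DRDY 32:DF 16:SERV 8:DRQ 4:CORR 2:IDX 1:ERR; do
        bit=${pair%%:*}     # numeric mask before the colon
        name=${pair#*:}     # label after the colon
        if [ $((val & bit)) -ne 0 ]; then
            out="${out:+$out }$name"
        fi
    done
    echo "$out"
}

decode_ata_status 51   # status byte from the WRITE_DMA48 failure
decode_ata_status d1   # status byte from the later probe attempts
```

The first call prints `DRDY SERV ERR` and the second `BSY DRDY SERV ERR`, matching what the kernel logged. The stuck BSY bit on the later lines fits a drive that stopped responding.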


Resilver is still progressing. It went from 8% to 10% in just the couple of minutes it took me to create this post.

Does this mean the replacement drive I received from WD is bad?
What should I do next?
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
It looks that way; it happens sometimes. That is why we suggest doing burn-in testing on drives before they are added to the pool.

Github repository for FreeNAS scripts, including disk burnin
https://forums.freenas.org/index.ph...for-freenas-scripts-including-disk-burnin.28/

The other thing we usually suggest is having a spare drive on standby that has already passed the burnin testing.

At this point, I would wait for it to finish the resilver. Then you can do another replace without much risk.
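
For anyone following along, a burn-in along the lines suggested above could be sketched like this. It is only a sketch under assumptions: smartmontools and `badblocks` are installed, the drive is not part of any pool, and `/dev/ada5` stands in for the device under test. The sequence (short SMART test, destructive write pass, long SMART test) is a common convention and not necessarily what the linked scripts do. `burnin_plan` is a made-up helper that only prints the commands, so nothing gets wiped by accident.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) a burn-in sequence for one drive.
# WARNING: "badblocks -w" overwrites the whole drive.
burnin_plan() {
    dev="$1"
    if [ -z "$dev" ]; then
        echo "usage: burnin_plan /dev/adaX" >&2
        return 1
    fi
    # SMART self-tests run in the background inside the drive; wait for each
    # one to finish (check with "smartctl -l selftest") before the next step.
    echo "smartctl -t short $dev"
    # -b 4096: block size, -w: destructive write test, -s: show progress
    echo "badblocks -b 4096 -ws $dev"
    echo "smartctl -t long $dev"
    # Final look at the attributes (reallocated / pending sectors) and test log
    echo "smartctl -a $dev"
}

burnin_plan /dev/ada5
```

Printing the plan first and then running the lines one at a time keeps the destructive step deliberate. On a 4TB drive the write pass and the long test can each take many hours.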
 

Joined
Oct 10, 2016
Messages
21
Thank you, Chris.
I do have a spare that I've been cycling through a couple RMAs now. WD won't let me do an advanced RMA because my billing and shipping addresses don't match. I ship to my work address, can't ship home. So I'm stuck waiting on shipping there and back for each RMA.

It's a WD Red 4TB. The drives are configured as RAID-Z2.

@rs225
Code:
root@freenas:~ # smartctl -a /dev/ada5
smartctl 6.5 2016-05-07 r4318 [FreeBSD 11.1-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

/dev/ada5: Unable to detect device type
Please specify device type with the -d option.

Use smartctl -h to get a usage summary

 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
That looks like the drive either died catastrophically, which is rare but possible, or there is a loose connection. You might want to do a graceful shutdown, reseat all the connections and boot it back up. If it is a loose connection, it should automatically restart the resilver.
If it doesn't, you will need to replace the drive again. Sorry.
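
After the reboot, one quick way to see whether the disk came back is to look for anything in `zpool status` that is not ONLINE. A minimal sketch, assuming the usual column layout of `zpool status` (device name first, state second); `list_unhealthy_vdevs` is a hypothetical helper name, not a ZFS or FreeNAS tool.

```shell
#!/bin/sh
# Hypothetical helper: read "zpool status" output on stdin and print every
# config row whose state is something other than ONLINE.
list_unhealthy_vdevs() {
    awk '$1 != "state:" && $2 ~ /^(DEGRADED|FAULTED|OFFLINE|REMOVED|UNAVAIL)$/ { print $1, $2 }'
}

# Usage (not run here, it needs a real pool):
#   zpool status Pool1 | list_unhealthy_vdevs
```

Empty output would mean every device in the pool is ONLINE. While a resilver is still running, the pool and the raidz2 vdev are expected to show up as DEGRADED even if the disk itself has reattached.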
 
Joined
May 10, 2017
Messages
838
Burn-in testing matters especially on refurbished drives. In my experience there's a high likelihood of a refurbished drive failing in the first month of use.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
I have gotten a few replacements that looked like they were factory fresh but I have gotten some that looked recycled. It just depends on things.
I have heard that if you buy one of these:
https://www.newegg.com/Product/Product.aspx?Item=N82E16822235158
and shuck the drive, it is likely to be a WD Red drive. You might even be able to pick one up in a local electronics store.
 
Joined
Oct 10, 2016
Messages
21
Resilver completed. Should I pull the drive and attempt the burn in procedure or power down the FreeNAS and reseat the drive?
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
The My Books in that Newegg link are not the ones people shuck for Reds. It's the easystores that have WD Reds in them.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
The last status you told us about, the NAS was showing the drive as being offline / missing. What is shown in the GUI? What is the zpool status?
 
Joined
Oct 10, 2016
Messages
21
GUI shows a random string of characters on the left where it would normally say ada5. On the far left in the status column it says REMOVED.
Here's the output of zpool status:
Code:
root@freenas:~ # zpool status
  pool: Pool1
 state: DEGRADED
status: One or more devices has been removed by the administrator.
	Sufficient replicas exist for the pool to continue functioning in a
	degraded state.
action: Online the device using 'zpool online' or replace the device with
	'zpool replace'.
  scan: resilvered 11.8G in 0 days 04:49:27 with 0 errors on Thu Jan 11 01:10:24 2018
config:

	NAME                                            STATE     READ WRITE CKSUM
	Pool1                                           DEGRADED     0     0     0
	  raidz2-0                                      DEGRADED     0     0     0
	    gptid/fcd81d1e-89e5-11e6-a97f-d05099c14e39  ONLINE       0     0     0
	    gptid/a67020d9-ead2-11e7-8ea2-d05099c14e39  ONLINE       0     0     0
	    gptid/ff469a01-89e5-11e6-a97f-d05099c14e39  ONLINE       0     0     0
	    gptid/00871909-89e6-11e6-a97f-d05099c14e39  ONLINE       0     0     0
	    7158956947314190400                         REMOVED      0     0     0  was /dev/gptid/a8e2f936-f65b-11e7-8485-d05099c14e39
	    gptid/976605c5-dadf-11e7-8873-d05099c14e39  ONLINE       0     0     0
	    gptid/0428a91b-89e6-11e6-a97f-d05099c14e39  ONLINE       0     0     0
	    gptid/055d3aa5-89e6-11e6-a97f-d05099c14e39  ONLINE       0     0     0

errors: No known data errors

  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:01:45 with 0 errors on Mon Jan  8 03:46:45 2018
config:

	NAME          STATE     READ WRITE CKSUM
	freenas-boot  ONLINE       0     0     0
	  ada2p2      ONLINE       0     0     0

errors: No known data errors

 
Joined
Oct 10, 2016
Messages
21
I pulled the drive and attempted the burn in test but it finished as soon as I started it.
I'm not sure how accurate this is, but the Ubuntu disk manager showed 1177 bad sectors as soon as I plugged it in.
2Cfcofj.png

Burn in log file attached to this thread.

I appreciate your continued help with this.
 

Attachments

  • burnin-WDC_WD40EFRX-68WT0N0_WD-WCC4E2LHA684 (copy).txt
    20.1 KB · Views: 392

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Yeah, that's a bad thing. Send that back.

 
Joined
Oct 10, 2016
Messages
21
Ok, thank you. I called WD this morning and was able to do an advanced RMA over the phone.

When the new drive arrives and I do the burn-in test, does it matter if I do it on the FreeNAS server or on my Ubuntu laptop via a USB/SATA docking station?

I imagine it would be faster in the FreeNAS server but more convenient and less risky to do it on my laptop. Thoughts?
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Most of the USB adapters I have used don't pass SMART data through. If you have one that works properly and it is USB 3.0, the speed difference should be minimal. Just be sure it has good airflow to keep the drive cool, because it will get very, very hot during burn-in.
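
Whether a given USB adapter passes SMART through can be checked before committing to a long burn-in. A minimal sketch, assuming a Linux laptop with smartmontools installed and the docked drive showing up as `/dev/sdb` (a hypothetical device name). The `-d sat` option asks smartctl to use SCSI-to-ATA translation, which many but not all USB bridges support. `smart_usable` is an illustrative helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if "smartctl -i" output on stdin reports
# that SMART is enabled on the drive.
smart_usable() {
    grep -q "SMART support is: Enabled"
}

# Usage (not run here, it needs the real adapter and drive):
#   sudo smartctl -d sat -i /dev/sdb | smart_usable \
#       && echo "SMART works through this adapter" \
#       || echo "no SMART through this adapter, test in the server instead"
```

If the check fails, SMART self-tests cannot be started through that adapter, and a burn-in script that relies on them is likely to stop almost immediately.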
 
Joined
Oct 10, 2016
Messages
21
That would probably explain why the burn-in test finished as soon as it started: I had the drive connected to the USB/SATA adapter. Oops. It's already packed up and in a box, so I'm going to continue with the RMA. When the new drive arrives I'll definitely plug it into the FreeNAS server and run the burn-in test from there.
Thanks again!
 