Reliable way of locating a dead drive for replacement?

Status
Not open for further replies.

Will Dormann

Explorer
Joined
Feb 10, 2015
Messages
61
Hi folks,

I've reviewed https://forums.freenas.org/index.ph...ed-drive-if-you-have-an-lsi-controller.30823/ and it appeared to be viable when I tested it. However, I recently tried to use that technique to find a drive that wasn't responsive (some level of dead). The problem I ran into was that the sas2ircu command couldn't identify the drive, precisely because it wasn't responsive. After a long delay, it reported:

# sas2ircu 0 locate 2:0 on
LSI Corporation SAS2 IR Configuration Utility.
Version 20.00.00.00 (2014.09.18)
Copyright (c) 2008-2014 LSI Corporation. All rights reserved.

SAS2IRCU: Drive specified by 2:0 is not available.
SAS2IRCU: Error executing command LOCATE.

After replacing the drive, the exact same command above worked just fine.

So the problem here is that one might want to identify a dead drive to know which bay to pull. But ironically, the command used doesn't appear to work on dead drives. In the end, I identified the drive by its lack of activity when the FS was being stressed. But I'm a little uncomfortable with this as a reliable way of identifying a dead drive.

What is the recommended way for locating a dead drive? I suspect that the answer may depend on the hardware being used.
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
The only two things you can be sure of are the GPTID (which you can see when you run zpool status, for example) and the drive's serial number. The device label (/dev/daX...) is reliable only if you don't reboot (it can change from reboot to reboot).

I made a script that displays a table mapping this information; look at the "Useful Scripts" link in my signature if you're interested ;)
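
For anyone who wants a rough idea of what such a mapping involves, here is a minimal sketch (not the script from the signature). It assumes the stock FreeBSD `glabel status` column layout (Name, Status, Components) and a `Serial Number:` / `Serial number:` line in `smartctl -i` output; the function names are made up for illustration.

```shell
#!/bin/sh
# Sketch: map each gptid label to its device node and drive serial number.
# Assumes glabel(8) and smartctl(8) are available; function names are illustrative.

# Read `glabel status` text on stdin; print "gptid-label provider" pairs.
gptid_to_provider() {
    awk '$1 ~ /^gptid\// { print $1, $3 }'
}

# Strip a trailing partition suffix (da0p2 -> da0).
provider_to_disk() {
    printf '%s\n' "$1" | sed 's/p[0-9][0-9]*$//'
}

# Read `smartctl -i` text on stdin; print the serial number.
serial_from_smartctl() {
    awk -F': *' '/^Serial [Nn]umber:/ { print $2; exit }'
}

# Print one "gptid<TAB>disk<TAB>serial" line per labelled partition.
print_drive_map() {
    glabel status | gptid_to_provider | while read -r label provider; do
        disk=$(provider_to_disk "$provider")
        serial=$(smartctl -i "/dev/$disk" | serial_from_smartctl)
        printf '%s\t%s\t%s\n' "$label" "$disk" "${serial:-unknown}"
    done
}
```

Calling `print_drive_map` as root should list every labelled partition. A drive that no longer answers will most likely come back with `unknown`, so the table is most useful if it is saved while all drives are healthy.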
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
In the end, I identified the drive by its lack of activity when the FS was being stressed.
This is what I use. If the drive responds, I'll run
dd if=/dev/failingda10 of=/dev/null bs=1024k count=10000000
If it doesn't respond (and is mostly dead like yours), I'll write to a file on the pool and look for the drive that shows no activity:
dd if=/dev/zero of=/mnt/tank/somefolder/testfile count=100000000000
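
The two dd approaches above can be wrapped in small helpers so the load is bounded and the scratch file gets cleaned up. This is only a sketch: the function names and the `locate-test.bin` file name are invented here, and the block size is spelled out in bytes so it works with both FreeBSD and GNU dd.

```shell
#!/bin/sh
# Sketch of the two dd load generators described above.
# Function names and the scratch file name are illustrative.

# Read `count` MiB from a responsive drive so its activity LED stays busy.
# Usage: read_load /dev/da10 10000
read_load() {
    dd if="$1" of=/dev/null bs=1048576 count="${2:-10000}" 2>/dev/null
}

# Write `count` MiB of zeros to a scratch file in a pool directory,
# then remove the file. Watch the bays for the LED that stays dark.
# Usage: write_load /mnt/tank/somefolder 10000
write_load() {
    scratch="$1/locate-test.bin"
    dd if=/dev/zero of="$scratch" bs=1048576 count="${2:-10000}" 2>/dev/null
    status=$?
    rm -f "$scratch"
    return "$status"
}
```

One caveat: if compression is enabled on the dataset, a stream of zeros may compress away to almost nothing and cause little disk activity, so a dataset without compression is probably a better target for `write_load`.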
 

Will Dormann

Explorer
Joined
Feb 10, 2015
Messages
61
This is what I use. If the drive responds, I'll run
dd if=/dev/failingda10 of=/dev/null bs=1024k count=10000000
If it doesn't respond (and is mostly dead like yours), I'll write to a file on the pool and look for the drive that shows no activity:
dd if=/dev/zero of=/mnt/tank/somefolder/testfile count=100000000000


That's not too dissimilar to what I ended up doing. Is this the state of the practice for locating a physical drive, though? In this case, the enclosure in question was fully dedicated to a single zpool, so finding the inactive drive was pretty easy. However, the other enclosure hosts multiple zpools, and beyond that they're zvol-based as well. So both triggering the activity from a terminal and simply noticing which drive activity light is idle seem impractical.
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
If I were better organized, I'd create a spreadsheet and slap some labels on the drive carriers. But I'm not, so this is what I resort to. (Others are much better organized and have said spreadsheets and stickers).

As for usage on large pools, I can see where that could become a challenge. Even the zvol-based devices live on a pool, and that pool has a dataset that you can write to via the CLI. If you have a general sense of which drives are part of what pool, it could work. I have 3 pools with ~35+ drives and I use that method.
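
For the spreadsheet route on an LSI controller, the bay-to-serial list could probably be seeded from `sas2ircu <controller> display` while every drive still answers. The sketch below assumes that output labels its fields `Enclosure #`, `Slot #` and `Serial No`; check that against your own controller's output first. The function names are made up.

```shell
#!/bin/sh
# Sketch: build a "bay -> serial" inventory while every drive still answers.
# Assumes the field labels "Enclosure #", "Slot #" and "Serial No" in
# `sas2ircu <controller> display` output; verify against your own output.

# Read `sas2ircu N display` text on stdin; print "enclosure:slot,serial" lines.
bays_from_display() {
    awk -F' *: *' '
        { sub(/[ \t\r]+$/, "") }              # drop trailing whitespace
        /^ *Enclosure #/ { enc = $2 }
        /^ *Slot #/      { slot = $2 }
        /^ *Serial No/   { print enc ":" slot "," $2 }
    '
}

# Write the inventory for controller $1 (default 0) as CSV on stdout.
bay_inventory() {
    echo "bay,serial"
    sas2ircu "${1:-0}" display | bays_from_display
}
```

Something like `bay_inventory 0 > bays.csv` saved ahead of time gives you the list. When a drive later stops responding, the serial that is in the saved file but missing from a fresh run (or the serial that zpool status complains about) points at the bay to pull.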
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Depending on your hardware, you may be able to manually issue commands via sas2ircu for the backplane to mark the drive as failed.
 