How to identify the correct failed USB boot drive

Status
Not open for further replies.

melloa

Wizard
Joined
May 22, 2016
Messages
1,749
No need to do anything special to replace it; just follow the instructions in the manual. But first, save your config, just in case you need to reinstall.

Edit: Save your config every week :)
 

m0nkey_

MVP
Joined
Oct 27, 2015
Messages
2,739
I know I'm a bit late to the party, but the SanDisk Ultra Fit USB drives will blink their LED if they've been booted out of the pool and gone offline, making them easy to identify.
 

cmh

Explorer
Joined
Jan 7, 2013
Messages
75
melloa: I save my config every time it changes... I have a script that runs hourly and checks the checksum; if it has changed, it copies the config to the storage and takes a snapshot. :D

m0nkey: good point - didn't realize it had an access light. At that point it becomes really easy - fail the drive out of the pool, then use dd to write a stream of data to it, and look for the access light. Won't be completely steady but should be distinctive.
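Before failing a drive out and exercising it with dd, it helps to know which device node the pool considers gone. A minimal sketch, assuming the usual `zpool status` layout with one member per line and its state in the second column; the function name `failed_members` is made up for illustration:

```shell
#!/bin/sh
# failed_members: read `zpool status` text on stdin and print
# "<device> <state>" for every member that has dropped out of the pool.
# Header lines such as "state: DEGRADED" are skipped because their first
# field ends in a colon. (Hypothetical helper, not part of FreeNAS.)
failed_members() {
	awk '$1 !~ /:$/ && $2 ~ /^(FAULTED|OFFLINE|UNAVAIL|REMOVED)$/ { print $1, $2 }'
}
```

Something like `zpool status freenas-boot | failed_members` would feed it, assuming the boot pool is named freenas-boot (check with `zpool list` if unsure).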
 

melloa

Wizard
Joined
May 22, 2016
Messages
1,749

cmh

Explorer
Joined
Jan 7, 2013
Messages
75
I was too slow to reply: https://github.com/ChrisHeerschap/freenas-scripts

Actually, that's outdated; I stopped using the snapclean script, as I now run automatic snapshots on that dataset for replication. Here's the current version:

Code:
#!/bin/bash

#
# configbak
#
# 131120 cmh (cmh@db94.net)
#
# Purpose:	  Take a backup of the FreeNAS config file
#

exec 2> /mnt/sto/tmp/debug-configbak
set -x

# var defs
DB=freenas-v1.db
ACTIVE=/data/$DB
BAKSET=sto/backup/config
BAK=/mnt/$BAKSET/$DB

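# Make sure I can read the database and write to the backup directory
# (test the directory, since the backup file may not exist on the first run)
if [[ ! -r $ACTIVE || ! -w ${BAK%/*} ]]
then
	echo "Error: can't read $ACTIVE or write to ${BAK%/*}"
	exit 1
fi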

# Check the active version
actsum=$(cksum $ACTIVE)
actsum=${actsum%% *}

# Check the backup version
#  Direct STDERR to null in case the file doesn't exist
baksum=$(cksum $BAK 2> /dev/null)
baksum=${baksum%% *}

# See if the checksum changed
if [[ $actsum != "$baksum" ]]
then
	# Check the validity of the active before copying
	out=$(/usr/local/bin/sqlite3 $ACTIVE 'pragma integrity_check')
	if [[ $out != "ok" ]]
	then
		echo -e "Database integrity check failed:\n$out" | mail -s "ALERT: FreeNAS database not ok" root
		exit 1
	fi

	# Overwrite the backup
	cp $ACTIVE $BAK
fi


Not as portable as it could be as you have to update the target zpool/dataset name. Works like a champ, though, and with replication running to a remote system, I can grab the config file before I restore the OS.
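On the portability point, one way to avoid editing the script per system is to pass the source file and backup directory in as arguments. This is only a sketch of the checksum comparison, not the script above: the function name `backup_if_changed` and its return codes are assumptions for illustration, and the sqlite3 integrity check is left out to keep it short.

```shell
#!/bin/sh
# backup_if_changed SRC DEST_DIR
# Copy SRC into DEST_DIR only when its checksum differs from the copy
# already there. Returns 0 if a copy was made, 1 if nothing changed,
# 2 if SRC is unreadable or DEST_DIR is not a writable directory.
# (Hypothetical helper; names and return codes are illustrative.)
backup_if_changed() {
	src=$1
	dest_dir=$2
	dest=$dest_dir/$(basename "$src")

	[ -r "$src" ] && [ -d "$dest_dir" ] && [ -w "$dest_dir" ] || return 2

	# Reading via stdin keeps the filename out of the cksum output
	new=$(cksum < "$src")
	if [ -f "$dest" ]; then
		old=$(cksum < "$dest")
	else
		old=
	fi

	[ "$new" = "$old" ] && return 1
	cp "$src" "$dest"
}
```

With the paths from the script above, the call would look like `backup_if_changed /data/freenas-v1.db /mnt/sto/backup/config`.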

So the resilver finished but despite now being a mirror of a 16G and 32G USB drive, the root zpool is still 7.4GB. Huh. Thought it would have expanded to the 16G size, but I guess I would need to replace that one as well for that to happen.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Hey, @joeschmuck, did any of the above change? In other words, is there any way to identify the failed USB drive to be replaced?
Not that I'm aware of except for the posting by @m0nkey_ above (I didn't know that either). I would have expected the LED to mean there was activity going on.

Of course I have to ask, how often do you change your FreeNAS configuration? I change mine maybe twice every 6 months and that is a big maybe. When I do make a change I manually create a config file backup.
 

melloa

Wizard
Joined
May 22, 2016
Messages
1,749
Of course I have to ask, how often do you change your FreeNAS configuration? I change mine maybe twice every 6 months and that is a big maybe. When I do make a change I manually create a config file backup.

Agreed that configurations are not changed very often, so changes to users, CIFS, etc. can be captured when they are made, but I like to keep a fresh copy just in case the one I made 6 months ago got corrupted. I never thought of creating a cron job to run a command or a script before @cmh's note above; an oversight on my part for sure.

What @cyberjock proposed works in my case, as it creates a copy every x days.

One thing I did notice on another appliance I use, pf, is that it automatically creates a copy of the config. One day I was exploring the directories and found several of them, and it did save my life once :)
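Keeping several dated copies around, the way that appliance does, could be sketched roughly like this. The function name `keep_dated_copy`, the `config-<timestamp>.db` naming, and the default of 7 copies are all assumptions for illustration; scheduling it every x days is left to cron.

```shell
#!/bin/sh
# keep_dated_copy SRC DEST_DIR [KEEP]
# Copy SRC to DEST_DIR/config-YYYYmmdd-HHMMSS.db, then delete all but the
# newest KEEP copies (default 7). The timestamp format sorts
# chronologically, so a plain reverse sort puts the newest first.
# (Hypothetical helper; naming scheme and default are illustrative.)
keep_dated_copy() {
	src=$1
	dest_dir=$2
	keep=${3:-7}
	stamp=$(date +%Y%m%d-%H%M%S)

	cp "$src" "$dest_dir/config-$stamp.db" || return 1

	# Everything after the first $keep names in newest-first order is stale
	ls "$dest_dir"/config-*.db | sort -r | tail -n +"$((keep + 1))" |
	while IFS= read -r old; do
		rm -f "$old"
	done
}
```

Two runs within the same second would overwrite each other, which should not matter for a nightly or weekly job.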
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994

melloa

Wizard
Joined
May 22, 2016
Messages
1,749
That's what it was there for!

hmmm ... just thinking ... should we suggest something like that to @jkh, or does it already exist or is it in the works? We could have a dataset created inside the volume, and FreeNAS would copy the config file to it every night and/or before each update...
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
There is already some amount of this. FreeNAS copies the config file to /var/log somewhere (exact location escapes me). But that location is recommended to be on the boot device, so if your boot device fails, you'd lose all copies. We discussed zpool backups and such, but that was shot down as it's generally considered to be the server admin's responsibility to do backups, not the OS itself.
 

cmh

Explorer
Joined
Jan 7, 2013
Messages
75
Of course I have to ask, how often do you change your FreeNAS configuration? I change mine maybe twice every 6 months and that is a big maybe. When I do make a change I manually create a config file backup.

It's been a while since I set up my automatic backups, but I was surprised to see how often the config file itself changes, even if I'm not making major changes to the config on the system. Little things here and there.
 

melloa

Wizard
Joined
May 22, 2016
Messages
1,749
but that was shot down as it's generally considered to be the server admin's responsibility to do backups, not the OS itself.

Philosophy differences, but resolved with your post plus other upgrades :)
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
It's been a while since I set up my automatic backups, but I was surprised to see how often the config file itself changes, even if I'm not making major changes to the config on the system. Little things here and there.
For me, I just upgraded my backup FreeNAS system to 9.10.2 yesterday, and once that was completed I created a new configuration backup. I didn't really make any changes to my configuration; however, with a newer OS, I figure it's the safe and smart thing to do.
 

RickSRose

Cadet
Joined
Jun 26, 2014
Messages
1
I set up a 5-way mirror on identical Kingston DataTraveler 8GB USB sticks.
One of them failed. I figured out which one. Tried to dd the thing and all I get are errors. Now I can't dd it at all... Is there any way to force dd to work, or is there any other cam command to run that might fix it?


-- nohup dd if=/dev/zero of=/dev/da3 bs=98304 count=78899 conv=notrunc,noerror >dd_da3.log 2>dd_da3.err &
-- nohup dd if=/dev/zero of=/dev/da3 conv=noerror,notrunc >dd_da3.log 2>dd_da3.err &
-- dd: /dev/da3: Input/output error
-- 1+0 records in
-- 0+0 records out
-- 0 bytes transferred in 8.426213 secs (0 bytes/sec)
-- nohup dd if=/dev/zero of=/dev/da3 conv=noerror,notrunc
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Cruzer Fit 3.0 drives definitely have an activity light.

When trying to find a failed drive, I dd the other one...
 

cmh

Explorer
Joined
Jan 7, 2013
Messages
75
Five way mirror? Any particular reason? I thought a three-way mirror was overkill, but yikes.

One thing I notice is your dd is writing to the card. Generally when I'm trying to identify failed drives I read. If it's already failing, writing might not do it any favors. The block sizes and count seem oddly arbitrary, and I'm not sure why the nohup.

Also, as Stux says, doing a dd on the still-good one is a good approach, in which case you DEFINITELY do not want to do a writing dd as you'll now have two failed drives. Then again, a five way mirror would still have three left. ;)
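A read-only dd along those lines might look like the sketch below. The function name `blink_read` and the 256 MiB default are made up for illustration; the byte-count block size is used because it is accepted by both FreeBSD and GNU dd.

```shell
#!/bin/sh
# blink_read DEVICE [MIB]
# Stream MIB mebibytes (default 256) from DEVICE to /dev/null so the
# activity LED blinks. Nothing is written to the device, so it is safe
# to run against a healthy mirror member.
# (Hypothetical helper; name and default size are illustrative.)
blink_read() {
	dev=$1
	mib=${2:-256}
	dd if="$dev" of=/dev/null bs=1048576 count="$mib"
}
```

Run it against the drives that are still good, e.g. `blink_read /dev/da4` (device name assumed), and the stick that stays dark is the one to pull.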

 