Critical Alert: Boot Volume State DEGRADED

Status
Not open for further replies.

bmcclure937

Contributor
Joined
Jul 13, 2012
Messages
110
I received an email alert yesterday that my boot volume state is degraded:

freenas.local: Critical Alerts

The boot volume state is DEGRADED: One or more devices are faulted in response to persistent errors. Sufficient replicas exist for the pool to continue functioning in a degraded state.

I also received this security output showing the device having errors and then being disconnected:

freenas.local kernel log messages:
> (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 00 50 30 af 00 00 2c 00
> (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
> (da0:umass-sim0:0:0:0): Retrying command
> (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 00 18 f4 08 00 00 01 00
> (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
> (da0:umass-sim0:0:0:0): Retrying command
> (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 00 18 f4 08 00 00 01 00
> (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
> (da0:umass-sim0:0:0:0): Retrying command
> (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 00 18 f4 08 00 00 01 00
> (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
> (da0:umass-sim0:0:0:0): Retrying command
> (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 00 18 f4 08 00 00 01 00
> (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
> (da0:umass-sim0:0:0:0): Retrying command
> (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 00 18 f4 08 00 00 01 00
> (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
> (da0:umass-sim0:0:0:0): Error 5, Retries exhausted
> (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 00 1a 29 38 00 00 80 00
> (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
> (da0:umass-sim0:0:0:0): Retrying command
> (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 00 1a 29 38 00 00 80 00
> (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
> (da0:umass-sim0:0:0:0): Retrying command
> (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 00 1a 29 38 00 00 80 00
> (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
> (da0:umass-sim0:0:0:0): Retrying command
> (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 00 1a 29 38 00 00 80 00
> (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
> (da0:umass-sim0:0:0:0): Retrying command
> (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 00 1a 29 38 00 00 80 00
> (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
> (da0:umass-sim0:0:0:0): Error 5, Retries exhausted
> (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 00 1a 32 38 00 00 80 00
> (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
> (da0:umass-sim0:0:0:0): Retrying command
> (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 00 1a 32 38 00 00 80 00
> (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
> (da0:umass-sim0:0:0:0): Retrying command
> (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 00 1a 32 38 00 00 80 00
> (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
> (da0:umass-sim0:0:0:0): Retrying command
> (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 00 1a 32 38 00 00 80 00
> (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
> (da0:umass-sim0:0:0:0): Retrying command
> (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 00 1a 32 38 00 00 80 00
> (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
> (da0:umass-sim0:0:0:0): Error 5, Retries exhausted
> (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 00 00 06 38 00 00 10 00
> (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
> (da0:umass-sim0:0:0:0): Retrying command
> (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 00 00 06 38 00 00 10 00
> (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
> (da0:umass-sim0:0:0:0): Retrying command
> (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 00 00 06 38 00 00 10 00
> (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
> (da0:umass-sim0:0:0:0): Retrying command
> (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 00 00 06 38 00 00 10 00
> (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
> (da0:umass-sim0:0:0:0): Retrying command
> (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 00 00 06 38 00 00 10 00
> (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
> (da0:umass-sim0:0:0:0): Error 5, Retries exhausted
> (da0:umass-sim0:0:0:0): got CAM status 0x44
> (da0:umass-sim0:0:0:0): fatal error, failed to attach to device
> da0 at umass-sim0 bus 0 scbus3 target 0 lun 0
> da0: <SanDisk Cruzer Fit 1.00> s/n 4C530001121214120334 detached
> (da0:umass-sim0:0:0:0): Periph destroyed
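
For anyone trying to read the log above: each `CDB:` line is a 10-byte SCSI command. In a READ(10) or WRITE(10) command, bytes 2-5 hold the starting block address and bytes 7-8 hold the number of blocks, both big-endian. "Error 5" is the I/O error code (EIO). Below is a minimal Python sketch, not part of the original post, that decodes those lines and lists the requests that ran out of retries. The function names are made up for illustration, and it assumes the log format shown above.

```python
import re

# Opcodes for the two 10-byte SCSI commands that appear in the log.
OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)"}

# Ten space-separated hex bytes following "CDB: ".
CDB_RE = re.compile(r"CDB: ((?:[0-9a-f]{2} ?){10})")


def decode_cdb10(cdb_hex):
    """Decode a READ(10)/WRITE(10) CDB given as space-separated hex bytes."""
    b = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(b) != 10 or b[0] not in OPCODES:
        raise ValueError("not a READ(10)/WRITE(10) CDB: " + cdb_hex)
    lba = int.from_bytes(b[2:6], "big")     # bytes 2-5: starting block address
    blocks = int.from_bytes(b[7:9], "big")  # bytes 7-8: number of blocks
    return OPCODES[b[0]], lba, blocks


def failed_requests(log_text):
    """Return the decoded command of each request that ended in 'Retries exhausted'."""
    failed, last = [], None
    for line in log_text.splitlines():
        m = CDB_RE.search(line)
        if m:
            last = decode_cdb10(m.group(1).strip())
        elif "Retries exhausted" in line and last is not None:
            failed.append(last)
    return failed


# Three lines taken from the log above.
sample = """\
> (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 00 18 f4 08 00 00 01 00
> (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
> (da0:umass-sim0:0:0:0): Error 5, Retries exhausted
"""

for op, lba, blocks in failed_requests(sample):
    print(op, "starting at block", lba, "for", blocks, "block(s)")
```

Read this way, the log shows both reads and writes failing at several unrelated block addresses before the device detached, rather than one bad spot.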

I am running a mirrored boot device (2x SanDisk 16GB USB 2.0 flash drives). I am looking at the documentation on how to replace a failed drive, but wanted to check with the community for some clarification.

1. Is there any way I can repair the drive or does it need to be replaced?
2. How on earth can I figure out which USB drive is which?

Thanks all. If the USB drive has failed, then I am going to ask SanDisk for a replacement.
 

BigDave

FreeNAS Enthusiast
Joined
Oct 6, 2013
Messages
2,479
1. Is there any way I can repair the drive or does it need to be replaced?
Replace it. To repair a thumb drive would be like repairing a light bulb...

2. How on earth can I figure out which USB drive is which?
First shut down your server. BEWARE of ESD! Remove one of the thumb drives and reboot.
If your instance of FreeNAS boots, the flash drive you removed is the bad one.
If the server doesn't boot, you have removed the good one, so put the good one back in.
Replace bad flash device per manual directions.
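
A possible shortcut before pulling drives, assuming the kernel log from the first post is still at hand: the detach message names the failing stick by model and serial number. The Python sketch below is only an illustration (the function name is hypothetical and the pattern assumes the exact log format shown above). It extracts that serial so it can be compared with the serial of the stick that is still attached, or with the serial read from each stick on another machine.

```python
import re

# Matches the detach line format seen in the log from the first post, e.g.
# "da0: <SanDisk Cruzer Fit 1.00> s/n 4C530001121214120334 detached"
DETACH_RE = re.compile(r"(\w+): <([^>]+)> s/n (\S+) detached")


def detached_devices(log_text):
    """Return (device, model, serial) for every device the kernel reported as detached."""
    return [m.groups() for m in DETACH_RE.finditer(log_text)]


# The detach line from the log in the first post.
line = "> da0: <SanDisk Cruzer Fit 1.00> s/n 4C530001121214120334 detached"

for dev, model, serial in detached_devices(line):
    print(f"{dev}: {model}, serial {serial}")
```

With two sticks of the same model, the serial is the only thing in the log that tells them apart. If the serials cannot be read, the pull-one-and-reboot method above still works.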
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
If your instance of FreeNAS boots, the flash drive you removed is the bad one.
Probably. To be on the safe side, if the machine boots, do a scrub on the boot pool.
 

BigDave

FreeNAS Enthusiast
Joined
Oct 6, 2013
Messages
2,479
Probably. To be on the safe side, if the machine boots, do a scrub on the boot pool.
Save a copy of your configuration to another machine first, just in case the scrub manages to fubar the remaining flash drive :eek:
 

bmcclure937

Contributor
Joined
Jul 13, 2012
Messages
110
Thanks to both of you. I have ordered a new flash drive and will replace the failed one when it arrives.

Now I need to decide if I want to upgrade to 9.10 after I get this resolved. Don't want my plugin jails to get all screwy.
 

bmcclure937

Contributor
Joined
Jul 13, 2012
Messages
110
Thanks again. I received the replacement USB drive last night. I got lucky on the first guess and swapped out the bad USB stick, then rebuilt the boot mirror. Resilvering was successful, I ran a scrub, ran system updates... all good to go.

Very easy now that I understand the process.
 

Ghydda

Cadet
Joined
Jan 11, 2015
Messages
8
I realize bmcclure937's problem has been resolved, but I would like to share my experience on this subject.

On two occasions my FreeNAS rig has flagged my boot drive (a USB stick) as faulty. Both times it happened during the recurring scrub (scheduled every 35 days) just after applying an update to FreeNAS.
Both times the drive afterwards passed every test I could throw at it, and subsequent scrubs on the same drive have been uneventful.

This could all be coincidental, but I will be paying extra attention the next time I apply an update to the system (which is not that often, as I usually get around to doing it two or three times a year).
 

bmcclure937

Contributor
Joined
Jul 13, 2012
Messages
110
Oh wow, so you are saying I could have avoided replacing the USB drive? What steps were needed to fix the faulted USB drive?
 

bmcclure937

Contributor
Joined
Jul 13, 2012
Messages
110
My damn boot drive says it is degraded again. I doubt these USB drives are failing this quickly... so can you explain how you got yours working again and out of the degraded state?
 