Unrecoverable error in freenas-boot pool?


David E

Contributor
Joined
Nov 1, 2013
Messages
119
Hello-
I woke up today to an unpleasant email:

Code:
Checking status of zfs pools:
NAME           SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
tank          16.2T  3.40T  12.8T         -      -    20%  1.00x  ONLINE  /mnt
freenas-boot  19.9G   481M  19.4G         -      -     2%  1.00x  ONLINE  -

  pool: freenas-boot
state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: resilvered 36K in 0h0m with 0 errors on Tue Sep 22 15:10:17 2015
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     1     0
          da0p2       ONLINE       0     1     0

errors: No known data errors


I know very little about the freenas-boot pool, but I'm pretty sure it used to contain da0p1 - which still exists if I 'ls /dev'. As I understand it, both of these partitions are on the same drive (da0), so I'm not even sure how this kind of failure can happen. What are the right next steps for me to take?
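
For reference, here's roughly how I've been poking at the layout (a sketch; assuming da0 is the right device, as the output above suggests):

Code:
# List the partition table on the boot disk (da0 here, per the zpool output).
# On a stock FreeNAS install, p1 is typically the freebsd-boot partition and
# p2 the freebsd-zfs partition holding the pool.
gpart show da0

# Show per-vdev error counters for the boot pool.
zpool status -v freenas-boot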

Thanks in advance!
 

DaveF81

Explorer
Joined
Jan 28, 2014
Messages
56
Anything in your syslog to indicate an issue? Last time this happened to me, it turned out my USB boot drive was failing and needed to be replaced.
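
This is roughly what I'd check first (a sketch; adjust the device name to whatever your boot device actually is):

Code:
# Look for controller/CAM errors mentioning the boot device in the system log.
grep da0 /var/log/messages

# On a physical drive, SMART data often shows reallocated or pending sectors
# when it is failing (this won't tell you much on a virtual disk).
smartctl -a /dev/da0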
 

David E

Contributor
Joined
Nov 1, 2013
Messages
119
I found some more log data (below). FreeNAS is running in a VM on an ESXi 5.5 host; the boot disk is a virtual disk (backed by mirrored SSDs), and the tank pool is attached to an LSI HBA passed through to the VM. A bit of Googling turned up a few reports of similar timeouts on virtual disks, but it's a first for us - this setup has been bulletproof for nearly two years. The only fix suggestion I could find was disabling MSI/MSI-X, and I want that enabled for the actual hardware LSI card. Is there an easy way to just add this back to the pool and keep an eye on it? (The commands I'm considering are sketched after the log.)

Code:
mpt0: request 0xffffff8000ad36c0:24994 timed out for ccb 0xfffffe0005c43000 (req->ccb 0xfffffe0005c43000)
mpt0: attempting to abort req 0xffffff8000ad36c0:24994 function 0
mpt0: completing timedout/aborted req 0xffffff8000ad36c0:24994
(da0:mpt0:0:0:0): WRITE(10). CDB: 2a 00 00 52 b4 c2 00 01 00 00
(da0:mpt0:0:0:0): CAM status: Command timeout
(da0:mpt0:0:0:0): Retrying command
mpt0: abort of req 0xffffff8000ad36c0:0 completed
mpt0: request 0xffffff8000ad5ac0:25060 timed out for ccb 0xfffffe0005c43000 (req->ccb 0xfffffe0005c43000)
mpt0: attempting to abort req 0xffffff8000ad5ac0:25060 function 0
mpt0: completing timedout/aborted req 0xffffff8000ad5ac0:25060
(da0:mpt0:0:0:0): WRITE(10). CDB: 2a 00 00 52 b4 c2 00 01 00 00
(da0:mpt0:0:0:0): CAM status: Command timeout
(da0:mpt0:0:0:0): Retrying command
mpt0: abort of req 0xffffff8000ad5ac0:0 completed
mpt0: request 0xffffff8000ad5b50:25062 timed out for ccb 0xfffffe0005c43000 (req->ccb 0xfffffe0005c43000)
mpt0: attempting to abort req 0xffffff8000ad5b50:25062 function 0
mpt0: completing timedout/aborted req 0xffffff8000ad5b50:25062
(da0:mpt0:0:0:0): WRITE(10). CDB: 2a 00 00 52 b4 c2 00 01 00 00
(da0:mpt0:0:0:0): CAM status: Command timeout
(da0:mpt0:0:0:0): Retrying command
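
If this really was a one-off timeout, the approach I'm leaning toward is something like this (a sketch; pool name taken from the output above):

Code:
# Reset the error counters on the boot pool.
zpool clear freenas-boot

# Re-read and verify every block now, so any real damage surfaces
# immediately instead of on the next scheduled scrub.
zpool scrub freenas-boot

# Check the result once the scrub finishes.
zpool status freenas-boot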
 

David E

Contributor
Joined
Nov 1, 2013
Messages
119
Did some more investigation. I had thought that there were two devices in my boot pool, but it looks like there's just one. Things seem to be working and nothing actually died during that window, so hopefully this doesn't happen again.
 

DaveF81

Explorer
Joined
Jan 28, 2014
Messages
56
Just as I thought: the boot drive is failing. You'll get the same scrub failure next time. I'd suggest backing up your config and replacing the drive soon.

Edit: I missed the part about ESXi. It looks like your boot image has become corrupted somehow. I still stand by backing up your config and performing a fresh install on a new VMDK.
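
Roughly, the backup step looks like this (a sketch; /data/freenas-v1.db is the standard config database location on FreeNAS 9.x, and the destination path is just an example):

Code:
# The FreeNAS configuration is a single SQLite database on the boot device.
# Copy it somewhere off the boot disk (destination path is an example);
# you can also export it from the GUI under System -> General -> Save Config.
cp /data/freenas-v1.db /mnt/tank/freenas-config-backup.db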
 