Problems after update to 11.3-Stable

Kevo

Dabbler
Joined
Jan 1, 2019
Messages
37
I updated on the 2nd of Feb and thought everything was fine. This morning I got an email that my main pool is degraded. After investigating, I see 24 checksum errors on the pool, and both drives in the mirror show 48. I had to get this from the command line because the GUI is not showing the errors; it shows Feb 2 as the last scrub, with 0 errors. It does show the degraded status on the mirror, but nothing else useful.

Then I went to look at my periodic snapshots, and no snapshots have been taken since the 2nd. Replication to my backup pool has not happened either, presumably because it's based on the periodic snapshots.

I think my errors on the main pool are not a problem yet since it only listed an rrd file as possibly damaged. However, I am not sure what to do at this stage.

There doesn't appear to be a way to deal with the degraded pool in the GUI. Is it safe to just clear the pool errors from the command line and scrub it again?
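For reference, the sequence being asked about could be sketched roughly as below. This is only an illustration built on the standard `zpool status -v`, `zpool clear`, and `zpool scrub` subcommands; the pool name `tank` and the helper names are assumptions for the example, and whether clearing is appropriate depends on what caused the errors. By default it only prints the commands.

```python
import shlex
import subprocess


def clear_and_scrub_commands(pool):
    """Build the zpool commands for: inspect errors, clear counters, re-scrub."""
    return [
        ["zpool", "status", "-v", pool],  # list error counts and affected files
        ["zpool", "clear", pool],         # reset the pool's error counters
        ["zpool", "scrub", pool],         # start a new scrub to re-verify data
    ]


def run_all(pool, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in clear_and_scrub_commands(pool):
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


# "tank" is an assumed pool name; this call is a dry run and executes nothing.
run_all("tank")
```

Passing `dry_run=False` would actually run the commands, which needs root on the server itself.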

Also, any ideas what to do about the snapshots not running? Should I just recreate the snapshot and replication tasks? I would be inclined to go back to 11.2U7, but I already upgraded the pools, so I think that's not an option?
 

Kevo

So I did a scrub on the pool, and apparently I only have 3 files with unrecoverable errors. I rm'd all of them, except that one entry is listed strangely and I can't actually find such a file. I'm guessing the way it's named has some meaning I don't know. Does anyone know what the :<0x17c> means, and can I get rid of this somehow?

tank/.system/rrd-c0708eef149b49a396f66530a6f1b80a:<0x17c>
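A minimal sketch of how an entry like this could be picked apart, assuming the `dataset:<0xN>` form means ZFS is reporting a hexadecimal object number inside that dataset because it could not resolve the object to a file path (for example, after the file was deleted). The function name is made up for illustration, and other entry forms are not handled.

```python
import re

# Assumed format: "<dataset>:<0x<hex object number>>", e.g. "pool/fs:<0x17c>".
OBJECT_ENTRY = re.compile(r"^(?P<dataset>.+):<0x(?P<obj>[0-9a-fA-F]+)>$")


def parse_error_entry(entry):
    """Return (dataset, object_number) for an object-ID entry, else None."""
    match = OBJECT_ENTRY.match(entry.strip())
    if match is None:
        # An ordinary file path, or a form this sketch does not handle.
        return None
    return match.group("dataset"), int(match.group("obj"), 16)


entry = "tank/.system/rrd-c0708eef149b49a396f66530a6f1b80a:<0x17c>"
print(parse_error_entry(entry))
```

Here `0x17c` is object number 380 in the `tank/.system/rrd-c0708eef149b49a396f66530a6f1b80a` dataset; an ordinary path such as `/mnt/tank/somefile` returns `None`.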

I plan to scrub this pool again tonight to see if anything else turns up. If it does then I'm going to guess I really do have a drive or hardware problem and need to get that resolved. Currently I believe the problem started with a power outage that happened while I was away. I don't believe the server shut down as it's on a UPS, but our neighbor said there were some unusual power swings for several minutes so who knows what that did.

TIA
 

Kevo

So after a reboot and another scrub, the previous error is gone, but another file was reported as corrupted. It was a backup file and not critical, so I deleted it and replaced it. After that, the status listing showed another entry with the :<0x???> style suffix, so I think that just means the file is no longer there.

There have been no errors since the reboot and scrub, so whatever happened, possibly during the power outage, the reboot appears to have corrected the issue.
 