Critical error with boot volume after update

Status
Not open for further replies.

ser_rhaegar

Patron
Joined
Feb 2, 2014
Messages
358
When you reboot, the error stats are cleared, but the corruption the errors caused and the cause of the errors are not. Chances are the corrupted files were not critical to running the system if it still runs, but they could cause problems for you unexpectedly. I would replace the boot drives for those two sites ASAP.
 

PeterSM

Dabbler
Joined
Dec 21, 2014
Messages
30
I have a similar issue with the USB boot drive. If I install the latest version of FreeNAS to a fresh USB boot device, will I have to reinstall all the jails?

Current Version: FreeNAS-9.3-STABLE-201412090314
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,175
If you back up the config, everything should be as you left it.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,458
Download your existing config file, do a clean install to a new device, upload the config. All your jails/plugins/users/shares/etc. should remain as they were.
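Before wiping the old boot device, it may be worth sanity-checking the downloaded config file. The sketch below is only an illustration and rests on one assumption: that the saved config is a SQLite database file. The function name `check_config_backup` is hypothetical and not part of FreeNAS. It opens the file read-only on a workstation with Python 3, runs SQLite's own integrity check, and confirms the file contains at least one table.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def check_config_backup(path):
    """Return the number of tables in a saved config file.

    ASSUMPTION: the downloaded config is a SQLite database file.
    Raises ValueError if the file is missing, empty, unreadable or damaged.
    """
    path = Path(path)
    if not path.is_file() or path.stat().st_size == 0:
        raise ValueError(f"{path} is missing or empty")
    # Open read-only so the check can never modify the backup.
    uri = path.resolve().as_uri() + "?mode=ro"
    try:
        with closing(sqlite3.connect(uri, uri=True)) as conn:
            status = conn.execute("PRAGMA integrity_check").fetchone()[0]
            tables = conn.execute(
                "SELECT count(*) FROM sqlite_master WHERE type = 'table'"
            ).fetchone()[0]
    except sqlite3.Error as exc:
        raise ValueError(f"{path} is not a readable SQLite database: {exc}") from exc
    if status != "ok":
        raise ValueError(f"{path} failed the integrity check: {status}")
    if tables == 0:
        raise ValueError(f"{path} contains no tables")
    return tables
```

Calling `check_config_backup("freenas-config.db")` (a hypothetical file name) either returns a table count or raises `ValueError`. A passing check only shows the file is a readable database, not that every setting in it is what you expect.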
 

PeterSM

Dabbler
Joined
Dec 21, 2014
Messages
30
Sorry if it was a stupid question but thanks for all the help and directions.
 

JoeB

Contributor
Joined
Oct 16, 2014
Messages
121
This has just happened to me. "The boot volume state is ONLINE: One or more devices has experienced an error resulting in data corruption. Applications may be affected." and "New updates available" are both showing in the alerts section.

I ran a Verify Install and this is what it showed:

The following Inconsistencies were found in your Current Install:

List of Checksum Mismatches:

/usr/local/lib/perl5/5.16/man/whatis

List of Permission Errors:

/bin/pgrep
Expected MODE: 0555, Got: 0775
Expected UID: 0, Got: 1001
Expected GID: 0, Got: 1001
/bin/pkill
Expected MODE: 0555, Got: 0775
Expected UID: 0, Got: 1001
Expected GID: 0, Got: 1001
/lib/libalias.so.7
Expected MODE: 0444, Got: 0775
Expected UID: 0, Got: 1001
Expected GID: 0, Got: 1001
/lib/libavl.so.2
Expected MODE: 0444, Got: 0775
Expected UID: 0, Got: 1001
Expected GID: 0, Got: 1001
/lib/libbegemot.so.4
Expected MODE: 0444, Got: 0775
Expected UID: 0, Got: 1001
Expected GID: 0, Got: 1001
/lib/libbsdxml.so.4
Expected MODE: 0444, Got: 0775
Expected UID: 0, Got: 1001
Expected GID: 0, Got: 1001
/lib/libc.so.7
Expected MODE: 0444, Got: 0775
Expected UID: 0, Got: 1001
Expected GID: 0, Got: 1001
/lib/libcam.so.6
Expected MODE: 0444, Got: 0775
Expected UID: 0, Got: 1001
Expected GID: 0, Got: 1001
/lib/libcrypt.so.5
Expected MODE: 0444, Got: 0775
Expected UID: 0, Got: 1001
Expected GID: 0, Got: 1001
/lib/libcrypto.so.6
Expected MODE: 0444, Got: 0775
Expected UID: 0, Got: 1001
Expected GID: 0, Got: 1001
/lib/libctf.so.2
Expected MODE: 0444, Got: 0775
Expected UID: 0, Got: 1001
Expected GID: 0, Got: 1001
/lib/libdevstat.so.7
Expected MODE: 0444, Got: 0775
Expected UID: 0, Got: 1001
Expected GID: 0, Got: 1001
/lib/libdtrace.so.2
Expected MODE: 0444, Got: 0775
Expected UID: 0, Got: 1001
Expected GID: 0, Got: 1001
/lib/libedit.so.7
Expected MODE: 0444, Got: 0775
Expected UID: 0, Got: 1001
Expected GID: 0, Got: 1001
/lib/libgcc_s.so.1
Expected MODE: 0444, Got: 0775
Expected UID: 0, Got: 1001
Expected GID: 0, Got: 1001
/lib/libgeom.so.5
Expected MODE: 0444, Got: 0775
Expected UID: 0, Got: 1001
Expected GID: 0, Got: 1001
/lib/libipsec.so.4
Expected MODE: 0444, Got: 0775
Expected UID: 0, Got: 1001
Expected GID: 0, Got: 1001
/lib/libjail.so.1
Expected MODE: 0444, Got: 0775
Expected UID: 0, Got: 1001
Expected GID: 0, Got: 1001
/lib/libkiconv.so.4
Expected MODE: 0444, Got: 0775
Expected UID: 0, Got: 1001
Expected GID: 0, Got: 1001
/lib/libkvm.so.5
Expected MODE: 0444, Got: 0775
Expected UID: 0, Got: 1001
Expected GID: 0, Got: 1001
/lib/libm.so.5
Expected MODE: 0444, Got: 0775
Expected UID: 0, Got: 1001
Expected GID: 0, Got: 1001
/lib/libmd.so.5
Expected MODE: 0444, Got: 0775
Expected UID: 0, Got: 1001
Expected GID: 0, Got: 1001
/lib/libncurses.so.8
Expected MODE: 0444, Got: 0775
Expected UID: 0, Got: 1001
Expected GID: 0, Got: 1001
/lib/libncursesw.so.8
Expected MODE: 0444, Got: 0775
Expected UID: 0, Got: 1001
Expected GID: 0, Got: 1001
/lib/libnvpair.so.2
Expected MODE: 0444, Got: 0775
Expected UID: 0, Got: 1001
Expected GID: 0, Got: 1001
/lib/libpcap.so.8
Expected MODE: 0444, Got: 0775
Expected UID: 0, Got: 1001
Expected GID: 0, Got: 1001
/lib/libreadline.so.8
Expected MODE: 0444, Got: 0775
Expected UID: 0, Got: 1001
Expected GID: 0, Got: 1001
/lib/libsbuf.so.6
Expected MODE: 0444, Got: 0775
Expected UID: 0, Got: 1001
Expected GID: 0, Got: 1001
/lib/libssp.so.0
Expected MODE: 0444, Got: 0775
Expected UID: 0, Got: 1001
Expected GID: 0, Got: 1001
/lib/libthr.so.3
Expected MODE: 0444, Got: 0775
Expected UID: 0, Got: 1001
Expected GID: 0, Got: 1001
/lib/libufs.so.6
Expected MODE: 0444, Got: 0775
Expected UID: 0, Got: 1001
Expected GID: 0, Got: 1001
/lib/libulog.so.0
Expected MODE: 0444, Got: 0775
Expected UID: 0, Got: 1001
Expected GID: 0, Got: 1001
/lib/libumem.so.2
Expected MODE: 0444, Got: 0775
Expected UID: 0, Got: 1001
Expected GID: 0, Got: 1001
/lib/libutil.so.9
Expected MODE: 0444, Got: 0775
Expected UID: 0, Got: 1001
Expected GID: 0, Got: 1001
/lib/libuutil.so.2
Expected MODE: 0444, Got: 0775
Expected UID: 0, Got: 1001
Expected GID: 0, Got: 1001
/lib/libz.so.6
Expected MODE: 0444, Got: 0775
Expected UID: 0, Got: 1001
Expected GID: 0, Got: 1001
/lib/libzfs.so.2
Expected MODE: 0444, Got: 0775
Expected UID: 0, Got: 1001
Expected GID: 0, Got: 1001
/lib/libzfs_core.so.2
Expected MODE: 0444, Got: 0775
Expected UID: 0, Got: 1001
Expected GID: 0, Got: 1001
/lib/libzpool.so.2
Expected MODE: 0444, Got: 0775
Expected UID: 0, Got: 1001
Expected GID: 0, Got: 1001
/libexec/ld-elf.so.1
Expected MODE: 0555, Got: 0775
Expected UID: 0, Got: 1001
Expected GID: 0, Got: 1001
/usr/local/bin/perl
Expected MODE: 0755, Got: 0775
Expected UID: 0, Got: 1001
Expected GID: 0, Got: 1001
/usr/local/bin/perl5
Expected MODE: 0755, Got: 0775
Expected UID: 0, Got: 1001
Expected GID: 0, Got: 1001
/usr/local/bin/perl5.16.3
Expected MODE: 0755, Got: 0775
Expected UID: 0, Got: 1001
Expected GID: 0, Got: 1001
/usr/sbin/mailwrapper
Expected MODE: 0555, Got: 0775
Expected UID: 0, Got: 1001
Expected GID: 0, Got: 1001
/usr/sbin/nologin
Expected MODE: 0555, Got: 0775
Expected UID: 0, Got: 1001
Expected GID: 0, Got: 1001
/usr/sbin/rmt
Expected MODE: 0555, Got: 0775
Expected UID: 0, Got: 1001
Expected GID: 0, Got: 1001
/usr/share/misc/termcap
Expected MODE: 0444, Got: 0775
Expected UID: 0, Got: 1001
Expected GID: 0, Got: 1001

So... Back up the config and reinstall on a new USB drive?
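A listing this long is easier to read once it is tallied. The sketch below is a rough illustration, not a FreeNAS tool: the function names are made up, and the parser assumes only the "Expected X: a, Got: b" wording seen in the Verify Install output above. It accepts a path either on its own line or fused to the first "Expected" field, and counts which actual owners and modes appear.

```python
import re
from collections import Counter

FIELD = re.compile(r"Expected (MODE|UID|GID): (\w+), Got: (\w+)")

# A few entries copied from the listing, in both layouts the parser accepts.
SAMPLE = """\
/bin/pgrepExpected MODE: 0555, Got: 0775
Expected UID: 0, Got: 1001
Expected GID: 0, Got: 1001
/lib/libc.so.7
Expected MODE: 0444, Got: 0775
Expected UID: 0, Got: 1001
Expected GID: 0, Got: 1001
/usr/local/bin/perlExpected MODE: 0755, Got: 0775
Expected UID: 0, Got: 1001
Expected GID: 0, Got: 1001
"""


def parse_permission_errors(text):
    """Map each path to {"MODE": (expected, got), "UID": ..., "GID": ...}."""
    entries = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("/"):
            # New entry; the path may be fused to the first "Expected" field.
            path, sep, rest = line.partition("Expected ")
            current = entries.setdefault(path.strip(), {})
            line = sep + rest
        match = FIELD.search(line)
        if match and current is not None:
            name, expected, got = match.groups()
            current[name] = (expected, got)
    return entries


def summarize(entries):
    """Count the actual (UID, GID) pairs and actual modes across all entries."""
    owners = Counter(
        (e["UID"][1], e["GID"][1])
        for e in entries.values()
        if "UID" in e and "GID" in e
    )
    modes = Counter(e["MODE"][1] for e in entries.values() if "MODE" in e)
    return owners, modes


owners, modes = summarize(parse_permission_errors(SAMPLE))
print(owners, modes)
```

Run against the full listing, every entry reports the same actual mode (0775) and the same owner (UID and GID 1001). The tally makes that uniformity visible but does not by itself say what caused it.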
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526

KitDavis

Dabbler
Joined
Jul 16, 2011
Messages
18
I'm starting to question these errors. I have 6 FreeNAS servers, all at 9.3, running mostly on Supermicro X9S hardware with plenty of ECC memory. All have been in service for at least 6 months, and some have been working for a couple of years. All were upgraded to 9.3 over the last couple of months.

Most of them had been using older 4GB USB drives as boot devices, which had worked well since they were first installed. I replaced all of them with 8GB USB drives from a few different manufacturers (mostly Kingston). Over the last two months, all but one of the new USB drives has had the critical error (the boot status shows multiple checksum errors).

After replacing all 5 of the USB drives and having 2 of the replacement drives go bad, I decided to replace the USB drives with SSDs. I had a number of Intel X25 40GB drives that were purchased for another project and were either new or very lightly used. To date, I have used these drives as the boot volume in 4 of the FreeNAS servers. Over the last week, 3 of these servers reported the critical error with these drives, and the drives were replaced. Last night, 2 of the servers reported critical errors with the new boot drives that had been installed just a week ago.

This means that in the last month, I have had 8 new USB drives and 5 SSDs develop critical errors. I know there is such a thing as bad luck, but this seems like too much of it. When the USB drives went "bad" I just tossed them, but this morning I took 3 of the "bad" SSDs and used the Intel SSD Toolbox to run a full diagnostic scan, which reported zero problems with the drives. Maybe I just have a large batch of bad hardware, but the failure rate I am experiencing seems too high for that to be the cause.
 

mjws00

Guru
Joined
Jul 25, 2014
Messages
798
Sounds pretty logical to me. The problem will be reproducing the issue so it can be addressed.

Having multiple servers experience the same thing when installed on SSDs is a new development that is harder to dismiss out of hand. It's pretty easy to say 'bad USB' or 'works for me'.

It would be nice if this was isolated. But we seem to see rolling update issues and bad boot devices all over. Sure there are some cheap USB sticks that cause problems. I also think it probable there is more going on.

Truthfully these sorts of things keep me on 9.2 for anything not disposable. These are the kinds of challenges that take a lot of time to fix, imho. The good news is pool data is segregated well enough we have never seen a problem. Thanks for the write up.
 

KitDavis

Dabbler
Joined
Jul 16, 2011
Messages
18
Yes, reproducing it is a bit of a challenge. Most of the boxes are in production, so I don't have a lot of time to spend on the error when it occurs. The second (humorous) problem is that I do have a test NAS box for just this purpose. It has everything that a good FreeNAS box should not have: a low-power AMD processor, a Realtek network card, 8GB of non-ECC memory, and a $6 Walmart el-cheapo USB stick as the boot drive. Of course, of all the boxes, it is the only one that has not experienced the critical boot drive error. It receives rsync data from another box, so it gets a good bit of activity, but it just keeps chugging along. ;-)

I ordered a new batch of Intel SSDs yesterday, and when they arrive I will start swapping them in to see if they make a difference. I replaced the SSDs in the two boxes that had the problem last night with some old 80GB SATA drives while I try to figure out the issue.
 

geekmiki

Dabbler
Joined
Mar 10, 2015
Messages
13
Same issue here, I'm using a brand new Corsair Voyager 16GB stick and I got the critical error message after applying an update.

Code:
  pool: freenas-boot
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: scrub repaired 0 in 0h1m with 0 errors on Tue May 12 10:54:16 2015
config:

        NAME                                          STATE     READ WRITE CKSUM
        freenas-boot                                  ONLINE       0     0     1
          gptid/52a97d73-ef05-11e4-8ff2-0cc47a4057da  ONLINE       0     0     5

errors: No known data errors


I'll gladly replace the USB drive if it really is a USB drive issue... But there seems to be some confusion here as many drives are failing under the same conditions.
The difference between my zpool status and the OP's is that I have "No known data errors" reported.

EDIT: My "Verify install" reports that there is a checksum mismatch on resolv.conf, which is a known bug (https://bugs.freenas.org/issues/8692).
Is the zpool status error message related to this? In this case is it safe to assume that my boot device is ok?
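For anyone comparing several boxes, the counters in the config table above can be pulled out with a few lines of script. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the parser only understands the plain NAME/STATE/READ/WRITE/CKSUM layout shown in the output above.

```python
SAMPLE_STATUS = """\
  pool: freenas-boot
 state: ONLINE
config:

        NAME                                          STATE     READ WRITE CKSUM
        freenas-boot                                  ONLINE       0     0     1
          gptid/52a97d73-ef05-11e4-8ff2-0cc47a4057da  ONLINE       0     0     5

errors: No known data errors
"""


def parse_error_counters(status_text):
    """Return {name: (read, write, cksum)} for each row of the config table."""
    counters = {}
    in_table = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:2] == ["NAME", "STATE"]:
            in_table = True  # header row: device rows follow
            continue
        if in_table:
            if not fields:
                break  # a blank line ends the table
            # Rows are NAME STATE READ WRITE CKSUM. Large counts can be
            # abbreviated with a suffix; this sketch skips such rows.
            if len(fields) >= 5 and all(f.isdigit() for f in fields[2:5]):
                counters[fields[0]] = tuple(int(f) for f in fields[2:5])
    return counters


def devices_with_errors(counters):
    """Names whose READ, WRITE or CKSUM counter is non-zero."""
    return [name for name, counts in counters.items() if any(counts)]


print(devices_with_errors(parse_error_counters(SAMPLE_STATUS)))
```

Since earlier in the thread it was noted that a reboot clears these error stats, recording them before restarting keeps a history that would otherwise be lost.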
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,175
No. ZFS detects block-level corruption. Verify Install just compares files with their signatures from the install files. It also checks files that are not expected to be immutable, and it is not related to ZFS-level errors.
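To picture the file-level side of that distinction, here is a rough sketch of a signature comparison. It is an illustration only and not how FreeNAS implements Verify Install: the manifest format, the function names and the choice of SHA-256 are all assumptions. It hashes each listed file and reports those that no longer match the recorded value.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root, relative_paths):
    """Record the current digest of each file (hypothetical manifest format)."""
    root = Path(root)
    return {rel: file_digest(root / rel) for rel in relative_paths}


def find_mismatches(root, manifest):
    """Return the relative paths that are missing or whose digest changed."""
    root = Path(root)
    mismatches = []
    for rel, expected in sorted(manifest.items()):
        target = root / rel
        if not target.is_file() or file_digest(target) != expected:
            mismatches.append(rel)
    return mismatches
```

A file that is legitimately rewritten at runtime fails a check like this without any disk fault, which fits the resolv.conf mismatch mentioned above. The CKSUM counters in zpool status come from a different mechanism: ZFS stores a checksum for every block and verifies it when the block is read.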
 