Boot Pool Degraded: Weird other stuff happening too??

winstontj

Explorer
Joined
Apr 8, 2012
Messages
56
I'm looking for a little help (please).

I'm not sure what happened, but the last time I logged into my TN Core machine, 2FA worked (and the boot pool was fine). I tried to log in yesterday to check on updates and my user/pw + 2FA wasn't working. No clue what happened; all I know is that 2FA used to work (my phone is the primary 2FA device and her phone is the #2/backup), and I had to reset my root pw via the console to get in.

I don't know if it's a bug, or whether the 2FA breaking could have anything to do with also losing a drive in my boot pool.

I also can no longer see my boot pool in the Storage > Pools GUI menu. zpool status shows what I pasted below, so the CLI clearly still sees my boot pool... Just wondering what is going on, why I don't see the degraded boot-pool in the GUI, and why the 2FA broke or wasn't working. I built this machine about a year ago and it's been AWESOME, and I've run TN (and FreeNAS) before, but I don't think I've ever lost a ZFS drive before. I'm capable of using Google to figure it out, but I'm a little worried about why the 2FA broke and also why I cannot see my boot pool in the GUI. The Disks menu shows the drives (the SMART results say all drives are fine).

Is this a problem? What do I do to replace the bad drive (I have a spare), and is it an issue that my 2FA just randomly stopped working?

Code:
Boot pool status is DEGRADED: One of more devices are faulted in response to persistent errors. Sufficient replicas exist for the pool to continue functioning in a degraded state..
26 February, 2023 04:12:52 (America/New_York)


Code:
zpool status:
errors: No known data errors

  pool: boot-pool
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
        repaired.
  scan: scrub repaired 0B in 00:00:44 with 0 errors on Sun Mar 12 05:13:46 2023
config:

        NAME        STATE     READ WRITE CKSUM
        boot-pool   DEGRADED     0     0     0
          mirror-0  DEGRADED     0     0     0
            ada1p2  ONLINE       0     0     0
            ada0p2  FAULTED     43   599   454  too many errors

errors: No known data errors


What other info can I provide?
Thanks.
 

Alecmascot

Guru
Joined
Mar 18, 2014
Messages
1,177
I also can no longer see my boot pool in the storage, pools gui menu.
The boot pool is not displayed there.
It is in System > Boot.
Manage it there and replace your defective device.
 

c77dk

Patron
Joined
Nov 27, 2019
Messages
468
The 2FA could be as simple as a drifting clock. That's what I'd check first
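
To make the clock-drift point concrete: TOTP codes are derived from the current time in fixed steps, so a server whose clock has wandered by more than the accepted window computes a different code than the phone does. Below is a minimal sketch of RFC 6238 TOTP using only the Python standard library. The 30-second step, 6 digits, and the `codes_match` helper with its `window` parameter are assumptions for illustration, not TrueNAS's actual implementation or configured values.

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Compute an RFC 6238 TOTP code (HMAC-SHA1) for the given Unix time."""
    counter = unix_time // step  # index of the current time step
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation, as in RFC 4226
    value = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


def codes_match(secret: bytes, phone_time: int, server_time: int,
                step: int = 30, window: int = 1) -> bool:
    """Hypothetical check: accept the phone's code if it matches any step
    within +/- `window` steps of the server's own clock."""
    phone_code = totp(secret, phone_time, step)
    return any(
        totp(secret, server_time + k * step, step) == phone_code
        for k in range(-window, window + 1)
    )


# RFC 6238 test secret; a real secret comes from the 2FA enrollment.
SECRET = b"12345678901234567890"
print(totp(SECRET, 59))                 # code the phone shows at t=59
print(codes_match(SECRET, 59, 59 + 30)) # one step of drift is tolerated
```

With a window of one step, drift of up to roughly 30 seconds either way still validates; anything larger fails even though the password and secret are correct.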
 

winstontj

Explorer
Joined
Apr 8, 2012
Messages
56
The boot pool is not displayed there.
It is in System > Boot.
Manage it there and replace your defective device.
Thanks. Do I have to reboot to replace the disk? (meaning power off, replace disk, then power back on) Do I select the "detach" selection on the three little dots on the right? Or do I select "replace"?

Or is it as easy as power down the machine, swap in a new disk, then power on, log back in, select the new disk and choose "replace" in the boot pool? (new disk is identical size, brand, firmware, type, etc. of failed disk)

The 2FA could be as simple as a drifting clock. That's what I'd check first
I don't think it is possible to control iPhone time settings, so that would be my only variable. When I built this TN machine I did a bunch of other lab/network work, including making sure time was sync'd properly across all of the devices on my network. I just checked and, as far as I can tell, BIOS, IPMI and TN Core are all sync'd and updating properly.
Are there any better options other than 2FA? I thought I'd log in to the TN machine more, but honestly, it just works... to the point that I need to get alerts & notifications working, because I had no idea that I had a degraded boot disk (pool).
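
Until alerts are configured, a degraded pool can also be spotted by scanning `zpool status` output for any device that is not ONLINE. The following is a rough sketch that parses text shaped like the output pasted earlier in this thread; the function name `unhealthy_devices` and the list of states in the regex are assumptions for illustration, and it is not a substitute for the built-in alert service.

```python
import re

# Matches rows of the "config:" table: NAME STATE READ WRITE CKSUM [note]
ROW = re.compile(
    r"^\s+(\S+)\s+(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)"
    r"\s+(\S+)\s+(\S+)\s+(\S+)"
)

# Trimmed copy of the output posted above.
SAMPLE = """\
  pool: boot-pool
 state: DEGRADED
config:

        NAME        STATE     READ WRITE CKSUM
        boot-pool   DEGRADED     0     0     0
          mirror-0  DEGRADED     0     0     0
            ada1p2  ONLINE       0     0     0
            ada0p2  FAULTED     43   599   454  too many errors

errors: No known data errors
"""


def unhealthy_devices(status_text: str) -> list[tuple[str, str]]:
    """Return (name, state) for every pool, vdev, or disk that is not ONLINE."""
    found = []
    for line in status_text.splitlines():
        match = ROW.match(line)
        if match and match.group(2) != "ONLINE":
            found.append((match.group(1), match.group(2)))
    return found


for name, state in unhealthy_devices(SAMPLE):
    print(f"{name}: {state}")
```

In practice the text would come from running `zpool status` (for example via `subprocess.run`), and a non-empty result would trigger whatever notification you prefer.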
 

Alecmascot

Guru
Joined
Mar 18, 2014
Messages
1,177
Thanks. Do I have to reboot to replace the disk? (meaning power off, replace disk, then power back on) Do I select the "detach" selection on the three little dots on the right? Or do I select "replace"?

Or is it as easy as power down the machine, swap in a new disk, then power on, log back in, select the new disk and choose "replace" in the boot pool? (new disk is identical size, brand, firmware, type, etc. of failed disk)
I would detach the failed drive, then if you have hot plug put in the new one and do an Attach. Or power down between steps if necessary.
Check your BIOS Boot settings to make sure you boot from the good boot drive.
 

winstontj

Explorer
Joined
Apr 8, 2012
Messages
56
I would detach the failed drive, then if you have hot plug put in the new one and do an Attach. Or power down between steps if necessary.
Check your BIOS Boot settings to make sure you boot from the good boot drive.
Thanks a bunch. It was literally that easy: I did a "detach" then "attach". I assume "replace" would possibly be easier/simpler?
Still haven't sorted the 2fa thing. It isn't clock drift (I'm pretty sure/positive clocks are fine).
 

Constantin

Vampire Pig
Joined
May 19, 2017
Messages
1,829
I'd double-check your assumptions re: drift, the availability of NTP servers (your ISP may be blocking access), or the offsets being so large that NTP doesn't correct them. I had some serious issues here when I set my NAS to get its time from the router, with all sorts of terrible performance issues, etc. Only when I stopped that nonsense and went back to the usual default pool (freebsd?) did things improve dramatically. I have since invested in 2 Stratum 1 servers here, along with multiple RTC-based Stratum 2 clocks, to avoid this issue going forward. I highly recommend the NTP200 from centerclick for this purpose.
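
One way to check those assumptions rather than trust them is to measure the offset directly. The sketch below sends a single SNTP request and applies the standard NTP offset formula, ((t2 - t1) + (t3 - t4)) / 2, where t1/t4 are the local send/receive times and t2/t3 are the server's receive/transmit timestamps. The function names are illustrative, a single sample is only a rough estimate, and a timeout would suggest that UDP port 123 is being blocked somewhere along the path.

```python
import socket
import struct
import time

NTP_EPOCH_DELTA = 2208988800  # seconds between 1900-01-01 (NTP) and 1970-01-01 (Unix)


def _to_unix(seconds: int, fraction: int) -> float:
    """Convert a 64-bit NTP timestamp (seconds + 32-bit fraction) to Unix time."""
    return seconds - NTP_EPOCH_DELTA + fraction / 2 ** 32


def clock_offset(packet: bytes, t1: float, t4: float) -> float:
    """Estimate how far the server clock is ahead of the local clock, in seconds."""
    words = struct.unpack("!12I", packet[:48])
    t2 = _to_unix(words[8], words[9])    # server receive timestamp
    t3 = _to_unix(words[10], words[11])  # server transmit timestamp
    return ((t2 - t1) + (t3 - t4)) / 2


def query_offset(server: str, timeout: float = 2.0) -> float:
    """Send one SNTP client request to `server` and return the estimated offset."""
    request = b"\x23" + 47 * b"\0"  # LI=0, VN=4, Mode=3 (client)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        t1 = time.time()
        sock.sendto(request, (server, 123))
        packet, _ = sock.recvfrom(512)
        t4 = time.time()
    return clock_offset(packet, t1, t4)


# Example (needs network access):
# print(query_offset("0.freebsd.pool.ntp.org"))
```

Running this on a machine next to the NAS against both the router and a public pool server gives a quick indication of whether the router is handing out bad time.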
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
I highly recommend the NTP200 from centerclick for this purpose.
Thank you very much for that hint. I just ordered one unit. The official publicly run German time source ("Physikalisch-Technische Bundesanstalt Braunschweig") explicitly encourages direct use of their three stratum 1 servers, as long as you poll them with a single device or a couple of devices and redistribute within your own network.

So that combined with a GPS clock will do nicely I figure.
 

Constantin

Vampire Pig
Joined
May 19, 2017
Messages
1,829
It’s a great value, and the technical support is first rate also. I use my pi-holes and similar raspberry pis as my backup NTP servers.
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
I investigated using a Pi and a USB module but did not get to put in the effort. That turnkey solution looks great.
 

Constantin

Vampire Pig
Joined
May 19, 2017
Messages
1,829
These days, the NTP200 is also considerably less expensive than a pi, a Uputronics hat, enclosure, etc. All in all, a better solution at a lower cost.
 

winstontj

Explorer
Joined
Apr 8, 2012
Messages
56
These days, the NTP200 is also considerably less expensive than a pi, a Uputronics hat, enclosure, etc. All in all, a better solution at a lower cost.
Y'all just blew my mind with that NTP200, and with the fact that NTP drift is that big of a thing.

I think I just got caught with my pants down a bit... I used to work in the low-latency space too... (15 years ago)...

Should I buy something that tells pfSense what time it is? I've been pointing everything to pfSense and then telling pfSense to go figure out what time it is.
 

Constantin

Vampire Pig
Joined
May 19, 2017
Messages
1,829
That was the error of my ways too, though I think it was an EdgeRouter 5 at the time. Get that NTP200 or rely on the FreeBSD pool. But if it's the latter, make sure that your ISP is not blocking access to it. Also, if you have Pis on your network, making them backup NTP servers is trivial, and they can still do everything else without worry.
 

winstontj

Explorer
Joined
Apr 8, 2012
Messages
56
That was the error of my way too. Though I think it was a edgerouter 5 at the time. Get that NTP200 or rely on the FreeBSD pool. But if it’s the latter, make sure that your ISP Is not blocking access to it. Also, if you have Pi’s on your network, then making them backup NTPs is trivial and they can still do everything else w/o worry.

I'd like to cut down on outside communication as much as possible. Lower traffic in general: much better to have a single internal device (or two, three, four, etc. with failover) that can handle time, vs. every single device going out and looking at its watch. Some stuff (cheap security cameras that allow Kim Jong-un to live-stream our bedroom) has firmware that likes to think it's talking to (insert random FQDN/IP address) every n minutes. Those IoT things are all devices I'd rather not allow to pass through our firewall at all, ever...

I'd like to keep them on separate networks and either not let them out at all or trick them into thinking they are getting what they want.

So what's the difference between an NTP200 and (some other device) for serving time? Especially when I can't change where some devices ask what time it is? (With cheap Chinese stuff you sometimes can't change the NTP URL/FQDN.)
 