Critical Alerts for Boot Pool

webstyler

Explorer
Joined
May 16, 2012
Messages
56
Hello guys

We have had a TrueNAS system online for 4 weeks, booting from a new external SSD, and we are getting this critical alert:

"Boot pool status is ONLINE: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected.."

From the shell, # zpool status -v gives this:

------------------------------------------------------------
  pool: boot-pool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
  scan: scrub repaired 0B in 00:00:49 with 0 errors on Mon Apr 11 03:45:49 2022
config:

        NAME        STATE     READ WRITE CKSUM
        boot-pool   ONLINE       0     0     0
          da4p2     ONLINE       0    22     0

errors: No known data errors
------------------------------------------------------------

What does this mean?

Thanks
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,949
It means that da4 has logged 22 write errors.
Make a backup of your config file.
Run a SMART test on the drive (short and long).
Consider adding a second boot drive as a mirror, just in case.

Post your hardware specification as per forum rules
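
If you want to check those counters from a script rather than by eye, here is a minimal Python sketch. It is not an official TrueNAS or ZFS tool: it assumes the usual `zpool status` table layout (NAME STATE READ WRITE CKSUM) with plain integer counters, and the names `DeviceErrors` and `devices_with_errors` are made up for this example. The sample text is trimmed from the output posted above.

```python
import re
from typing import List, NamedTuple


class DeviceErrors(NamedTuple):
    """One row of the config table from `zpool status` (hypothetical helper type)."""
    name: str
    state: str
    read: int
    write: int
    cksum: int


# Trimmed copy of the output posted earlier in this thread.
SAMPLE = """\
  pool: boot-pool
 state: ONLINE
config:

        NAME        STATE     READ WRITE CKSUM
        boot-pool   ONLINE       0     0     0
          da4p2     ONLINE       0    22     0

errors: No known data errors
"""

# A device row is: name, state, then three integer counters.
# Assumption: counters are plain integers; abbreviated values are not handled.
ROW = re.compile(r"^\s*(\S+)\s+([A-Z]+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")


def devices_with_errors(status_text: str) -> List[DeviceErrors]:
    """Return every row whose READ, WRITE or CKSUM counter is non-zero."""
    found = []
    for line in status_text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # header, blank line, or any other non-table line
        name, state, read, write, cksum = match.groups()
        row = DeviceErrors(name, state, int(read), int(write), int(cksum))
        if row.read or row.write or row.cksum:
            found.append(row)
    return found


if __name__ == "__main__":
    for dev in devices_with_errors(SAMPLE):
        print(f"{dev.name} ({dev.state}): read={dev.read} "
              f"write={dev.write} cksum={dev.cksum}")
```

In practice you would feed it the text captured from `zpool status -v` instead of the pasted sample. A non-empty result only tells you which device logged errors, not why, so the SMART tests are still worth running.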
 

webstyler

For boot we use an external WD Elements SE SSD.
It doesn't seem to work well on 2 different TrueNAS systems.

So, does anyone have a suggestion for an external SSD to boot from?

Thanks
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
I use Samsung T5 and T7 External SSDs with good success (no failures to date), both boot pool and system dataset on several systems.
 

webstyler

Hi sretalla

Thanks for sharing this :)

Is your Samsung T7 the MU-PC500T model?

We are also looking at this:
Kingston DC1000B Enterprise NVMe SSD 240GB M.2 2280 + Case SSD NVMe to USB3.1
But it may not make sense for a boot drive, and it's a risk without other users' experience to go on.
 

sretalla

I could find the manufacturer reference number as
MU-PC500H/WW
for the one that I bought, but that may be a region thing (I'm in Europe).
 

webstyler

Fine, it's the same, I'm also in Europe :)

MU-PC500T/WW is Titanium Grey
MU-PC500H/WW is Indigo Blue

Thanks for the info
 

jlpellet

Patron
Joined
Mar 21, 2012
Messages
287
I have had good luck with small generic NVMe or SATA SSDs in an external USB 3 case. One thing you might try is putting the WD in a USB 2 port, as I have had errors using a USB 3 disk in a USB 3 port that went away in a USB 2 port (mostly when generating an upgrade OS install). Note that I've used SSDs as small as 32 GB without issue. In my experience an OS install takes under 2 GB, so keeping 5 environments from upgrades takes less than half of a 32 GB disk. Good luck.
John
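
The space estimate above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it treats every environment as a full install of 2 GB, which is a pessimistic assumption, and the function name `upgrade_footprint_gb` is made up for the example.

```python
def upgrade_footprint_gb(environments: int, gb_per_install: float) -> float:
    """Upper bound on space used if every environment were a full install."""
    return environments * gb_per_install


DISK_GB = 32          # smallest SSD size mentioned in the post
ENVIRONMENTS = 5      # environments kept from upgrades
GB_PER_INSTALL = 2.0  # "under 2 GB" per install, so this is an upper bound

used = upgrade_footprint_gb(ENVIRONMENTS, GB_PER_INSTALL)
print(f"{used:.0f} GB of {DISK_GB} GB, under half: {used < DISK_GB / 2}")
```

Five environments at 2 GB each come to 10 GB, against 16 GB for half of the disk, so the figures in the post are consistent.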
 

webstyler

New boot drive installed on a Samsung T7.
Config restored.

We get this warning:

Warning: Attempt to connect to netlogon share failed with error: [EFAULT] could not obtain winbind interface details: Winbind daemon is not available. could not obtain winbind domain name! failed to call wbcPingDc: Winbind daemon is not available..
2022-08-09 13:03:01 (Europe/Rome)

under services

sharing.smb.sync_registry

Status: FAILED
Start Time: 2022-08-09 13:02:44
Finished Time: 2022-08-09 13:02:44
Error: [EFAULT] net conf listshares [None] failed with error: Unable to initialize messaging context!
 

sretalla

I doubt that's related to the specific boot media type.

I guess this is after restoring the config... do you get that error on a fresh install before the config is restored?
 

homer27081990

Patron
Joined
Aug 9, 2022
Messages
321
Please note that (in the case of external drives), CRC errors (meaning, in this case, USB cable errors) are also reported as unrecoverable errors.
 

webstyler

OMG
New boot installation on a new SSD
Same error

??

Tried a different USB port, nothing changes.

Status check:
================================
  pool: boot-pool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
  scan: scrub repaired 0B in 00:01:53 with 0 errors on Tue Aug 16 03:46:53 2022
config:

        NAME        STATE     READ WRITE CKSUM
        boot-pool   ONLINE       0     0     0
          da4p2     ONLINE       1     0     2

errors: No known data errors
================================

Any suggestions?
 

Attachment: ScreenShot345.jpg

homer27081990

Either your entire USB controller loses power or something, or your external case (or cable) has a problem.
 

webstyler

What are your full system specs?

Dell R710
48GB Ram
Controller Dell PERC H200
Boot: 1x500 GB SSD (samsung T7)
Storage: 4x8TB

About the USB controller: we have one other system, identical in every spec, with the same issue.
The only difference is that it has a WD SD as boot.

So, 2 servers with an issue on the cable or USB controller?

Thanks
 

homer27081990

When you say WD SD, do you mean an SD card or a WD SSD? If you mean a WD SSD, is it also connected via USB? It would not be the first time that two things go wrong at the same place and time for different reasons. Also, is your USB port the one on page 91? If not, I don't know if the other USB ports are always-on, or if they can get slowed or shut down by a power-saving feature. The only port that is made to boot from is the internal one, AFAIK.
 

webstyler

Both servers are identical, same hardware.

only difference:

Server 1
-> Boot: Western Digital SE Elements external SSD on rear USB (external)

Server2
-> Boot: Samsung T7 external SSD on rear USB (external)

On another server we boot from a USB stick, also on an external USB port, and we haven't had any issue like this in the past.

Thanks
 

Redcoat

MVP
Joined
Feb 18, 2014
Messages
2,925
Your description of the situation seems to me to point to the USB to SATA converter as the likely problem.

Are you using the same device on both systems?
Have you tried any other USB to SATA converters?
Do you have access to any others to try out?

I suggest that you make a test with one of the servers: open it up and disconnect the cables to the HDDs, connect one cable (or two if the power is separate) to the SSD (in other words, eliminating the converter from the equation), restart the server and change the BIOS boot device selection to the just-installed SSD, reboot and see if the server boots up to TrueNAS. If it does, that should validate to you that the converter is no good.

If that is indeed proven, go get other converter(s) to try out. One possible source of help is here in the forum: start another thread here, "Need help to find USB-to-SATA converter", reference this thread, and ask the forum members for recommendations of devices they are using, and the source. Best if you include your location (at least the country) so that regional availability can be taken into account.

If the server doesn't boot up from the directly connected SSD, remake the SSD with the install process and try again.

Good luck.
 

homer27081990

As @Redcoat says, and I said earlier, in a few words, the problem is with your USB to SATA cable/case/controller/converter/thing. If you think about it, when you boot from a USB stick, the connections are:
[ Flash chips (storage) ] <->
[ stick flash controller logic <-> stick USB interface logic ] <|-|>
[ computer USB controller logic <-> chipset SATA controller logic ] etc...
But when you put an SSD through USB:
[ Flash chips (storage) ] <-> [ SSD flash interface logic <-> SSD flash controller logic <-> SSD SATA interface logic ] <|-|>
[ SATA/USB converter SATA interface logic <-> SATA/USB converter convert logic <-> SATA/USB converter USB interface logic ] <|-|>
[ computer USB controller logic <-> chipset SATA controller logic ] etc...
You have many more points of failure and an entire extra chip. Depending on the quality of the converter, you may get anything from slow speeds, to reconnections (which can trigger errors), to momentary power failures of either the SSD or the controller itself.
Even one disconnection or power failure is enough to make TrueNAS take your boot-pool offline.
You can try doing what I have done for my setup:
Buy a cheap PCIe SATA controller, one for each computer (I have a 2-port)
Buy a cheap USB to SATA converter (better in cable form)
Open up the converter and break off the SATA-data portion (the small port)
Modify the cover so you can fit a SATA cable next to the power portion
Test the equipment on a desktop to check for power failures
If all is good, zip-tie your SSD to one of the PCIe riser cages of the server
Install the PCIe SATA controller underneath the SSD (on the same cage)
Connect the former converter cable to the internal USB port of the server
Connect the SATA cable to the new PCIe SATA controller
Set the controller boot mode jumper to AHCI (if yours has a jumper)
Boot from your, now internal, SSD
I bought a USB to SATA cable (converter/controller, whatever) for 6€ and a PCIe SATA 6Gbps 2-port controller for 9€, so it's a money-saving option.
Otherwise, get a PCIe SSD, which is a money-spending option.
 

webstyler

Hello

Thanks for the reply

Both servers have dual power supplies and are deployed in a datacenter, so power failure isn't an issue.
Both servers are set in the BIOS to "max power / max performance"

We don't have a free slot to put the boot drive inside the servers without changing hardware, backplane, etc.

Also, changing the controller and using PCIe SATA means a lot of work and relying on "unofficial" solutions.

Maybe the only way is to go back to using a USB stick (such as a SanDisk Cruzer Fit).

We would like to use an SSD, but there are too many issues, with 2 different servers and 2 different SSD brands.

I understand that the cause may be USB, but others use the same Samsung T7 over USB without issue.

So I think we'll go back to a USB stick, unless there is another solution for putting the SSD on USB or outside the server case.

Thanks for the support and help
 