Advantages of new ZFS-on-boot features (boot scrub, verify install, etc)

Status
Not open for further replies.

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
I just want to put this in one place for everyone.

I was skeptical about the new ZFS-on-boot-pool feature introduced in 9.3. I thought it was.....how do I say....overkill. I was (or so I thought) totally fine with the UFS boot that we had in 9.2 and before.

Well, I've had my ass saved twice now, and neither save would have happened without ZFS on boot.

It turns out that whole shitloads of USB boot devices (in my case, especially some brand-new Kingston Micro DTs; apparently Kingston's descent into lack of reliability/repeatability continues into USB thumb drives) were corrupting my FreeNAS operating system. I would never have known this was happening without the new SCRUB BOOT POOL feature, which reported between 1 and 10 CKSUM errors every time I ran it on a brand-new FreeNAS box, with a brand-new Kingston thumb drive, that I am building for an associate.

Since I originally installed this appliance with only one boot device (I mean, hell, it's worked for us in the FreeNAS community fine for years to have only one UFS boot device), these CKSUM errors were uncorrectable.

By the time I added a mirrored device to the boot pool, I had already had a corruption situation.

How do I know that? I'm so glad you asked.

I ran the "VERIFY INSTALL" feature, again, new in 9.3, located under SYSTEM->UPDATE!!! This process did *NOT* finish, indicating corruption. (I ran that same process on a known good FreeNAS---i.e., my main FreeNAS server, that has never had a single bit of error on any data or boot device---and it finished in about 30 seconds, no errors).

So these are my recommendations:

  • Kingston's dodginess with respect to their known RAM and SSD shenanigans may be extending to their USB flash devices. As much as I never thought I'd ever say this in my life, for the time being, I am recommending against using Kingston SSDs, RAM, and DOKs (thumb drives), until they can get themselves together, or, alternately, surrender their market share to competitors that don't play this kind of game (e.g., Sandisk). I'm not sure how serious of a situation this is, but, it's not on me to figure that out. The situation is non-zero, and I'd rather not have the risk.
  • Take two identical, new, thumb drives. Make these your boot pool for FreeNAS at install time, from the first minute. I recommend 16GB.
  • Occasionally manually run boot pool scrubs (the button is located in the system->boot area).
  • Occasionally manually run "verify install"s (the button is located in the system->update area).
  • I would only keep up to about a dozen boot clones on your boot pool. Delete ones you don't need. When the device is too full to perform an update, the result, in my experience, was somewhat counterintuitive and nasty (I didn't realize my problem was a full boot device).
  • The minute you get CKSUM (or other) scrubbing errors on a boot device, replace it. These things are cheap enough that there's no excuse to run a boot pool you are not 100% confident with.
Cyberjock and I have been wondering out loud, today, how many of these weird "you say X is happening but that's friggin impossible" problems that users report are ACTUALLY SILENT CORRUPTION of the OS, or settings database, that went undetected in the pre-9.3 rubric.

If you have a single boot device, and you've already upgraded? That's cool. Buy a second device. Run a boot scrub and a "verify install" right now. When your second device arrives, put it in service mirroring your boot pool.
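For anyone who would rather check for CKSUM errors from a script than from the GUI, here is a minimal sketch. It assumes you have captured the text of `zpool status` for the boot pool; the sample text, the pool name `freenas-boot`, and the device names `da0p2`/`da1p2` are hand-written illustrations, not output from a real system, and the function name is made up for this example.

```python
# Hand-written sample in the general layout of `zpool status` output.
# Pool and device names here are assumptions for illustration only.
SAMPLE_STATUS = """\
  pool: freenas-boot
 state: ONLINE
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            da0p2     ONLINE       0     0     7
            da1p2     ONLINE       0     0     0

errors: No known data errors
"""


def cksum_errors(status_text):
    """Map device name -> CKSUM count for table rows with a non-zero count."""
    errors = {}
    in_table = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:5] == ["NAME", "STATE", "READ", "WRITE", "CKSUM"]:
            in_table = True  # header row of the device table
            continue
        if in_table and not fields:
            in_table = False  # a blank line ends the device table
            continue
        if not in_table or len(fields) < 5:
            continue
        name, cksum = fields[0], fields[4]
        # Only plain integer counts are handled; anything else is skipped.
        if cksum.isdigit() and int(cksum) > 0:
            errors[name] = int(cksum)
    return errors


if __name__ == "__main__":
    print(cksum_errors(SAMPLE_STATUS))
```

Any device that shows up in the returned dictionary is a candidate for replacement, per the recommendations above.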
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
I think being able to check the reliability of your boot device using the features you mentioned above is great and very useful; however, I'd like to ask one question, not saying you have the answer...

My question is: What are the purpose, advantages, and possible disadvantages (if there are any) of using two USB boot devices?
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
joeschmuck said:
"I think being able to check the reliability of your boot device using the features you mentioned above is great and very useful; however, I'd like to ask one question, not saying you have the answer...

"My question is: What are the purpose, advantages, and possible disadvantages (if there are any) of using two USB boot devices?"

Well,

The main advantage, for my money, is that they are mirrored boot devices, i.e., if there are errors on one device, ZFS can repair them from the other. With only one boot device, we can generally only detect, but not necessarily correct, errors (in fact, we won't be able to correct them unless they happen to be in metadata blocks, which ZFS stores in more than one copy).

So just now, I finished having to reinstall a fresh 9.3, because my single-device boot pool was corrupted, with no way to fix it.

That would have been unnecessary had I had a two-device boot pool out of the gate.

No?

Do you see a disadvantage? For me, it's all gravy. It's an extra $5-$10, and you get powerful redundancy.
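The detect-versus-repair distinction above can be shown with a toy model. This is only a sketch of the idea, not how ZFS is implemented: it uses SHA-256 from Python's standard library as a stand-in checksum, treats each "device" as one entry in a list, and the function names are invented for the example.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Stand-in checksum for the toy model (not ZFS's on-disk format)."""
    return hashlib.sha256(data).hexdigest()


def read_block(copies, expected):
    """Read one block that is stored once per device in `copies`.

    Returns (data, status). A bad copy is overwritten in place from a good
    one, which is the "self-healing" a mirror makes possible.
    """
    good = next((c for c in copies if checksum(c) == expected), None)
    if good is None:
        # Every copy fails its checksum: detected, but nothing to repair from.
        return None, "unrecoverable"
    repaired = False
    for i, c in enumerate(copies):
        if checksum(c) != expected:
            copies[i] = good  # rewrite the bad copy from the good one
            repaired = True
    return good, "repaired" if repaired else "ok"


if __name__ == "__main__":
    original = b"settings database"
    expected = checksum(original)
    print(read_block([b"settings databasE"], expected))            # single device
    print(read_block([b"settings databasE", original], expected))  # mirror
```

With one device the corruption is noticed but cannot be fixed; with two, the bad copy is rewritten from the good one.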
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
I'm concerned that folks will think that having two devices means that if one fails, the other will keep things running smoothly in all situations. I'm thinking of an actual USB device failure, where the device that boots the machine is corrupt and will not boot it. If the GRUB boot code on my boot device has become corrupt, or the flash drive just breaks, I don't see how the second USB device will jump in and save the day, meaning the system will boot. I do have every expectation that someone could remove the failed flash drive (depending on the hardware's ability to recognize that the boot device has moved) and boot the system intact.

That is why I feel it is important to spell out in the user manual exactly what a second device gets you and what it doesn't; how to recover from a failure should be included as well. I'm just not sure how I could test out these capabilities, but I'm certain that over time the users will experience these issues.

EDIT: I actually don't see any disadvantages myself.
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
joeschmuck said:
"I'm concerned that folks will think that having two devices means that if one fails, the other will keep things running smoothly in all situations. I'm thinking of an actual USB device failure, where the device that boots the machine is corrupt and will not boot it. If the GRUB boot code on my boot device has become corrupt, or the flash drive just breaks, I don't see how the second USB device will jump in and save the day, meaning the system will boot. I do have every expectation that someone could remove the failed flash drive (depending on the hardware's ability to recognize that the boot device has moved) and boot the system intact.

"That is why I feel it is important to spell out in the user manual exactly what a second device gets you and what it doesn't; how to recover from a failure should be included as well. I'm just not sure how I could test out these capabilities, but I'm certain that over time the users will experience these issues.

"EDIT: I actually don't see any disadvantages myself."

I'm with you. As became abundantly clear when we cleared up the "mountroot" problem, the booting process actually has at least two important elements:

1) The first part is the pre-operating-system part, where the boot sector of the USB device is read, and some amount of bootstrapping of some kind is done so that you can get to GRUB.
2) The second part, of course, is the mounting of the remaining system root as a zpool.

I don't see how having multiple devices helps you with the FIRST part. That is, if your drive is corrupted in the boot sectors used in the pre-operating-system portion of the process, that stage has no conception of a second (or third, or whatever) device, so you'd be screwed there, and there's nothing you can do. But that's always been a risk on any boot device.

However, in the second part, once the pair (or whatever) of devices is mounted in the FreeBSD context as a zpool, you get all the advantages.

So I see:

No increase in the disadvantages/risk, and
Some increase in the advantages/win.

So that's a net win. So I'm a fan.

Do you see anything differently?
 

Jailer

Not strong, but bad
Joined
Sep 12, 2014
Messages
4,977
This post makes me glad I went with a SATA DOM for my installation. Not saying they are any more reliable than a USB drive but they were designed for exactly this type of installation.

You shouldn't have to manually run a scrub on the boot drive. According to the manual it will run every 30 days automatically.

Scrub Boot: can be used to perform a manual scrub of the boot device(s). By default, the boot device is scrubbed after every installation or upgrade and every 30 days.
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
Jailer said:
"This post makes me glad I went with a SATA DOM for my installation. Not saying they are any more reliable than a USB drive but they were designed for exactly this type of installation.

"You shouldn't have to manually run a scrub on the boot drive. According to the manual it will run every 30 days automatically."

You raise a valid point.

It is long-accepted guidance in the FreeNAS community that scrubs should run 2-3 times per month on consumer-grade hardware and once per month on enterprise-grade hardware. Now, there is reason to believe that "enterprise grade" isn't all it's cracked up to be, but that's neither here nor there for this discussion. But:

The point I'm making is that basically nothing is more crappy, dodgy "consumer grade" stuff than the USB drives a lot of the userbase uses to boot FreeNAS. Accordingly, our guidance would seem to suggest that they should be scrubbed more often than every 30 days.

I think, sir, you just made the case that the boot scrub interval should be shorter than 30 days. Perhaps this should be a user-settable property. I may file a feature request to that end.
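Until the interval is configurable, one way to notice a stale boot pool is to read the date of the last scrub out of the `scan:` line of `zpool status` and compare it with whatever interval you prefer. The sketch below assumes the scan line ends in the form "with N errors on <weekday> <month> <day> <time> <year>"; the sample line in the usage example is hand-written, and the function names are made up here.

```python
import re
from datetime import datetime, timedelta

MONTHS = {name: number for number, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

# Assumed tail of the scan line: "... with 0 errors on Tue Dec  9 03:45:12 2014"
SCAN_RE = re.compile(
    r"with \d+ errors on \w{3} (\w{3})\s+(\d{1,2}) (\d{2}):(\d{2}):(\d{2}) (\d{4})")


def last_scrub_time(status_text):
    """Return the completion time of the last scrub, or None if not found."""
    match = SCAN_RE.search(status_text)
    if match is None or match.group(1) not in MONTHS:
        return None
    month, day, hour, minute, second, year = match.groups()
    return datetime(int(year), MONTHS[month], int(day),
                    int(hour), int(minute), int(second))


def scrub_overdue(status_text, now, max_age_days):
    """True if no finished scrub is reported or the last one is too old."""
    last = last_scrub_time(status_text)
    return last is None or now - last > timedelta(days=max_age_days)


if __name__ == "__main__":
    line = "  scan: scrub repaired 0 in 0h1m with 0 errors on Tue Dec  9 03:45:12 2014"
    print(scrub_overdue(line, datetime(2015, 1, 13, 12, 0, 0), 30))
```

The month names are mapped by hand rather than parsed with `strptime`, so the result does not depend on the system locale.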
 

Knowltey

Patron
Joined
Jul 21, 2013
Messages
430
Just going to chime in here with some anecdotal evidence of the crappiness of Kingston's USB storage devices.

When I was working for a local company doing a business-wide Windows XP to 8.1 upgrade, we were using Kingston DataTravelers to hold our image for the machines that could simply be reimaged. It seemed my associate and I had to rewrite one of the image sticks every third day, as they would just randomly stop being able to boot.

Also, I had previously used a Kingston DataTraveler for my FreeNAS boot media, back around March/April of this year, with UFS and all that. Every few weeks I would bump into odd issues with server behaviour; I would then run an fsck, it would find something, and the behaviour would get fixed.

And yeah, a user-settable scrub interval for the ZFS boot media would be good (although I suppose you could probably set up a cron job to run "zpool scrub <bootname>"). The boot media scrub tends to take only a minute or two on my machine, plus I already scrub my pools on a weekly basis as it is.
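A cron job along those lines might look like the line this small helper builds. Treat it as a sketch: the pool name `freenas-boot` and the `/sbin/zpool` path are assumptions to check on your own box, the helper itself is hypothetical, and on FreeNAS a cron job is normally best added through the GUI rather than by editing a crontab by hand.

```python
import re


def scrub_cron_line(pool, minute=0, hour=3, weekday=0):
    """Build a crontab line that scrubs `pool` once a week.

    Fields are: minute hour day-of-month month day-of-week command.
    Weekday 0 is Sunday in cron.
    """
    # Reject anything that does not look like a plain pool name.
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_.:-]*", pool):
        raise ValueError("unexpected pool name: %r" % (pool,))
    if not (0 <= minute <= 59 and 0 <= hour <= 23 and 0 <= weekday <= 7):
        raise ValueError("schedule field out of range")
    # /sbin/zpool is the assumed location of the zpool binary.
    return "%d %d * * %d /sbin/zpool scrub %s" % (minute, hour, weekday, pool)


if __name__ == "__main__":
    print(scrub_cron_line("freenas-boot"))
```

Since a boot pool scrub reportedly takes only a minute or two, a weekly schedule like this should cost very little.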
 

DKarnov

Dabbler
Joined
Nov 25, 2014
Messages
44
It may or may not be fair to pan Kingston across the board, but where they've shown problems (and part flim-flams) is in market sectors under extreme price pressure, like the absolute low end of the SSD market. FreeNAS boot drives can be low capacity and don't need USB3 or fancy features, which can put them right in that chunk of the market where a dollar more or less can be the sale-deciding factor for the casual Newegg or brick & mortar shopper. Because of the way the bottom-end flash market works, I wouldn't recommend touching anything at that end of the market without the fab owner's name on it, which basically limits you to Sandisk and Lexar, and even then I'd go a product line or two above the absolute cheapest.

Once you're out of that ~$5-30ish range (or if the thumbstick is shaped like Hello Kitty), that hopefully becomes less of a concern, but as said before, you don't need to spend that on a FreeNAS stick.

I wonder if it would be a good idea to do a boot drive scrub when the GUI shutdown or reboot buttons are pressed, with a popup pausing the shutdown if problems are found. Obviously you wouldn't want to take the time to do a scrub for shutdowns, say, triggered by a UPS alarm, but when shutting down / rebooting from the GUI users might appreciate a warning that the box might not come back up gracefully.
 

Jailer

Not strong, but bad
Joined
Sep 12, 2014
Messages
4,977
DrKK said:
"I think, sir, you just made the case that the boot scrub interval should be shorter than 30 days. Perhaps this should be a user-settable property. I may file a feature request to that end."

I wonder how many people would end up with dead boot devices if this became a user-settable property? Not that I think this would be a bad thing, but it makes me wonder just how many bad USB devices out there in the wild would end up dead from frequent scrubs, or even good ones for that matter.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
If it were up to me, I'd remove the name "Kingston" from this discussion because all I saw back in the day were failures with USB. Not limited to any specific manufacturer. After several rounds of that on several boxes, I got A.P.O. and crammed a cheap LSI 3Gbps RAID controller (I wanna say a BR10i) in the N36L for boot along with a pair of cheap 30GB SSD's and it's been running happy for years. We virtualized the others and that picks up the hypervisor's redundant storage for boot, so protected there too.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Jailer said:
"I wonder how many people would end up with dead boot devices if this became a user-settable property? Not that I think this would be a bad thing, but it makes me wonder just how many bad USB devices out there in the wild would end up dead from frequent scrubs, or even good ones for that matter."

Scrubs are basically read-only unless they find trouble.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
DrKK said:
"So I see:

"No increase in the disadvantages/risk, and
Some increase in the advantages/win.

"So that's a net win. So I'm a fan.

"Do you see anything differently?"

I think we are on the same page. I have a couple of changes for the user manual in mind, all clarifications, and this was one of them.

Also, this thread isn't about which USB flash device is good or bad; it's about dual USB boot devices and the advantages they offer.
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
Yeah, I really dislike how this thread has been hijacked to Kingston. I almost regret mentioning it, because that's all unsophisticated users are going to see now, even though we're talking about something else.

Anyway: There is little danger of scrubbing a (healthy) device, as has been pointed out.

An unhealthy device can, of course, be made more unhealthy by a scrub. But at that point you're jacked anyway.
 

sremick

Patron
Joined
Sep 24, 2014
Messages
323
Given all I've read, I wish I could've gone with a SATA DOM for my boot device over USB. Unfortunately, I've already maxed out all the SATA ports on my MB for my 6 HDDs, and I'm not sure I'm ready to drop the money for an M1505 and rewire my NAS just to make room for a SATA DOM.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
sremick said:
"Given all I've read, I wish I could've gone with a SATA DOM for my boot device over USB. Unfortunately, I've already maxed out all the SATA ports on my MB for my 6 HDDs, and I'm not sure I'm ready to drop the money for an M1505 and rewire my NAS just to make room for a SATA DOM."

Not worth it. Two redundant high-quality USB drives are /probably/ fine, and make use of otherwise-wasted ports.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
For the price of the HBA and the DOM, you might as well buy a handful of decent USB drives, mirror them to your heart's content and keep a few spares.

Just don't go overboard...

 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Jailer said:
"Scrub Boot: can be used to perform a manual scrub of the boot device(s). By default, the boot device is scrubbed after every installation or upgrade and every 30 days."

That certainly isn't happening. I've got evidence to prove it. My ZFS had to be manually scrubbed at the 35 day mark, and this was after multiple upgrades of the OS. Just decided to put a ticket in on it.
 

Knowltey

Patron
Joined
Jul 21, 2013
Messages
430
cyberjock said:
"That certainly isn't happening. I've got evidence to prove it. My ZFS had to be manually scrubbed at the 35 day mark, and this was after multiple upgrades of the OS. Just decided to put a ticket in on it."

Can concur; my most recent scrub was before the last update I applied.
 

rogerh

Guru
Joined
Apr 18, 2014
Messages
1,111
My boot devices used to get scrubbed with every update until the 9.3 release, but not with the six(?) updates since. So it last happened on 9th December. I'll add a note as to whether this happens again on 8/9th January, depending on whether they mean thirty days or a calendar month. It seems odd if it varies between different installations.
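The "8/9th January" in the post above comes straight out of the date arithmetic, which can be checked with a few lines. This is only a worked example: the year 2014 is an assumption (the post gives day and month only), and the two helper names are made up.

```python
import calendar
from datetime import date, timedelta


def next_scrub_fixed_days(last, days=30):
    """Next scrub if the schedule means a fixed number of days."""
    return last + timedelta(days=days)


def next_scrub_calendar_month(last):
    """Next scrub if the schedule means the same day of the next month."""
    if last.month == 12:
        year, month = last.year + 1, 1
    else:
        year, month = last.year, last.month + 1
    # Clamp the day for short months (e.g. Jan 31 -> Feb 28).
    day = min(last.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


if __name__ == "__main__":
    last = date(2014, 12, 9)  # year assumed for illustration
    print(next_scrub_fixed_days(last))      # thirty days later
    print(next_scrub_calendar_month(last))  # one calendar month later
```

December has 31 days, so thirty days after 9th December falls on 8th January, one day before the calendar-month date of 9th January.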
 