TrueNAS SCALE 23.10.0 has been released!

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
Even better, why not a pool checkpoint?
When the feature was introduced you needed to export and re-import the pool to delete it. Is that still the case?
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Even better, why not a pool checkpoint?
I thought about that--and it'd also allow you to roll back from enabling feature flags. Not sure off the top of my head about its implications for your data, though.
 
Joined
Oct 22, 2019
Messages
3,641
Not sure off the top of my head about its implications for your data, though.

The main implication is...

If you do not immediately rewind back to the checkpoint, you risk losing "future data" that you wrote to the pool after the checkpoint's creation.


So you're caught in a dilemma. (Not really "dilemma", but you need to make a decision ASAP.)


Soon after doing "the big thing" (such as upgrading the appliance, upgrading the pool, or some massive migration project), you must immediately assess the result and ask yourself "Is everything good?" Once you answer that question, you are left with two choices that must be acted upon ASAP:
  1. Everything is good. All clear. Delete the checkpoint.
  2. Something went wrong! Not good! Rewind to the checkpoint.

If you wait too long and only later decide "Hmmm, maybe I should rewind after all....", you will lose everything you saved/wrote/modified since the checkpoint's creation.

The checkpoint will also hold onto used space until it is deleted, making a rewind less feasible over time (and besides, it defeats the purpose of creating one in the first place if you just sit on it for "later").
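
For reference, the entire lifecycle is only a few zpool commands. A rough sketch, assuming a pool named "tank":

# Create a checkpoint right before "the big thing"
zpool checkpoint tank

# Choice 1: everything is good -> discard the checkpoint (frees the held space)
zpool checkpoint -d tank

# Choice 2: something went wrong -> rewind (discards everything written since)
zpool export tank
zpool import --rewind-to-checkpoint tank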
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
If you do not immediately rewind back to the checkpoint, you risk losing "future data" that you wrote to the pool after the checkpoint's creation.
Yeah, that's what I was thinking. It's like a snapshot in that regard, but both more powerful (in that it lets you reverse pool-level actions) and more risky (for the same reason).

The checkpoint will also hold onto used space until it is deleted
...which is also true, though to a lesser degree, of the proposed snapshot of ix-applications.
 
Joined
Oct 22, 2019
Messages
3,641
...which is also true, though to a lesser degree, of the proposed snapshot of ix-applications.
What makes the "checkpoint" unique is the "dilemma" (i.e., "make your decision ASAP") that comes from its pool-wide nature.

Created new datasets since then? Saved files on this or that dataset since then? Created snapshots for this or that dataset since then? If you rewind to the checkpoint: all gone.

Snapshots (even recursive ones), however, only affect the datasets in question. Everything else you do in the pool (writing to other datasets, creating brand-new datasets, creating snapshots for other datasets, creating new jails, etc.) is untouched if you roll back a particular snapshot on a particular dataset.
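
(As a quick illustration, with a hypothetical dataset and snapshot name:)

zfs rollback tank/media@pre-upgrade   # only tank/media is rewound; the rest of the pool is untouched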

That's why I don't like to compare a pool checkpoint to a snapshot. Even though they overlap to some degree, the use of a pool checkpoint has a distinct purpose (as explained above.) You need to make a decision soon after you do a "big thing" to the pool. You should never find yourself "sitting" on a checkpoint, as you would a snapshot. (Individual snapshots can be conveniently "browsed" for deleted files on a live running system. Checkpoints cannot. They can be perused with a read-only emergency checkpoint "import", but that's a drastic action.)
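
(To spell out that "drastic action", again assuming a pool named "tank": the pool has to be exported, then re-imported read-only at the checkpointed state, roughly like so:)

zpool export tank
zpool import -o readonly=on --rewind-to-checkpoint tank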
 

wardtj

Cadet
Joined
Oct 31, 2023
Messages
2
I think we already have one bug report of a similar nature : https://ixsystems.atlassian.net/browse/NAS-124776
If you can send me a debug by pm I can get this reviewed.

I'm not sure if this is related, but I had a long-standing bug with TrueNAS Angelfish and Bluefin when it came to the ix-applications dataset.

- If I turned on replication from the system volume to a backup volume,
- And the replication included the "ix-applications" dataset,
- And the backup volume mounted before the system volume...

The machine would persistently refuse to mount all of my pools. Older versions of TrueNAS had a 15-minute timeout for mounting ZFS pools. The system would count down the 15 minutes, then fail the import and move along.

At that time I had a prod pool, and a DR pool.

What I noticed was that the ZFS mount scripts at boot mounted the volumes in alphabetical order.

So, if my prod pool was,

prodpool-1

and the DR pool was,

disaster-recovery-prodpool-1

The system would mount the ZFS pool "disaster-recovery-prodpool-1" and then hang for the full 15 minutes, refusing to mount the regular prodpool-1, if the DR pool contained a replicated copy of the ix-applications dataset.

I tried reinstalling with different versions of Angelfish and Bluefin, loading configs from backups, etc. It was really annoying. I only figured it out when I deleted the DR pool. Once the DR pool was gone, the system booted fine. If I re-created the pool and ix-applications was included in the replications: bang, hung, dead system. I had to do some SQL-fu on the freenas.db, because once the system came up I could not reliably remove the DR pool holding the replicated copy of the ix-applications dataset. Something in the boot process wanted to grab onto any of the "ix-applications" mounts, whether on the system volume or on a replicated backup volume.

No encryption or anything like that.

Since then, I've never trusted the ix-applications dataset or configuration on TrueNAS. I explicitly have my replication ignore the ix-applications dataset, and things work OK. I'm wondering if some of the upgrade bugs and missing datasets are similar. The messages in the forum here are very similar to what I experienced: ZFS pools missing, not mounting, etc. I know some of the causes are linked to encryption and unencrypted ix-applications, but I also know that a system with two copies of ix-applications, where the non-system copy lives in a pool that mounts before the production pool holding ix-applications, will hang the boot process and cause no end of pain.
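
If anyone else is stuck with a replicated copy, one workaround I would expect to help (untested on my side; the dataset paths follow my example pools above, and the properties may need to be re-applied or excluded after each replication run) is to keep the DR copy from ever auto-mounting:

# Stop the DR copy from grabbing the mountpoint at boot
zfs set canmount=noauto disaster-recovery-prodpool-1/ix-applications
zfs set readonly=on disaster-recovery-prodpool-1/ix-applications

# Sanity check: look for two datasets claiming the same mountpoint
zfs get -r -t filesystem mountpoint,canmount prodpool-1 disaster-recovery-prodpool-1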

Hope this helps,
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
Just don't upgrade the boot pool. That will break GRUB.
But why? In FreeBSD you simply upgrade your boot loader. What would be the equivalent procedure for grub?
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
But why? In FreeBSD you simply upgrade your boot loader. What would be the equivalent procedure for grub?
From what I've read and understand, the people who maintain Grub don't like ZFS any more than the Linux kernel people.

So there is limited support for ZFS pool features in Grub. Plus, even if Grub did keep adding recent ZFS pool features, normal Grub release delays would leave them lagging behind, and Debian (which SCALE is based on) would introduce its own delay in Grub ZFS pool features on top of that.

If I remember correctly, TrueNAS SCALE uses its own OpenZFS release, not Debian's release of ZFS. Thus, OpenZFS is probably more current on SCALE than on Debian.


Now some of us get around the Grub problem by using a "/boot" partition. This allows the root pool to use any or all OpenZFS features as long as the Linux kernel & its associated RAM disk have that support. Grub does not know or care about the OS partitions.
 
Joined
Oct 22, 2019
Messages
3,641
Now some of us get around the Grub problem by using a "/boot" partition. This allows the root pool to use any or all OpenZFS features as long as the Linux kernel & its associated RAM disk have that support. Grub does not know or care about the OS partitions.
Why doesn't SCALE do that then? It's not like TrueNAS is tentative about partitioning the boot drive.

Part 1: ESP (FAT32)
Part 2: /boot (Ext4 or XFS)
Part 3: OS (ZFS)

If you choose to "mirror" your boot device during installation, the OS will use a mirror vdev (ZFS) for Part 3 (no different than how Core does it), and it can use mdadm for /boot (Linux software RAID).

That would work, yes?
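
(Roughly what I am picturing, sketched with sgdisk; the device name and sizes are placeholders, not a claim about what the installer actually does:)

# /dev/sdX is the boot device; repeat on the second device if mirroring
sgdisk -n1:0:+512M -t1:EF00 /dev/sdX   # Part 1: ESP (FAT32)
sgdisk -n2:0:+2G -t2:8300 /dev/sdX     # Part 2: /boot (Ext4/XFS, or an mdadm member)
sgdisk -n3:0:0 -t3:BF01 /dev/sdX       # Part 3: OS (ZFS)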
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Why doesn't SCALE do that then? It's not like TrueNAS is tentative about partitioning the boot drive.

Part 1: ESP (FAT32)
Part 2: /boot (Ext4 or XFS)
Part 3: OS (ZFS)

If you choose to "mirror" your boot device during installation, the OS will use a mirror vdev (ZFS) for Part 3 (no different than how Core does it), and it can use mdadm for /boot (Linux software RAID).

That would work, yes?
Yes, that would work.

Even using a limited ZFS pool feature set on "/boot" would work too; no need to use EXT4 or XFS. But using MD-RAID (like with swap) & a Linux file system does prevent users from updating their "/boot" pool features, because that option simply does not exist there.
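
(For reference, recent OpenZFS even ships a ready-made "grub2" compatibility file for exactly this case; a sketch, with the pool name and partitions as placeholders:)

zpool create -o compatibility=grub2 bpool mirror /dev/sdX2 /dev/sdY2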

However, using yet more MD-RAID, this time with actual critical code (though just boot-time code), means that a scrub and/or validity check would be helpful. I implemented such a check for my own Linux swap & "/boot" using the more primitive MD-RAID scan:

echo check >/sys/block/${MD_DEV}/md/sync_action   # e.g. MD_DEV=md0

This at least detects bad blocks (that have not yet been read), though it does nothing as complete as checksumming files (because MD-RAID has no clue about files... it is just a block-device RAID manager).
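
(And to see whether the check actually found anything, using the same placeholder device name:)

cat /sys/block/${MD_DEV}/md/sync_action    # shows "check" while running, "idle" when finished
cat /sys/block/${MD_DEV}/md/mismatch_cnt   # non-zero means the mirror copies disagree somewhere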


I think the real heart of the problem is that Linux people tend to work more at the command line, and treat their SCALE as a server with NAS functions rather than as an appliance that must be managed through the GUI & TUI (aka the command-line CLI).

I would guess that there is not an option in the GUI to update the ZFS pool features for the boot-pool. At least I have not found one. But there is an annoying warning on the boot pool's status page about ZFS pool features not being enabled.
 

ropeguru

Dabbler
Joined
Jan 25, 2022
Messages
29
Anyone noticed on the Dashboard that the in and out metrics for Interface are incorrect? Supposedly my 10Gb interface can do 17 GigaBytes (GB) of transfer..

 

wardtj

Cadet
Joined
Oct 31, 2023
Messages
2
Anyone noticed on the Dashboard that the in and out metrics for Interface are incorrect? Supposedly my 10Gb interface can do 17 GigaBytes (GB) of transfer..

Yes, it seems off by a factor of 10. The kb are shown as mb, etc.
 

probain

Patron
Joined
Feb 25, 2023
Messages
211
.
.
.
I think the real heart of the problem is that Linux people tend to work more at the command line, and treat their SCALE as a server with NAS functions rather than as an appliance that must be managed through the GUI & TUI (aka the command-line CLI).
.
.
To be fair, there is a lot that is "Generally recommended" that you just can't do in the GUI, and therefore have to do in the Shell.
E.g. burn-in of new drives, miscellaneous zfs & zpool commands (zpool list -v, for instance), just to name a few.
Obviously they can't have everything in the GUI. But I would like to see more of these "best practices" and other common commands implemented.

I even have a suggestion Jira for adding a burn-in option, here
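
(For context, the kind of burn-in I mean is roughly this, run from the Shell; /dev/sdX is a placeholder, and badblocks -w is destructive, so only use it on drives that hold no data:)

smartctl -t long /dev/sdX   # SMART extended self-test
badblocks -wsv /dev/sdX     # destructive write/read pattern test -- wipes the drive!
smartctl -a /dev/sdX        # review SMART attributes and test results afterwards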
 
Joined
Oct 22, 2019
Messages
3,641
To be fair, there is a lot that is "Generally recommended" that you just can't do in the GUI.
I'm going to print this out on cardstock paper and frame it on my wall.

It's even worse for Core. At least SCALE has filled in some of these voids in its appliance's GUI.
 