TrueNAS 12.0-U3 is Available

danb35 · Apr 21, 2021

Basil Hendroff said:
workaround at present isn't ideal requiring a server reboot to restore encrypted communication after a certificate is installed.

It could be more readily worked around; service nginx restart should do it without any other interruption to services on the machine. And updating the script to work with -U3 shouldn't be hard--but if I want it to work with both -U3 and prior versions (which I do; I haven't updated my main system to -U3 and now think I'm unlikely to), that's going to take a bit more thought. It would no doubt help if my familiarity level with Python were somewhere above "banging rocks together."

But the bigger issue, I think, is:

It's a breaking change in the API
It happened in what should be a bugfix/maintenance release
It's undocumented (the official TrueNAS documentation hub still says to use the GET method, or at least it did this morning), unless you look at the documentation that's internal to the product--where all it gives you (once you find it--it isn't exactly the easiest thing to navigate) is this:
There was absolutely no warning of the change. I just went line-by-line through the changelog, and it just isn't there. The closest I saw was this one:

But when I follow the link to the Jira issue, I don't see any discussion, comments, or anything else that would tell me that the method for this endpoint is changing. Even if I click through to the PR, I don't see it, though it's certainly possible I'm failing to understand something that would be evident to someone who actually knew Python.

Now, the change in the behavior is a good one--under -U2.1 and prior, the response to the system/general/ui_restart call was immediate--the UI restarted immediately, breaking the connection, and causing a connection error. That's actually the condition my script tests for to indicate success--which is more than a little counterintuitive. To accept the request, return a "success" status, delay a few seconds, and then restart the UI (which is what it's intended to do now in -U3), is much better behavior. I don't know why they felt the need to make the delay configurable, or if that's what necessitated the change from GET to POST, but again, the overall change in behavior is positive.

But positive change or not, iX published and documented an API. People integrated with it. And they broke it, deliberately, in a bugfix release, without a word of warning to anyone or a scrap of documentation.

morganL · Apr 21, 2021

danb35 said:
It could be more readily worked around; service nginx restart should do it without any other interruption to services on the machine. And updating the script to work with -U3 shouldn't be hard--but if I want it to work with both -U3 and prior versions (which I do; I haven't updated my main system to -U3 and now think I'm unlikely to), that's going to take a bit more thought. It would no doubt help if my familiarity level with Python were somewhere above "banging rocks together."

Now, the change in the behavior is a good one--under -U2.1 and prior, the response to the system/general/ui_restart call was immediate--the UI restarted immediately, breaking the connection, and causing a connection error. That's actually the condition my script tests for to indicate success--which is more than a little counterintuitive. To accept the request, return a "success" status, delay a few seconds, and then restart the UI (which is what it's intended to do now in -U3), is much better behavior. I don't know why they felt the need to make the delay configurable, or if that's what necessitated the change from GET to POST, but again, the overall change in behavior is positive.

But positive change or not, iX published and documented an API. People integrated with it. And they broke it, deliberately, in a bugfix release, without a word of warning to anyone or a scrap of documentation.

Thanks Dan, I think it's an example of a bugfix causing problems. There was no intention of an impactful change, but there are unintended consequences of even a minor improvement.

Someone in the community justifiably didn't like the previous behavior. It was fixed. However, it's clearly possible that a script may have assumed the previous behaviour, even if it was worse.

The bug is here: https://jira.ixsystems.com/browse/NAS-109435

It was reported, but the consequences were not guessed. I added a note to the bug page. However, anyone experiencing the issue should update their scripts with the new behaviour. It's better that we don't revert back again.

danb35 · Apr 22, 2021

morganL said:
However, it's clearly possible that a script may have assumed the previous behaviour.

It "assumed" nothing--it followed the documented method to use this API endpoint.

morganL said:
There was no intention of an impactful change

Then why change the method for this endpoint? There was no need to do that, just as there was no need to accept a parameter for the delay time. When that endpoint previously (through -U2.1) required the GET method, and now it requires POST, how could that not be an impactful change, and how could it not be obvious that it would be such? It pretty well guarantees that anything that used that endpoint is going to stop working. It breaks the bug reporter's script, just as surely as it breaks mine.

There are two distinct issues here:

Until this change, a call to ui_restart immediately restarted the UI, breaking connection. Now, it returns a success code, waits a few seconds, and then does the restart.
Under -U2.1 and prior, this endpoint required GET; now it requires POST

The former, standing alone, would have broken my script--I test for that connection error to indicate success (if the connection failed, then the UI restarted). I have no problem accepting that it wouldn't be apparent to your devs that this is a breaking change; looking for a failed connection to indicate success is kind of weird (even if it was necessary at the time). And despite it breaking my script, I agree (as I said above) that this behavior is better; so far as it goes, I'd consider this a fix to an API/middleware bug, and the problem with my script would result from my assuming the previous buggy behavior.

What I can't understand or accept is that the consequences of the second bullet weren't obvious--again, how could they not be? If the endpoint formerly required one method, and now it no longer accepts that method but requires a different method, how could that not break every single thing out there that uses this endpoint? And the frustration is compounded by the fact that there was just no reason to do it. Even if changing to POST was needed in order to accept a parameter for the length of the delay (and I don't know one way or the other if that's the case), nothing in the ticket asked for that--a hardcoded n-second delay (for some reasonable value of n) would have satisfied the request.

morganL said:
However, anyone experiencing the issue should update their scripts with the new behaviour.

Sure--and deal with yet another instance of "iX can't figure out how they want their software to work." Sure, it's easy enough to change session.get to session.post. It's almost as easy to change it to expect a 200 rather than a connection error. It's a little harder to change it to "do X for version A and later (until it's changed again), but do Y for version B and older." Maybe better would be "try GET, if it returns 405, try POST."

morganL · Apr 22, 2021

@danb35 That's a very reasonable criticism. It was a mistake on our part not to anticipate the potential negative consequences of the improvement.

We'll ask the team to review (and document) any API bug fixes more carefully. @Kris Moore

After an internal meeting, we are exploring making the API respond in both ways in U4. So it would behave similarly to the old GET, but also enable the new POST model. I assume that would be OK with everybody impacted?

danb35 · Apr 22, 2021

morganL said:
I assume that would be OK with everybody impacted?

It'd be fine by me. I've updated the script to try POST, if it returns 405, try GET, and it's working with -U3, -U2.1, and 11.3-Usomething. When it does GET, it still expects the connection error as an indicator of success (which is how it works with -U2.1 and earlier), but since I'm now doing POST first and expecting a 200 status, changing the behavior on GET shouldn't further impact me.
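The fallback described above can be sketched roughly as follows. This is only an illustration, not the actual script: the function name `restart_ui`, the `api_base` value, and the choice to send the POST without a body are all assumptions. `session` is expected to behave like a `requests.Session`.

```python
def restart_ui(session, api_base):
    """Ask TrueNAS to restart its web UI; return True if that appears to have worked.

    `session` is any object with requests.Session-style post()/get() methods.
    `api_base` is a placeholder such as "https://nas.example/api/v2.0".
    """
    url = f"{api_base}/system/general/ui_restart"

    # 12.0-U3 behaviour: POST is accepted, answers 200, then restarts after a delay.
    response = session.post(url)
    if response.status_code == 200:
        return True
    if response.status_code != 405:
        # Some other failure (authentication, wrong URL, ...); do not guess.
        return False

    # 405 Method Not Allowed: assume an older release that only accepts GET.
    try:
        response = session.get(url)
    except OSError:
        # Older releases restart the UI immediately, so the dropped connection
        # is itself the sign of success. The built-in ConnectionError and the
        # one raised by requests are both OSError subclasses.
        return True
    return response.status_code == 200
```

In a real script, `session` would be a `requests.Session` already carrying whatever credentials the API needs. Trying POST first means the newer, cleaner success signal (a 200) is used wherever it is available, and the connection-error heuristic is only a fallback.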

zorak950 · Apr 22, 2021

In happier news, I did the update from U2 to U3 a couple of days ago and everything's gravy for me. I don't use scripts, though.

feanorian · Apr 23, 2021

Constantin said:
To me, there were two major features that drove the upgrade to TrueNAS for me:

special VDEVs / Fusion Pools - leverage multiple enterprise-grade SSDs to support the HDD pool by storing metadata and small files on the sVDEV. Important: figure out ahead of time what your respective storage needs are / likely will be, then buy the right hardware and mirror it n-ways to ensure your pool doesn't go kaplooie if one of those SSDs goes down. Ideally, at least a 3-way mirror

So it's still pretty unclear to me how large these need to be for a given amount of files. There's no rule of thumb or anything. Do you have any idea?

Constantin · Apr 24, 2021

Early on, I had a lot of small files floating around. They ate a lot of metadata space, such that my system needed about 1/40th of usable pool capacity for metadata (2.5%!). Some of the guidance here was for 0.6-1.6%. Pools with very few but large files will need less metadata overhead than pools with a lot of small files. I have since cut down on small files by consolidating them into disk images that consist of compressed 8MB "bands". For example, I had uncompressed system backups that contained hundreds of thousands of little files. The system now needs much less space for metadata.

To get an idea for your metadata needs, use the command below and then extrapolate, assuming that pool contents stay similar in the future. In the command line, enter

For FreeNAS (9 through 11): arc_summary.py
For TrueNAS (12.x): arc_summary

Then look for a section that presents itself like this:

Code:

L2 ARC Size: (Adaptive)                         248.42  GiB
        Compressed:                     97.35%  241.84  GiB
        Header Size:                    0.36%   908.05  MiB

So, way back when, my pool needed 248GiB of space for L2ARC. I was using a 1TB SSD, the pool was about 25% filled, so the ratio was just about perfect. These days, the space needed is likely less than 1/2 that despite more data being on the pool thanks to the consolidation of small files.
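The "extrapolate" step can be written out as simple arithmetic. The sketch below assumes that the space needed grows in proportion to how full the pool is, which only holds if the mix of file sizes stays similar; the function name is made up for illustration.

```python
def extrapolate_full_pool(observed_gib, pool_fill_fraction):
    """Scale a size observed at the current fill level up to a completely full pool."""
    if not 0 < pool_fill_fraction <= 1:
        raise ValueError("pool_fill_fraction must be in (0, 1]")
    return observed_gib / pool_fill_fraction

# Figures from the post: 248.42 GiB observed with the pool about 25% full.
estimate_gib = extrapolate_full_pool(248.42, 0.25)
print(f"Estimated need at 100% fill: {estimate_gib:.0f} GiB")  # 994 GiB
```

That lands close to the 1TB device mentioned above, which is presumably what "just about perfect" refers to.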

Now for a short moment on the soap box re: three future features that I hope @morganL and his team can consider:

I wish the UI could tell us how full the sVDEV is, just as it does for the general pool. Once the sVDEV fills up, additional metadata and small files go into the pool and performance potentially craters. There ought to be a dashboard widget that shows the fill of the sVDEV as a pie chart : metadata, small files, and free. Similarly, there ought to be an alert if the sVDEV exceeds 80% fill, just like the pool.
Since the sVDEV is shared by small files and metadata, remember to set your small files threshold sufficiently low to leave enough room for metadata. That's another pet peeve of mine with the current TrueNAS sVDEV implementation, the inability to set quotas for sVDEV contents such that metadata has a reserved minimum. We can set quotas for shares, users, etc. so why not something as important as the contents of the sVDEV also?
Setting small file size limits for the sVDEV is currently somewhat tricky. Ideally, the GUI would also help the admin choose minimum file size limits by giving the user an idea what the impact of setting a file size limit at 1, 2, 4, 8, 16, 32kb, etc. would have on sVDEV fill.
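Until something like that exists in the GUI, a rough estimate can be made from the command line. The sketch below walks a directory tree and reports, for each candidate threshold, how many files fall at or below it and how many bytes they add up to. It uses logical file sizes as a stand-in, so compression and the way ZFS actually splits data into blocks are ignored; treat the numbers as a ballpark only. The function and constant names are made up for illustration.

```python
import os
import stat

THRESHOLDS_KIB = (1, 2, 4, 8, 16, 32)

def small_file_totals(root, thresholds_kib=THRESHOLDS_KIB):
    """Return {threshold_kib: (file_count, total_bytes)} for regular files
    whose logical size is at or below each threshold."""
    totals = {t: [0, 0] for t in thresholds_kib}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                info = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(info.st_mode):
                continue  # ignore symlinks, sockets, and other special files
            for t in thresholds_kib:
                if info.st_size <= t * 1024:
                    totals[t][0] += 1
                    totals[t][1] += info.st_size
    return {t: tuple(v) for t, v in totals.items()}
```

Calling `small_file_totals("/mnt/tank/dataset")` (a hypothetical path) returns a dictionary that can be printed per threshold and compared against the free space on the sVDEV.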

feanorian · Apr 25, 2021

Constantin said:
To get an idea for your metadata needs, use the command below and then extrapolate, assuming that pool contents stay similar in the future. In the command line, enter

/usr/local/www/freenasUI/tools/arc_summary.py

Then look for a section that presents itself like this:

Code:
L2 ARC Size: (Adaptive)                         248.42  GiB
        Compressed:                     97.35%  241.84  GiB
        Header Size:                    0.36%   908.05  MiB

Thanks for the rule of thumb and advice here, though whatever this python script is, it either isn't in 12.0-U3 or has been moved (which makes sense). I guess I could boot to 11.3 just to run it though...

Constantin · Apr 26, 2021

Apologies, the two Python scripts associated with ARC stats are missing in TrueNAS, as someone else noted in November.

iXsystems has replaced them in TrueNAS with the arc_summary command. I'll update the original message accordingly. Also, thanks to the sVDEV storing metadata, the L2ARC here now seems to border on useless.

Code:

L2ARC size (adaptive):                                         937.5 GiB
        Compressed:                                    99.0 %  928.2 GiB
        Header size:                                    0.1 %  720.0 MiB

L2ARC breakdown:                                                   52.9k
        Hit ratio:                                    < 0.1 %         12
        Miss ratio:                                   100.0 %      52.9k
        Feeds:                                                     45.5k

However, I want to take another look at that once I have run a backup or two. According to the stats, the ARC itself is 99.9% effective on a first hit basis, so that seems a bit optimistic...
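For anyone who wants the actual ratio rather than "< 0.1 %", the raw counters in the breakdown are enough to work it out. The sketch below parses only an excerpt like the one above; the suffix table (k, M, G) is an assumption based on the "52.9k" style, and because the counters are already rounded the result is approximate.

```python
import re

SAMPLE = """\
L2ARC breakdown:                                                   52.9k
        Hit ratio:                                    < 0.1 %         12
        Miss ratio:                                   100.0 %      52.9k
        Feeds:                                                     45.5k
"""

# Assumed counter suffixes; only "k" actually appears in the sample.
MULTIPLIERS = {"": 1, "k": 1_000, "M": 1_000_000, "G": 1_000_000_000}

def parse_count(token):
    """Turn a token such as '12' or '52.9k' into a number."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([kMG]?)", token)
    if match is None:
        raise ValueError(f"unrecognised count: {token!r}")
    return float(match.group(1)) * MULTIPLIERS[match.group(2)]

def l2arc_hit_ratio(excerpt):
    """Compute hits / total accesses from an 'L2ARC breakdown' excerpt."""
    last_token = {}
    for line in excerpt.splitlines():
        label, sep, rest = line.strip().partition(":")
        if sep and rest.split():
            last_token[label] = rest.split()[-1]  # the count is the last column
    total = parse_count(last_token["L2ARC breakdown"])
    hits = parse_count(last_token["Hit ratio"])
    return hits / total

print(f"L2ARC hit ratio: {l2arc_hit_ratio(SAMPLE):.3%}")
```

With the figures above, that is 12 hits out of roughly 52,900 accesses, or about 0.02%.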

feanorian · Apr 27, 2021

Constantin said:
iXsystems has replaced them in TrueNAS with the arc_summary command. I'll update the original message accordingly. Also, thanks to the sVDEV storing metadata, the L2ARC here now seems to border on useless.

Code:
L2ARC size (adaptive):

Thanks again! I didn't look closely enough before though, and didn't realize these were stats for the L2ARC, which I don't have. Do you think maybe I should just count the files or something?

Basil Hendroff · Apr 27, 2021

Having completed the migration from FreeNAS 11.3-U5 to TrueNAS 12.0-U3 across all servers, I thought I'd pen my experience for others considering migrating to TrueNAS. The OS upgrades occurred without incident. I don't use plugins, so the issues I came across centred around jail upgrades from FreeBSD 11.3 to 12.2. I did also experience an SMB issue, but I could have avoided that had I not changed the hostname on my servers. All issues were resolved with varying degrees of effort. Here's a summary:

SMB: Changing hostname retains original SMB mappings
SSMTP: Scripted WordPress Installation (for Reverse Proxy), post #26
deploy_freenas: Let's Encrypt with FreeNAS 11.1 and later, post #158 and Let's Encrypt Local Servers and Devices, post #8
Heimdall: Install Heimdall Dashboard in a jail, post #39

My advice for those contemplating moving to TrueNAS from FreeNAS... If you've got lots of jails, be prepared to take the time to migrate each to use the most recent supported FreeBSD release.

For the record, I've had good results migrating the following jails, but be aware that every use case is likely to be different.

Nextcloud
WordPress (apart from the SSMTP issue identified above)
Transmission
Dnsmasq
Rslsync
Caddy v2
Plex
Tautulli

If you're on an earlier version of FreeNAS and are still using Warden jails, you may want to consider using FreeNAS 11.2-U8 as a stepping stone. FreeNAS 11.2 supports both Warden and iocage jails, whereas 11.3 and later only support iocage jails. 11.2 will give you the opportunity to progressively switch away from Warden jails while minimising any service disruption.

vikonen · May 9, 2021

Thanks for the tips.

As for jails, is there a guide for doing this:

Basil Hendroff said:
migrate each to use the most recent supported FreeBSD release.

?

danb35 · May 9, 2021

iocage upgrade -r 12.2-RELEASE jailname
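With a lot of jails, it may be convenient to generate that command once per jail. The sketch below only builds and prints the commands; it does not run anything. The jail names are hypothetical, and the helper name is made up for illustration.

```python
import shlex

def upgrade_commands(jail_names, release="12.2-RELEASE"):
    """Build one `iocage upgrade` command per jail, without running anything."""
    return [["iocage", "upgrade", "-r", release, name] for name in jail_names]

# Hypothetical jail names, for illustration only.
for command in upgrade_commands(["nextcloud", "plex"]):
    print(shlex.join(command))
```

Each printed line can be reviewed and pasted into a shell, or each list passed to `subprocess.run(command, check=True)` once the output looks right.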

ornias · May 17, 2021

hescominsoon said:
I have held off on U3. IME cosmetic issues usually mean underlying issues that are not cosmetic... especially the issues about U3 that are beginning to populate the forums.

Having worked on the UI, I can assure anyone this comment is mostly false folklore that stems from not understanding the product well enough.

There are 3 common causes of cosmetic issues, in order of frequency:
1. The GUI has a bug
2. The middleware returns a result the GUI didn't expect
3. The middleware is broken and thus the GUI freaks out (which was what hescominsoon referred to)

This order of frequency is based upon a few hundred hours of testing UI and middleware changes in SCALE Nightly. I can only come up with a few cases where 3 was the cause of a UI glitch.

Why is this the case?
There are 2 important parts of the GUI:
1. Fetching data using the API
2. How to display that data

A broken middleware piece or API call returning crap often just causes errors or no data to be displayed, because it gets processed and discarded or rejected in step one of the GUI. If the GUI just glitches out, it's most often just... the GUI glitching out.

ornias · May 17, 2021

seldo said:
I didn’t get it like that!
I was still wondering if I should wait for U3 or install 11.3 U5
I’ll install the latter now and move to 12.0 U3+ or even 12.1 U3+ in the future.

Sorry, but you shouldn't run unsupported FreeBSD versions because there is some niche issue with the TrueNAS API. Which is also not present in U2.
There is literally no reason at all this mistake/issue/feature should cause you to run 11.3.

Constantin · May 17, 2021

feanorian said:
Thanks again! I didn't look closely enough before though, and didn't realize these were stats for the L2ARC, which I don't have. Do you think maybe I should just count the files or something?

There are a couple of rules of thumb out there, and since the L2ARC is not necessary / required, experimentation will not harm your system. I started off with 1TB of metadata-only L2ARC for 40TB of capacity, but now that I have compressed most small files into Apple Disk Images, I likely could use a smaller one. Any SSD will do, so plug whatever you have lying around into the server and see what the stats start to spit out. But 2.5% of L2ARC vs. pool capacity is likely conservative unless you have a lot of small files.
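As a quick worked example of those rules of thumb, the sketch below applies the ratios mentioned in this thread (0.6% to 1.6% from earlier guidance, 2.5% as the conservative figure) to a 40TB pool. The function name is made up, and the ratios are only the ones quoted here, not official sizing guidance.

```python
def size_from_ratio(pool_capacity_tb, ratio):
    """Return the device size in TB implied by a capacity ratio."""
    return pool_capacity_tb * ratio

POOL_TB = 40  # capacity from the post above

for label, ratio in (("low", 0.006), ("high", 0.016), ("conservative", 0.025)):
    print(f"{label:>12}: {size_from_ratio(POOL_TB, ratio):.2f} TB")
```

For 40TB this gives roughly 0.24TB, 0.64TB, and 1TB respectively, which matches the 1TB device at the 2.5% end.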

ornias · May 17, 2021

Constantin said:
Now for a short moment on the soap box re: three future features that I hope @morganL and his team can consider:

I wish the UI could tell us how full the sVDEV is, just as it does for the general pool. Once the sVDEV fills up, additional metadata and small files go into the pool and performance potentially craters. There ought to be a dashboard widget that shows the fill of the sVDEV as a pie chart : metadata, small files, and free. Similarly, there ought to be an alert if the sVDEV exceeds 80% fill, just like the pool.

Since the sVDEV is shared by small files and metadata, remember to set your small files threshold sufficiently low to leave enough room for metadata. That's another pet peeve of mine with the current TrueNAS sVDEV implementation, the inability to set quotas for sVDEV contents such that metadata has a reserved minimum. We can set quotas for shares, users, etc. so why not something as important as the contents of the sVDEV also?

Setting small file size limits for the sVDEV is currently somewhat tricky. Ideally, the GUI would also help the admin choose minimum file size limits by giving the user an idea what the impact of setting a file size limit at 1, 2, 4, 8, 16, 32kb, etc. would have on sVDEV fill.

I think these features are an issue, because getting the total amount of used metadata and such on a ZFS pool (aka the data to fill the pie chart with) is not a light operation in current ZFS (correct me if I'm wrong, though).
So the reason that chart doesn't exist is mostly an upstream issue with OpenZFS and not TrueNAS ;-)

Constantin · May 17, 2021

ornias said:
Sorry, but you shouldn't run unsupported FreeBSD versions because there is some niche issue with the TrueNAS API. Which is also not present in U2. There is literally no reason at all this mistake/issue/feature should cause you to run 11.3.

11.3 fits the needs of a lot of folk. As Patrick and I pointed out, there were significant reasons for us to upgrade and ditto for others. I expect the use case will continue to matter and those who want to stick to 11.3 should feel free to do so. My issues with TrueNAS permissions were hunted down and resolved with the kind help of @anodos.

Constantin · May 17, 2021

ornias said:
I think these features are an issue, because getting the total amount of used metadata and such on a ZFS pool (aka the data to fill the pie chart with) is not a light operation in current ZFS (correct me if I'm wrong, though).

I'm obviously not a developer, but I'm surprised that the system doesn't keep track of the two disk usages (i.e. the metadata pool and the small files pool) in general. If it cannot be done by sub-type (small files vs. metadata), even an overall fill ratio would be super helpful.

The UI folk can always add a more intensive, but separate sVDEV sub menu to explore these issues in greater detail without clogging up the CPU for the general dashboard. This kind of analysis could be very helpful to size sVDEV pools by allowing admins to see the impact of setting small file thresholds at various file sizes (16k, 32k, etc.) from inside the GUI rather than dropping into the command line.

To me, the sVDEV is pretty revolutionary re: performance, so ensuring adequate free space by reporting use seems like a pretty straightforward UI requirement.

