SOLVED Core to Scale Upgrade: NFS & iSCSI Fail

NugentS · Jun 1, 2022

I like living dangerously.
All VM's moved to local storage on the hosts
Backups (both) completed sucessfully
So I pulled the trigger

The upgrade seemed to go well - so its time to test.

Pools & Datasets present - tick
SMB Shares present & correct - tick
Encrypted Dataset, unencrypted - tick
iSCSI Shares stopped
NFS Shares stopped
Scrub Tasks - Correct
Snapshot Tasks - Correct
Replication Tasks - Correct
Cloud Sync Tasks - Correct & Run
Rsync Tasks - there aren't any - Correct
S.M.A.R.T. Tests - Correct
Network - Both interfaces present & correct
Global Config - Correct
Static Routes - Correct
Active Directory - Healthy
Virtualisation - 2 test guests present but won't run due to incorrect NIC. Assigned a NIC and to my suprise they boot
Not seeing anything else wrong - Even UPS says its running - although I am expecting issues here - there are error messages in syslog
Reporting - CPU Temps look interesting and bear no resemblence to the dashboard widgit (which seems correct of the two as it matches what the Core system was saying.) Scale temp reporting is saying Min -20, Mean 9.46, Max 12.13 - which is rubbish. Not important

But NFS & iSCSI not starting
iSCSI - Nothing overtly wrong - just won't start. Fills the console with a load of what don't look like error messages - just fails to start after a few seconds
NFS - Fails to start immediately and again I don't see anything wrong.
I can rebuild the iSCSI setup from scratch (which might fix the issue) - as there is nothing in any of the zvols
But the NFS is a bit worrying as its kinda important - but again its just shares - so could be rebuilt. However I think its a bit more serious. I have a tset scale box and the only difference between the two on the NFS service config is that on the side graded NAS NFSv3 ownership model for NFSv4 is grayed out

Anyone know where I should look for clues?

NugentS · Jun 1, 2022

Flipping back to Core brings everything back again. Including NFS & iSCSI

NugentS · Jun 1, 2022

Update: Installing a shiny new version of Scale has exactly the same problem. Neither NFS or iSCSI will start. I have found cases where spaces in the name of some dataset(s) cause issues. However I don't believe (I am now restoring Core) that this is the case here - I will check as soon as Core reboots. [I don't like spaces in names generally]
Yup - No spaces in names

anodos · Jun 1, 2022

There are some release notes in SCALE regarding a few edge cases in migration from Core to SCALE. Specifically related to NFS. User's NFS configuration is validated more strictly in SCALE. If there are directives related to maproot / mapall users and groups, and if they do not exist on the server, NFS will refuse to start.

NugentS · Jun 1, 2022

Which is fair enough - but on a completely vanilla build? I formatted the boot disks and did not import a configuration although I did import the pools, but didn't do anything with them including setting up any shares

anodos · Jun 1, 2022

NugentS said:
Which is fair enough - but on a completely vanilla build? I formatted the boot disks and did not import a configuration!

Did you have any exports created? kNFSD doesn't start if exports file doesn't exist. Otherwise, if you encounter issues on vanilla install of a release TrueNAS version, please file a bug ticket in jira and attach a debug file.

NugentS · Jun 1, 2022

I'll have to build it again - I have currently returned to a working Core.
Might try again tomorrow.

What about the iSCSI - that didn't / wouldn't start either although I didn't try stripping everything out

anodos · Jun 1, 2022

NugentS said:
I'll have to build it again - I have currently returned to a working Core.
Might try again tomorrow.

What about the iSCSI - that didn't / wouldn't start either although I didn't try stripping everything out

Can't say without logs / debug. That's why I recommend filing a jira ticket. There have been many times particular users expose edge-cases we haven't seen. If we don't get a ticket / debug, we don't have an opportunity to fix it :)

NugentS · Jun 1, 2022

Given that both results were the same upgrade vs almost vanilla build I can go with another upgrade and just flip between them.

NugentS · Jun 1, 2022

I went ahead and side graded to scale again. Its easy enough to rebuild TrueNAS as long as the file data is available on the disks.
Same problem.

https://jira.ixsystems.com/browse/NAS-116518

Looks like NFS is failing because wheel doesn't exist in Scale - which seems logical (and possibly a big miss)

anodos · Jun 1, 2022

Looks like NFS is failing because wheel doesn't exist in Scale - which seems logical (and possibly a big miss)

BTW, that's in the release notes, which is why I asked whether you looked at them :)

/scale/scale22.12/

NugentS · Jun 1, 2022

Guilty as charged.
But to be fair thats an awful lot of release notes
To be honest I didn't really look at NFS - I can live without that for quite some time. iSCSI is rather more important (to me)

Would that still be applicable on a clean build and import pools?

morganL · Jun 1, 2022

NugentS said:
Guilty as charged.
But to be fair thats an awful lot of release notes
To be honest I didn't really look at NFS - I can live without that for quite some time. iSCSI is rather more important (to me)

Would that still be applicable on a clean build and import pools?

To be fair, changing the whole OS and both the NFS and iSCSI protocol stack without any impact is not trivial....

NugentS · Jun 2, 2022

NugentS · Jun 2, 2022

Does anyone have any suggestions as to how I might continue.

The only one I can think of (atm) is to wipe the machine, set up a vanilla Scale build, enable NFS & iSCSI and then copy the data back from backups which is a ballache, but doable.

anodos · Jun 2, 2022

NugentS said:
Does anyone have any suggestions as to how I might continue.

The only one I can think of (atm) is to wipe the machine, set up a vanilla Scale build, enable NFS & iSCSI and then copy the data back from backups which is a ballache, but doable.

NFS side just replace "wheel" with "root" in maproot / mapall in your exports config in our GUI. iSCSI side I didn't look at because I don't own the ticket and I'm more directly concerned with NFS :)

anodos · Jun 2, 2022

I opened a ticket to add "wheel" group to SCALE.

NugentS · Jun 2, 2022

@anodos can I have the ticket number please?

NugentS · Jun 2, 2022

Update:
The NFS issue is different from the iSCSI issue.
NFS would seem to be a simple "wheel" does not exist. This (sort of) holds true to "Its linux, its permissions, its (almost) always permissions". Adding root instead of wheel in the NFS Shares fixes the issue. Nice and simple

iSCSI would appear to be a simple issue with naming. I am using names like "iscsi.ssd.newnas". These names contain '.' which are not compatible with Scale (in some manner). Caleb St John has issued a pull (not that I really know what that means) with the important line. He has also sent me a fix file that now means my iSCSI service starts and my ESXi hosts see the datastores

"extent['name'] = extent['name'].replace('.', '_') # CORE ctl device names are incompatible with SCALE SCST"

There may be other stuff as well. I don't know enough to tell.

Thanks to @anodos and Caleb ST John

Stux · Mar 13, 2024

anodos said:
There are some release notes in SCALE regarding a few edge cases in migration from Core to SCALE. Specifically related to NFS. User's NFS configuration is validated more strictly in SCALE. If there are directives related to maproot / mapall users and groups, and if they do not exist on the server, NFS will refuse to start.

I just got bitten by this when I upgraded from TrueNAS CORE 13.0-U6.1 to SCALE 23.10.2.

...8 hours later...

Yes, I read as many release notes as I could and didn't see anything about this. It would be useful if a note about checking NFS Shares was added to the CORE to SCALE migration documentation.

Essentially, if a group or user doesn't exist, and is mentioned in the mapall attributes, then the nfsd will not start, and the exports won't get created, and `showmount -e ip` won't show any mounts.

When you disable a share, that is when TrueNAS Scale will tell you about the invalid parameters... but no error is emitted when starting/stopping the NFS service. Maybe it should be.

It took me a day of tracing and trying every permissions related setting, nfs protcol etc before actually tracking down the issue.

So, in the interests of helping others with their google-fu, the error seen from the Ubuntu VM when trying to mount the share was

mount.nfs: access denied by server while mounting 192.168.###.###:/mnt/tank/path/to/dataset

This error actually means the mount can't be found. And you could try mounting any random path and see the same error.

If you try showmount -e <truenas ip>, that will list the mounts exported from your truenas... and if you don't see any, even though it looks like you have a bunch of NFS shares setup, then you probably have this issue.

Disable all your mounts (by clicking the checkboxes), and fix any that the UI complains about... then turn them back on again.

Tada.

In my case, it wasn't the wheel group, instead it was the "www" user/group, that once upon a time was used by a jail, and then a vm appache instance via NFS. And was deleted as part of the SCALE migration. Thus leaving a dangling mapall, on an unrelated share.

Important Announcement for the TrueNAS Community.

SOLVED Core to Scale Upgrade: NFS & iSCSI Fail

MVP

MVP

MVP

Sambassador

MVP

Sambassador

MVP

Sambassador

MVP

MVP

Sambassador

MVP

Captain Morgan

MVP

MVP

Sambassador

Sambassador

MVP

MVP

MVP

Similar threads