SOLVED Attempt to move System Dataset - Failed Successfully?

Demonitron
Cadet · Joined Jul 23, 2022 · Messages: 6
Good evening all, sorry that my first ever post here is going to be in the nature of "Halp, I think I broke something!", but...yeah, I think I did. Kind of.

I was reading around trying to stop my T110 PowerEdge server from accessing the SAS Pool every 5 seconds even when it was idle as far as network activity went, and came across a suggestion of moving the System Dataset from the storage pool to the boot pool.

Found the button for doing exactly that, pressed it & got two errors saying that the move had failed. Afraid I forgot to note down exactly what they were, but I do remember something along the lines of it telling me it was 'busy'.

With that in mind I thought perhaps a reboot would help un-busy whatever it was.

On reboot, the GUI shows that the System Dataset is indeed now on the boot pool, but the drives continue to be accessed every 5 seconds (+/- a few milliseconds), which I now understand to be 'ZFS flushing'? Bear with me, this is my first ZFS experience so...I'm learning as I go.
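(As an aside for anyone else puzzled by the exact 5-second rhythm: from what I've read it matches OpenZFS's default transaction-group flush interval, which on a Linux/SCALE box can be read, purely out of curiosity, with something like:)

Code:
# Read-only check: the transaction-group (txg) flush interval in seconds (5 by default).
cat /sys/module/zfs/parameters/zfs_txg_timeout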

Anyway, the same thread that pointed me towards moving the System Dataset as a possible solution to my problem also gave a command for checking where it was located. With the 5-second access still happening, I decided to give it a try & found that now I have my .system folder on BOTH the storage AND the boot pools.
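I don't remember the exact command the thread gave, but it was something along these lines (pool and dataset names will differ on other systems):

Code:
# List every .system dataset and where it's mounted - mine showed up under both pools.
zfs list -o name,mountpoint | grep -i '\.system'
# What the middleware thinks the active System Dataset pool is.
midclt call systemdataset.config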

What happened and...how do I fix it?

Other than this...hiccup, I am really, really loving TrueNAS SCALE. I'm coming from an OMV 4.x system. I had intended to upgrade to OMV 6.x, but it seems...clunky in comparison with its older cousins. TrueNAS and its originator FreeNAS have always terrified me since I know absolutely nothing about UNIX. SCALE being Linux-based has made trying it out a little less daunting (but ZFS is still scary!).
 

Attachments

  • Clipboard01.jpg
Joined Oct 22, 2019 · Messages: 3,641
I decided to give it a try & found that now I have my .system folder on BOTH the storage AND the boot pools.
I'm glad I'm still on Core if this is indeed a bug on SCALE. You did everything correctly and properly, using only the GUI without any fancy command-line tricks.

If possible, stop all services and apps. Then try to move the System Dataset back to the storage pool. Afterwards, try to move it back to the boot-pool.

My hunch is that the operation copied over everything from the .system dataset on the storage pool to your boot-pool, but did not inform TrueNAS of the change. So essentially, it only copied the data without changing the configuration, even though it shows "boot-pool" as the active System Dataset pool.
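For what it's worth, the GUI button should be doing roughly the equivalent of this middleware call under the hood (just a sketch; 'tank' stands in for whatever your storage pool is actually called, and I'd still drive it from the GUI where possible):

Code:
# Move the System Dataset back to the storage pool as a middleware job...
midclt call -job systemdataset.update '{"pool": "tank"}'
# ...then back to the boot pool once that completes.
midclt call -job systemdataset.update '{"pool": "boot-pool"}'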
 

Demonitron
Cadet · Joined Jul 23, 2022 · Messages: 6
I'm glad I'm still on Core if this is indeed a bug on SCALE. You did everything correctly and properly, using only the GUI without any fancy command-line tricks.

If possible, stop all services and apps. Then try to move the System Dataset back to the storage pool. Afterwards, try to move it back to the boot-pool.

My hunch is that the operation copied over everything from the .system dataset on the storage pool to your boot-pool, but did not inform TrueNAS of the change. So essentially, it only copied the data without changing the configuration, even though it shows "boot-pool" as the active System Dataset pool.
Many thanks for your reply! :)

I tried as you suggested: stopped all services (I'm not running any apps yet), attempted to move the System Dataset back to the storage pool, and received this error (guessing it's the same one I got in the first place):

Code:
Error: Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/job.py", line 411, in run
    await self.future
  File "/usr/lib/python3/dist-packages/middlewared/job.py", line 446, in __run_body
    rv = await self.method(*([self] + args))
  File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 1133, in nf
    res = await f(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 1265, in nf
    return await func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/middlewared/plugins/sysdataset.py", line 221, in do_update
    await self.migrate(config['pool'], new['pool'])
  File "/usr/lib/python3/dist-packages/middlewared/plugins/sysdataset.py", line 501, in migrate
    await self.__umount(_from, config['uuid'])
  File "/usr/lib/python3/dist-packages/middlewared/plugins/sysdataset.py", line 465, in __umount
    raise CallError(error) from None
middlewared.service_exception.CallError: [EFAULT] Unable to umount boot-pool/.system/syslog-6185a5ad8d7a46da9a9af93ab68469e9: umount: /var/db/system/syslog-6185a5ad8d7a46da9a9af93ab68469e9: target is busy.

The following processes are using '/var/db/system/syslog-6185a5ad8d7a46da9a9af93ab68469e9': [
  {
    "pid": "6591",
    "name": "nmbd",
    "cmdline": "/usr/sbin/nmbd --foreground --no-process-group",
    "paths": [
      "/var/db/system/syslog-6185a5ad8d7a46da9a9af93ab68469e9/log/samba4/auth_audit.log",
      "/var/db/system/syslog-6185a5ad8d7a46da9a9af93ab68469e9/log/samba4/log.nmbd"
    ]
  }
]
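In case it helps anyone searching for the same 'target is busy' error, the processes holding the mount can also be listed by hand with something like this (path taken from the error above; yours will have a different UUID):

Code:
# Show which processes still have files open on the syslog dataset that refused to unmount.
fuser -vm /var/db/system/syslog-6185a5ad8d7a46da9a9af93ab68469e9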
 
Joined Oct 22, 2019 · Messages: 3,641
Not familiar with SCALE's GUI layout, but can you look under either Network or SMB to disable any use of NetBIOS and WS Discovery?

This is what it looks like for me on Core:
disable-netbios.jpg
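Once they're disabled, it's worth confirming from a shell that nothing NetBIOS/WS-Discovery related is still running before retrying the move (a rough check; process names assumed for SCALE):

Code:
# Should print nothing once NetBIOS (nmbd) and WS-Discovery (wsdd) are disabled.
pgrep -a 'nmbd|wsdd'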
 

Demonitron
Cadet · Joined Jul 23, 2022 · Messages: 6
Not familiar with SCALE's GUI layout, but can you look under either Network or SMB to disable any use of NetBIOS and WS Discovery?

That did the trick! Turning off NetBIOS & WS-Discovery, then following your original suggestion of moving the Dataset back to the storage pool and then to the boot pool, fixed the problem. The Dataset is now on the boot pool, and ONLY on the boot pool.

Thank you!

Hasn't fixed the HDD access annoyance mind you...but I guess that's enough 'almost bricking your server full of stuff' for one day. :D
 
Joined Oct 22, 2019 · Messages: 3,641
One issue down, one remains! :cool:

You're still seeing that the storage pool is being accessed every 5 seconds, even with no activity? Is this based on LED indicators, or from the Reporting page of TrueNAS?

If you check the Reporting page, does it reveal whether the constant I/O is writes, reads, or both?
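If the Reporting graphs are too coarse to tell, watching the pool from a shell should make it obvious (replace 'tank' with your storage pool's name):

Code:
# Print per-vdev read/write activity every 5 seconds; watch which columns spike.
zpool iostat -v tank 5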
 

Demonitron
Cadet · Joined Jul 23, 2022 · Messages: 6
You're still seeing that the storage pool is being accessed every 5 seconds, even with no activity? Is this based on LED indicators, or from the Reporting page of TrueNAS?

If you check the Reporting page, does it reveal whether the constant I/O is writes, reads, or both?
It's from the audible grunt the HDDs (storage pool) give every 5 seconds. I don't see much in the way of I/O except on the boot SSD, which I'm guessing is log writes. The grunt was also present during the server's testing stage, with different drives and no personal setup of the SCALE install beyond telling apps to use the storage pool and setting up TrueCharts; no apps were or are installed.

It was suggested that it might have something to do with K3s writing its logs (I also read quite a few complaints about how much CPU K3s uses while doing a lot of nothing), and that killing the service would reduce idle CPU usage to near zero and stop the log writes.

Alas, that didn't seem to help in my case, at least for the grunting. CPU usage did indeed drop from an idle 4% to 0-1%.
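For anyone else chasing the same thing, something like this should show which processes are actually issuing the writes (iotop may not be installed by default on SCALE):

Code:
# Only show processes currently doing I/O, accumulate totals, timestamp each 5-second sample.
iotop -obPat -d 5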

As a side note, I have the machine plugged into a power monitor at the moment, just for testing purposes, and when one of those every-5-second grunts happens, the power usage spikes by 10-14 W.

I've had to turn the machine off for now; it lives in the shed and it's pushing 35°C in there this afternoon. I've already had to cable-tie an additional fan onto the HDD cage; Dell thinking that a single rear fan was enough to cool the CPU and all four HDDs is a bemusing decision.
 

Demonitron
Cadet · Joined Jul 23, 2022 · Messages: 6
Just a quick update.

While killing the K3s service didn't solve the 5-second I/O spikes, unsetting the app storage location and rebooting did. So even though I wasn't running any applications, something in the apps subsystem was evidently causing it.

For now, until I get an SSD pool set up, I've deleted the ix-applications dataset.
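For anyone wanting to do the same from a shell instead of the GUI, plain zfs commands should work (a sketch; 'tank' stands in for the storage pool name, and the destroy is permanent, taking all app data with it):

Code:
# Double-check what's actually there first...
zfs list -r tank/ix-applications
# ...then remove the dataset and everything beneath it. This permanently deletes all app data.
zfs destroy -r tank/ix-applications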

I now have a perfectly serene server. ^_^
 