Bluefin Upgrade Discussion

insan3 · Dec 30, 2022

I have updated to bluefin but i was a little frighting experience. All my apps wouldn't start. In kubectl i noticed the following:

Code:

root@freenas[~]# k3s kubectl get pods -A
NAMESPACE             NAME                                           READY   STATUS             RESTARTS       AGE
prometheus-operator   prometheus-operator-5c7445d877-rbmtg           0/1     Completed          0              7d21h
cnpg-system           cnpg-controller-manager-854876b995-kc6wz       0/1     Completed          0              7d21h
kube-system           openebs-zfs-controller-0                       0/5     Error              0              7d21h
metallb-system        controller-7597dd4f7b-58ggf                    0/1     Completed          0              7d21h
kube-system           coredns-d76bd69b-8t5fx                         0/1     Completed          0              7d21h
kube-system           nvidia-device-plugin-daemonset-xsnrr           0/1     Completed          0              7d21h
metallb-system        speaker-kb7qv                                  0/1     CrashLoopBackOff   16 (85s ago)   7d21h
kube-system           openebs-zfs-node-wxn5n                         1/2     CrashLoopBackOff   17 (18s ago)   7d21h

( i removed all my own pods in this view for easier reading.

Also in the logs it was misery all along

Code:

<snip>
ec 30 14:26:23 freenas k3s[14741]: E1230 14:26:23.515357   14741 kuberuntime_manager.go:954] "Failed to stop sandbox" podSandboxID={Type:docker ID:30a0d090744cd891b64b6f5bf5d0a2d44930db1eb5ad2b277fba77b47ebe3dd3}
Dec 30 14:26:23 freenas k3s[14741]: E1230 14:26:23.515419   14741 kubelet.go:1806] failed to "KillPodSandbox" for "85a93451-1ca3-4892-b7d0-d9d82f461ade" with KillPodSandboxError: "rpc error: code = Unknown desc = networkPlugin cni failed to teardown pod \"svclb-unifi-comm-db7b87da-h2494_kube-system\" network: cni c>
Dec 30 14:26:23 freenas k3s[14741]: E1230 14:26:23.515462   14741 pod_workers.go:965] "Error syncing pod, skipping" err="failed to \"KillPodSandbox\" for \"85a93451-1ca3-4892-b7d0-d9d82f461ade\" with KillPodSandboxError: \"rpc error: code = Unknown desc = networkPlugin cni failed to teardown pod \\\"svclb-unifi-com>
Dec 30 14:26:24 freenas k3s[14741]: {"level":"warn","ts":"2022-12-30T14:26:24.379+0100","logger":"etcd-client","caller":"v3@v3.5.3-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0019ec700/kine.sock","attempt":0,"error":"rpc error: code = Unknown desc = no such t>
Dec 30 14:26:24 freenas k3s[14741]: E1230 14:26:24.445465   14741 kubelet.go:2373] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized"
Dec 30 14:26:24 freenas k3s[14741]: E1230 14:26:24.515120   14741 pod_workers.go:965] "Error syncing pod, skipping" err="network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized" pod="kube-system/>
Dec 30 14:26:24 freenas k3s[14741]: E1230 14:26:24.518083   14741 remote_runtime.go:269] "StopPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = networkPlugin cni failed to teardown pod \"svclb-unifi-stun-e71d9e5a-n477w_kube-system\" network: cni config uninitialized" podSandboxID="aa2af2>
Dec 30 14:26:24 freenas k3s[14741]: E1230 14:26:24.518149   14741 kuberuntime_manager.go:954] "Failed to stop sandbox" podSandboxID={Type:docker ID:aa2af2633d453ab0aed5a74c60194cadd31cc1f4940c93f8b0b9cb58deb4076a}
Dec 30 14:26:24 freenas k3s[14741]: E1230 14:26:24.518227   14741 kubelet.go:1806] failed to "KillPodSandbox" for "c802fc6b-c324-4a3f-a0db-f44c59a16288" with KillPodSandboxError: "rpc error: code = Unknown desc = networkPlugin cni failed to teardown pod \"svclb-unifi-stun-e71d9e5a-n477w_kube-system\" network: cni c>
Dec 30 14:26:24 freenas k3s[14741]: E1230 14:26:24.518283   14741 pod_workers.go:965] "Error syncing pod, skipping" err="failed to \"KillPodSandbox\" for \"c802fc6b-c324-4a3f-a0db-f44c59a16288\" with KillPodSandboxError: \"rpc error: code = Unknown desc = networkPlugin cni failed to teardown pod \\\"svclb-unifi-stu>
</snip>

I managed to get it working by unsetting the app pool, reboot, and add the pool again. After that, it worked fine.

Another thing i notice and haven't figured out yet is why the zfs pool upgrade notification keeps on, i did upgrade the pool in storage but it keeps on notifying me.

New ZFS version or feature flags are available for pool 'tank'. Upgrading pools is a one-time process that can prevent rolling the system back to an earlier TrueNAS version. It is recommended to read the TrueNAS release notes and confirm you need the new ZFS feature flags before upgrading a pool.

Anyone else is having this?

ctag · Jan 8, 2023

After upgrading some of my datasets will not unlock.

browntiger · Jan 9, 2023

Odd that "vbasftp" does NOT say locked by ancestor. I would have unlocked "vbasftp" before locking the "backups".
Sounds like unlock of backups failed or you unchecked the unlock child datasets.
Create a ticket / copy all data out and redo them...

ctag · Jan 9, 2023

Thanks browntiger.

When creating a new dataset named "vbas" with default settings (inherited encryption from "backups" dataset) to move the files over to, the UI returns an error.

Error: concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/middlewared/plugins/zfs.py", line 1038, in mount
dataset.mount()
File "libzfs.pyx", line 465, in libzfs.ZFS.__exit__
File "/usr/lib/python3/dist-packages/middlewared/plugins/zfs.py", line 1038, in mount
dataset.mount()
File "libzfs.pyx", line 3969, in libzfs.ZFSDataset.mount
libzfs.ZFSException: cannot mount '/mnt/storage/backups/vbas': failed to create mountpoint: Operation not permitted

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/lib/python3.9/concurrent/futures/process.py", line 243, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 115, in main_worker
res = MIDDLEWARE._run(*call_args)
File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 46, in _run
return self._call(name, serviceobj, methodobj, args, job=job)
File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 40, in _call
return methodobj(*params)
File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 40, in _call
return methodobj(*params)
File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 1288, in nf
return func(*args, **kwargs)
File "/usr/lib/python3/dist-packages/middlewared/plugins/zfs.py", line 1041, in mount
raise CallError(f'Failed to mount dataset: {e}')
middlewared.service_exception.CallError: [EFAULT] Failed to mount dataset: cannot mount '/mnt/storage/backups/vbas': failed to create mountpoint: Operation not permitted
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/middlewared/main.py", line 181, in call_method
result = await self.middleware._call(message['method'], serviceobj, methodobj, params, app=self)
File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1255, in _call
return await methodobj(*prepared_call.args)
File "/usr/lib/python3/dist-packages/middlewared/service.py", line 922, in create
rv = await self.middleware._call(
File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1255, in _call
return await methodobj(*prepared_call.args)
File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 1152, in nf
res = await f(*args, **kwargs)
File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 1284, in nf
return await func(*args, **kwargs)
File "/usr/lib/python3/dist-packages/middlewared/plugins/pool.py", line 3361, in do_create
await self.middleware.call('zfs.dataset.mount', data['name'])
File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1306, in call
return await self._call(
File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1263, in _call
return await self._call_worker(name, *prepared_call.args)
File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1269, in _call_worker
return await self.run_in_proc(main_worker, name, args, job)
File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1184, in run_in_proc
return await self.run_in_executor(self.__procpool, method, *args, **kwargs)
File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1169, in run_in_executor
return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
middlewared.service_exception.CallError: [EFAULT] Failed to mount dataset: cannot mount '/mnt/storage/backups/vbas': failed to create mountpoint: Operation not permitted

Refreshing the page shows the new "vbas" dataset, but it is "locked by ancestor".

Edit: created a bug report: https://ixsystems.atlassian.net/browse/NAS-119836

Bann · Jan 14, 2023

After upgrading, the NEW Storage shows like this:

I tried manage device,

It seems didn't show right data vdevs. I tried to offline it, then this came out

Code:

Error: concurrent.futures.process._RemoteTraceback: """ Traceback (most recent call last):  File "/usr/lib/python3/dist-packages/middlewared/plugins/zfs.py", line 263, in __zfs_vdev_operation    op(target, *args)  File "libzfs.pyx", line 465, in libzfs.ZFS.__exit__  File "/usr/lib/python3/dist-packages/middlewared/plugins/zfs.py", line 263, in __zfs_vdev_operation    op(target, *args)  File "/usr/lib/python3/dist-packages/middlewared/plugins/zfs.py", line 318, in <lambda>    self.__zfs_vdev_operation(name, label, lambda target: target.offline())  File "libzfs.pyx", line 2291, in libzfs.ZFSVdev.offline libzfs.ZFSException: cannot offline /dev/sdc1: no valid replicas During handling of the above exception, another exception occurred: Traceback (most recent call last):  File "/usr/lib/python3.9/concurrent/futures/process.py", line 243, in _process_worker    r = call_item.fn(*call_item.args, **call_item.kwargs)  File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 115, in main_worker    res = MIDDLEWARE._run(*call_args)  File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 46, in _run    return self._call(name, serviceobj, methodobj, args, job=job)  File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 40, in _call    return methodobj(*params)  File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 40, in _call    return methodobj(*params)  File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 1288, in nf    return func(*args, **kwargs)  File "/usr/lib/python3/dist-packages/middlewared/plugins/zfs.py", line 318, in offline    self.__zfs_vdev_operation(name, label, lambda target: target.offline())  File "/usr/lib/python3/dist-packages/middlewared/plugins/zfs.py", line 265, in __zfs_vdev_operation    raise CallError(str(e), e.code) middlewared.service_exception.CallError: [EZFS_NOREPLICAS] cannot offline /dev/sdc1: no valid replicas """ The above exception was the direct cause of the following exception: Traceback (most recent call last):  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 181, in call_method    result = await self.middleware._call(message['method'], serviceobj, methodobj, params, app=self)  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1255, in _call    return await methodobj(*prepared_call.args)  File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 1284, in nf    return await func(*args, **kwargs)  File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 1152, in nf    res = await f(*args, **kwargs)  File "/usr/lib/python3/dist-packages/middlewared/plugins/pool.py", line 1151, in offline    await self.middleware.call('zfs.pool.offline', pool['name'], found[1]['guid'])  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1306, in call    return await self._call(  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1263, in _call    return await self._call_worker(name, *prepared_call.args)  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1269, in _call_worker    return await self.run_in_proc(main_worker, name, args, job)  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1184, in run_in_proc    return await self.run_in_executor(self.__procpool, method, *args, **kwargs)  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1169, in run_in_executor    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs)) middlewared.service_exception.CallError: [EZFS_NOREPLICAS] cannot offline /dev/sdc1: no valid replicas

I tried shell, seems good, and other functions are good too.

Code:

root@truenas-scale[~]# zpool status -v NAS_1   
  pool: NAS_1
 state: ONLINE
  scan: scrub repaired 0B in 00:14:54 with 0 errors on Sun Jan 15 00:14:55 2023
config:


    NAME                                    STATE     READ WRITE CKSUM
    NAS_1                                   ONLINE       0     0     0
      sdc1                                  ONLINE       0     0     0
    cache
      847d7bff-d2b0-4950-ba6d-577579c74a6f  ONLINE       0     0     0

I resignned the cache disk, it shows good now.
Want to know if there is way to solve this, thanks!

dAlexis · Feb 11, 2023

Fingers crossed, started upgrading from TrueNAS-SCALE-22.02.4
Found this bug after some research of regular disk activity in Angelfish

[NAS-119573] - iXsystems TrueNAS Jira

ixsystems.atlassian.net

However, installed vibration prevention HDD screws before this :)
BTW - advice 2 all, no mind about future Angelfish updates with the bug fixing, but if you don't want 2 see graph like this and may backup the data in case - migrate.

My SCALE beast is just 4 testing now, and I may easily crash all data on it, however, Imsallling ssd-s with such bug is hmm, not too wise indeed.

dAlexis · Feb 11, 2023

Hm. struck in "Checking HA status" during update... Will wait until evening, hope it'll be OK. Only Redis container was installed
This one https://ixsystems.atlassian.net/browse/NAS-114442 is not about update, as i see..

R1CH · Mar 7, 2023

I can't seem to use VM display after upgrading, it just sits at "Loading" forever.

The browser console after logging in shows a JS error "[EFAULT] Upgrade can only be run from the Active Controller."

I only have a single node so this is the active controller. It seems like something in the upgrade didn't finish properly? Is there a way to run this manually?

Code:

{filename: '/usr/lib/python3/dist-packages/middlewared/main.py', lineno: 223, method: 'call_method', line: '            self.send_error(message, e.errno, str(e), sys.exc_info(), extra=e.extra)\n', argspec: Array(4), …}
{filename: '/usr/lib/python3/dist-packages/middlewared/main.py', lineno: 1346, method: '_call', line: '        return await self.run_in_executor(prepared_call.executor, methodobj, *prepared_call.args)\n', argspec: Array(5), …}
{filename: '/usr/lib/python3/dist-packages/middlewared/main.py', lineno: 1249, method: 'run_in_executor', line: '        return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))\n', argspec: Array(3), …}
{filename: '/usr/lib/python3.9/concurrent/futures/thread.py', lineno: 58, method: 'run', line: '            self.future.set_result(result)\n', argspec: Array(1), …}
{filename: '/usr/lib/python3/dist-packages/middlewared/schema.py', lineno: 1322, method: 'nf', line: '                return func(*args, **kwargs)\n', varargspec: 'args', …}
{filename: '/usr/lib/python3/dist-packages/middlewared/schema.py', lineno: 1192, method: 'nf', line: '                res = f(*args, **kwargs)\n', varargspec: 'args', …}
{filename: '/usr/lib/python3/dist-packages/middlewared/plugins/failover.py', lineno: 877, method: 'upgrade_pending', line: "            raise CallError('Upgrade can only be run from the Active Controller.')\n", argspec: Array(1), …}

Inspecting the websocket messages I see the display request was actually successful - going to the resulting URI manually shows a working NoVNC instance.

Dana44 · Apr 2, 2023

NickF said:
Only issue I noticed so far was the one called out with hostpath mapping, which was deliberate. But this is probably going to be a common complaint seen:
https://rudergeraete-testsieger.de/bluefin-rudergeraet-test/

And VNC Not working, which is supposed to be merge into release:

[NAS-118691] - iXsystems TrueNAS Jira

ixsystems.atlassian.net

THX verry interesting to read

OldTechGuySteve · Apr 13, 2023

Bann said:

After upgrading, the NEW Storage shows like this:
View attachment 62467
I tried manage device,
View attachment 62468
It seems didn't show right data vdevs. I tried to offline it, then this came out
View attachment 62469
Code:

Code:

Error: concurrent.futures.process._RemoteTraceback: """ Traceback (most recent call last):  File "/usr/lib/python3/dist-packages/middlewared/plugins/zfs.py", line 263, in __zfs_vdev_operation    op(target, *args)  File "libzfs.pyx", line 465, in libzfs.ZFS.__exit__  File "/usr/lib/python3/dist-packages/middlewared/plugins/zfs.py", line 263, in __zfs_vdev_operation    op(target, *args)  File "/usr/lib/python3/dist-packages/middlewared/plugins/zfs.py", line 318, in <lambda>    self.__zfs_vdev_operation(name, label, lambda target: target.offline())  File "libzfs.pyx", line 2291, in libzfs.ZFSVdev.offline libzfs.ZFSException: cannot offline /dev/sdc1: no valid replicas During handling of the above exception, another exception occurred: Traceback (most recent call last):  File "/usr/lib/python3.9/concurrent/futures/process.py", line 243, in _process_worker    r = call_item.fn(*call_item.args, **call_item.kwargs)  File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 115, in main_worker    res = MIDDLEWARE._run(*call_args)  File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 46, in _run    return self._call(name, serviceobj, methodobj, args, job=job)  File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 40, in _call    return methodobj(*params)  File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 40, in _call    return methodobj(*params)  File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 1288, in nf    return func(*args, **kwargs)  File "/usr/lib/python3/dist-packages/middlewared/plugins/zfs.py", line 318, in offline    self.__zfs_vdev_operation(name, label, lambda target: target.offline())  File "/usr/lib/python3/dist-packages/middlewared/plugins/zfs.py", line 265, in __zfs_vdev_operation    raise CallError(str(e), e.code) middlewared.service_exception.CallError: [EZFS_NOREPLICAS] cannot offline /dev/sdc1: no valid replicas """ The above exception was the direct cause of the following exception: Traceback (most recent call last):  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 181, in call_method    result = await self.middleware._call(message['method'], serviceobj, methodobj, params, app=self)  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1255, in _call    return await methodobj(*prepared_call.args)  File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 1284, in nf    return await func(*args, **kwargs)  File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 1152, in nf    res = await f(*args, **kwargs)  File "/usr/lib/python3/dist-packages/middlewared/plugins/pool.py", line 1151, in offline    await self.middleware.call('zfs.pool.offline', pool['name'], found[1]['guid'])  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1306, in call    return await self._call(  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1263, in _call    return await self._call_worker(name, *prepared_call.args)  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1269, in _call_worker    return await self.run_in_proc(main_worker, name, args, job)  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1184, in run_in_proc    return await self.run_in_executor(self.__procpool, method, *args, **kwargs)  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1169, in run_in_executor    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs)) middlewared.service_exception.CallError: [EZFS_NOREPLICAS] cannot offline /dev/sdc1: no valid replicas

I tried shell, seems good, and other functions are good too.

Code:

root@truenas-scale[~]# zpool status -v NAS_1  
  pool: NAS_1
 state: ONLINE
  scan: scrub repaired 0B in 00:14:54 with 0 errors on Sun Jan 15 00:14:55 2023
config:


    NAME                                    STATE     READ WRITE CKSUM
    NAS_1                                   ONLINE       0     0     0
      sdc1                                  ONLINE       0     0     0
    cache
      847d7bff-d2b0-4950-ba6d-577579c74a6f  ONLINE       0     0     0

I resignned the cache disk, it shows good now.
Want to know if there is way to solve this, thanks!

I have a similar issue, in that after upgrading to TrueNAS-SCALE-22.12.2, one of my Pools is showing numerous disks unavailable in the gui, but clearly showing online and in a healthy state through zpool status.

I'm not quite sure I understand what the remediation is for this.

jgreco · Apr 13, 2023

Could just be a display thing. You have a lot of disks on that system. I'd report it as a bug and guess that large numbers of disks might need some additional tweaking.

dwjackson · Jul 9, 2023

just updated to TrueNAS Scale latest update
I cannot log in to web interface
I have changed by password and it was accepted
If I try to login it denies access
I wash, rinse and repeat process over and over
no matter what password I use the results are the same
I know this is usually simple by pressing #4 and changing password
but will not work for me today

thanks for any help

OBTW I can ping address
I can access any file
I can save to NAS
I just can't login into web for status

Mark_the_Red · Aug 4, 2023

Dumb question here. I upgraded to Bluefin on my server and everything is running smooth and perfect.

Can I destroy the old iocage dataset without harm? My from my understanding of Bluefin, this is no longer needed and all my apps are in the new software->ix-applications dataset. I am never going back to Core even if I could.

I am a little gunshy on doing this, because I have caused a lot of problems in past doing things like this....

Important Announcement for the TrueNAS Community.

Bluefin Upgrade Discussion

insan3

Dabbler

New ZFS version or feature flags are available for pool 'tank'. Upgrading pools is a one-time process that can prevent rolling the system back to an earlier TrueNAS version. It is recommended to read the TrueNAS release notes and confirm you need the new ZFS feature flags before upgrading a pool.

ctag

Patron

browntiger

Explorer

ctag

Patron

Bann

Dabbler

dAlexis

Dabbler

[NAS-119573] - iXsystems TrueNAS Jira

dAlexis

Dabbler

R1CH

Cadet

Dana44

Cadet

[NAS-118691] - iXsystems TrueNAS Jira

OldTechGuySteve

Cadet

jgreco

Resident Grinch

dwjackson

Cadet

Mark_the_Red

Dabbler

Attachments

Similar threads

Important Announcement for the TrueNAS Community.

Bluefin Upgrade Discussion

Dabbler

Patron

Explorer

Patron

Dabbler

Dabbler

Dabbler

Cadet

Cadet

Cadet

Resident Grinch

Cadet

Dabbler

Attachments

Important Announcement for the TrueNAS Community.

Related topics on forums.truenas.com for thread: "Bluefin Upgrade Discussion"

Similar threads