Bluefin Upgrade Discussion

insan3

Dabbler
Joined
Apr 3, 2017
Messages
11
I have updated to Bluefin, but it was a slightly frightening experience. None of my apps would start. In kubectl I noticed the following:

Code:
root@freenas[~]# k3s kubectl get pods -A
NAMESPACE             NAME                                           READY   STATUS             RESTARTS       AGE
prometheus-operator   prometheus-operator-5c7445d877-rbmtg           0/1     Completed          0              7d21h
cnpg-system           cnpg-controller-manager-854876b995-kc6wz       0/1     Completed          0              7d21h
kube-system           openebs-zfs-controller-0                       0/5     Error              0              7d21h
metallb-system        controller-7597dd4f7b-58ggf                    0/1     Completed          0              7d21h
kube-system           coredns-d76bd69b-8t5fx                         0/1     Completed          0              7d21h
kube-system           nvidia-device-plugin-daemonset-xsnrr           0/1     Completed          0              7d21h
metallb-system        speaker-kb7qv                                  0/1     CrashLoopBackOff   16 (85s ago)   7d21h
kube-system           openebs-zfs-node-wxn5n                         1/2     CrashLoopBackOff   17 (18s ago)   7d21h


(I removed my own pods from this view for easier reading.)

The logs were also full of misery:

Code:
<snip>
ec 30 14:26:23 freenas k3s[14741]: E1230 14:26:23.515357   14741 kuberuntime_manager.go:954] "Failed to stop sandbox" podSandboxID={Type:docker ID:30a0d090744cd891b64b6f5bf5d0a2d44930db1eb5ad2b277fba77b47ebe3dd3}
Dec 30 14:26:23 freenas k3s[14741]: E1230 14:26:23.515419   14741 kubelet.go:1806] failed to "KillPodSandbox" for "85a93451-1ca3-4892-b7d0-d9d82f461ade" with KillPodSandboxError: "rpc error: code = Unknown desc = networkPlugin cni failed to teardown pod \"svclb-unifi-comm-db7b87da-h2494_kube-system\" network: cni c>
Dec 30 14:26:23 freenas k3s[14741]: E1230 14:26:23.515462   14741 pod_workers.go:965] "Error syncing pod, skipping" err="failed to \"KillPodSandbox\" for \"85a93451-1ca3-4892-b7d0-d9d82f461ade\" with KillPodSandboxError: \"rpc error: code = Unknown desc = networkPlugin cni failed to teardown pod \\\"svclb-unifi-com>
Dec 30 14:26:24 freenas k3s[14741]: {"level":"warn","ts":"2022-12-30T14:26:24.379+0100","logger":"etcd-client","caller":"v3@v3.5.3-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0019ec700/kine.sock","attempt":0,"error":"rpc error: code = Unknown desc = no such t>
Dec 30 14:26:24 freenas k3s[14741]: E1230 14:26:24.445465   14741 kubelet.go:2373] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized"
Dec 30 14:26:24 freenas k3s[14741]: E1230 14:26:24.515120   14741 pod_workers.go:965] "Error syncing pod, skipping" err="network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized" pod="kube-system/>
Dec 30 14:26:24 freenas k3s[14741]: E1230 14:26:24.518083   14741 remote_runtime.go:269] "StopPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = networkPlugin cni failed to teardown pod \"svclb-unifi-stun-e71d9e5a-n477w_kube-system\" network: cni config uninitialized" podSandboxID="aa2af2>
Dec 30 14:26:24 freenas k3s[14741]: E1230 14:26:24.518149   14741 kuberuntime_manager.go:954] "Failed to stop sandbox" podSandboxID={Type:docker ID:aa2af2633d453ab0aed5a74c60194cadd31cc1f4940c93f8b0b9cb58deb4076a}
Dec 30 14:26:24 freenas k3s[14741]: E1230 14:26:24.518227   14741 kubelet.go:1806] failed to "KillPodSandbox" for "c802fc6b-c324-4a3f-a0db-f44c59a16288" with KillPodSandboxError: "rpc error: code = Unknown desc = networkPlugin cni failed to teardown pod \"svclb-unifi-stun-e71d9e5a-n477w_kube-system\" network: cni c>
Dec 30 14:26:24 freenas k3s[14741]: E1230 14:26:24.518283   14741 pod_workers.go:965] "Error syncing pod, skipping" err="failed to \"KillPodSandbox\" for \"c802fc6b-c324-4a3f-a0db-f44c59a16288\" with KillPodSandboxError: \"rpc error: code = Unknown desc = networkPlugin cni failed to teardown pod \\\"svclb-unifi-stu>
</snip>


I managed to get it working by unsetting the apps pool, rebooting, and setting the pool again. After that, it worked fine.
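
For anyone who hits the same thing, this is roughly the sequence as shell commands. I did it through the web UI (Apps > Settings > Choose Pool), so treat the midclt calls as an assumption about the middleware API rather than something I ran verbatim:

Code:
# See which app pods are still unhealthy (anything not Running/Completed)
k3s kubectl get pods -A | grep -vE 'Running|Completed'

# Unset the apps pool, reboot, then choose it again
# (assumption: kubernetes.update accepts a "pool" key, same as the UI dialog;
#  replace "tank" with whatever pool your apps actually live on)
midclt call kubernetes.update '{"pool": null}'
reboot
# ...after the reboot:
midclt call kubernetes.update '{"pool": "tank"}'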

Another thing I've noticed and haven't figured out yet is why the ZFS pool upgrade notification keeps coming back. I did upgrade the pool under Storage, but it keeps notifying me:

New ZFS version or feature flags are available for pool 'tank'. Upgrading pools is a one-time process that can prevent rolling the system back to an earlier TrueNAS version. It is recommended to read the TrueNAS release notes and confirm you need the new ZFS feature flags before upgrading a pool.​
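
If anyone wants to double-check from the shell whether a pool has really been upgraded, plain OpenZFS commands are enough; zpool upgrade with no arguments lists pools that still have disabled feature flags:

Code:
# Pools that still have feature flags to enable; an upgraded pool should not be listed
zpool upgrade

# Show the feature flags on the pool itself
zpool get all tank | grep feature@

If 'tank' no longer shows up in the first command, the alert itself is probably just stale.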


Is anyone else having this?
 

ctag

Patron
Joined
Jun 16, 2017
Messages
225
After upgrading, some of my datasets will not unlock.

1673196350060.png
 

browntiger

Explorer
Joined
Oct 18, 2022
Messages
58
Odd that "vbasftp" does NOT say locked by ancestor. I would have unlocked "vbasftp" before locking the "backups".
Sounds like unlock of backups failed or you unchecked the unlock child datasets.
Create a ticket / copy all data out and redo them...
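
Before copying anything, it may also be worth confirming from the shell which dataset is the actual encryption root and whether its key is loaded. These are read-only OpenZFS property queries; the dataset names are taken from this thread and may need adjusting:

Code:
# Encryption root and key status for the whole branch
zfs get -r -t filesystem encryption,encryptionroot,keystatus storage/backups
# A child whose encryptionroot is itself (rather than the parent) has its own key
# and will not unlock when the parent is unlocked.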
 

ctag

Patron
Joined
Jun 16, 2017
Messages
225
Thanks browntiger.

When creating a new dataset named "vbas" with default settings (encryption inherited from the "backups" dataset) to move the files over to, the UI returns an error.

1673283338171.png


Error: concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/middlewared/plugins/zfs.py", line 1038, in mount
dataset.mount()
File "libzfs.pyx", line 465, in libzfs.ZFS.__exit__
File "/usr/lib/python3/dist-packages/middlewared/plugins/zfs.py", line 1038, in mount
dataset.mount()
File "libzfs.pyx", line 3969, in libzfs.ZFSDataset.mount
libzfs.ZFSException: cannot mount '/mnt/storage/backups/vbas': failed to create mountpoint: Operation not permitted

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/lib/python3.9/concurrent/futures/process.py", line 243, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 115, in main_worker
res = MIDDLEWARE._run(*call_args)
File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 46, in _run
return self._call(name, serviceobj, methodobj, args, job=job)
File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 40, in _call
return methodobj(*params)
File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 40, in _call
return methodobj(*params)
File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 1288, in nf
return func(*args, **kwargs)
File "/usr/lib/python3/dist-packages/middlewared/plugins/zfs.py", line 1041, in mount
raise CallError(f'Failed to mount dataset: {e}')
middlewared.service_exception.CallError: [EFAULT] Failed to mount dataset: cannot mount '/mnt/storage/backups/vbas': failed to create mountpoint: Operation not permitted
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/middlewared/main.py", line 181, in call_method
result = await self.middleware._call(message['method'], serviceobj, methodobj, params, app=self)
File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1255, in _call
return await methodobj(*prepared_call.args)
File "/usr/lib/python3/dist-packages/middlewared/service.py", line 922, in create
rv = await self.middleware._call(
File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1255, in _call
return await methodobj(*prepared_call.args)
File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 1152, in nf
res = await f(*args, **kwargs)
File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 1284, in nf
return await func(*args, **kwargs)
File "/usr/lib/python3/dist-packages/middlewared/plugins/pool.py", line 3361, in do_create
await self.middleware.call('zfs.dataset.mount', data['name'])
File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1306, in call
return await self._call(
File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1263, in _call
return await self._call_worker(name, *prepared_call.args)
File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1269, in _call_worker
return await self.run_in_proc(main_worker, name, args, job)
File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1184, in run_in_proc
return await self.run_in_executor(self.__procpool, method, *args, **kwargs)
File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1169, in run_in_executor
return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
middlewared.service_exception.CallError: [EFAULT] Failed to mount dataset: cannot mount '/mnt/storage/backups/vbas': failed to create mountpoint: Operation not permitted

Refreshing the page shows the new "vbas" dataset, but it is "locked by ancestor".
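
One thing that might be worth checking: as far as I know, SCALE marks the mountpoint directories of locked datasets as immutable, and "failed to create mountpoint: Operation not permitted" would be consistent with the parent directory still carrying that flag. This is only a hypothesis, but it is easy to inspect read-only:

Code:
# Check whether the parent mountpoint still has the immutable attribute
lsattr -d /mnt/storage/backups
# An 'i' in the attribute column means the directory is immutable, which would
# explain why a child mountpoint cannot be created under it.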


Edit: created a bug report: https://ixsystems.atlassian.net/browse/NAS-119836
 
Last edited:

Bann

Dabbler
Joined
Jan 14, 2023
Messages
12
After upgrading, the new Storage page shows this:
1673758269816.png

I tried Manage Devices:
1673758383147.png

It doesn't seem to show the data vdevs correctly. I tried to offline the device, and this came out:
1673758663495.png

Code:
Error: concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/plugins/zfs.py", line 263, in __zfs_vdev_operation
    op(target, *args)
  File "libzfs.pyx", line 465, in libzfs.ZFS.__exit__
  File "/usr/lib/python3/dist-packages/middlewared/plugins/zfs.py", line 263, in __zfs_vdev_operation
    op(target, *args)
  File "/usr/lib/python3/dist-packages/middlewared/plugins/zfs.py", line 318, in <lambda>
    self.__zfs_vdev_operation(name, label, lambda target: target.offline())
  File "libzfs.pyx", line 2291, in libzfs.ZFSVdev.offline
libzfs.ZFSException: cannot offline /dev/sdc1: no valid replicas

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.9/concurrent/futures/process.py", line 243, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 115, in main_worker
    res = MIDDLEWARE._run(*call_args)
  File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 46, in _run
    return self._call(name, serviceobj, methodobj, args, job=job)
  File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 40, in _call
    return methodobj(*params)
  File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 40, in _call
    return methodobj(*params)
  File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 1288, in nf
    return func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/middlewared/plugins/zfs.py", line 318, in offline
    self.__zfs_vdev_operation(name, label, lambda target: target.offline())
  File "/usr/lib/python3/dist-packages/middlewared/plugins/zfs.py", line 265, in __zfs_vdev_operation
    raise CallError(str(e), e.code)
middlewared.service_exception.CallError: [EZFS_NOREPLICAS] cannot offline /dev/sdc1: no valid replicas
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 181, in call_method
    result = await self.middleware._call(message['method'], serviceobj, methodobj, params, app=self)
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1255, in _call
    return await methodobj(*prepared_call.args)
  File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 1284, in nf
    return await func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 1152, in nf
    res = await f(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/middlewared/plugins/pool.py", line 1151, in offline
    await self.middleware.call('zfs.pool.offline', pool['name'], found[1]['guid'])
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1306, in call
    return await self._call(
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1263, in _call
    return await self._call_worker(name, *prepared_call.args)
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1269, in _call_worker
    return await self.run_in_proc(main_worker, name, args, job)
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1184, in run_in_proc
    return await self.run_in_executor(self.__procpool, method, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1169, in run_in_executor
    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
middlewared.service_exception.CallError: [EZFS_NOREPLICAS] cannot offline /dev/sdc1: no valid replicas


I tried the shell, and everything seems good there; other functions are good too.

Code:
root@truenas-scale[~]# zpool status -v NAS_1   
  pool: NAS_1
 state: ONLINE
  scan: scrub repaired 0B in 00:14:54 with 0 errors on Sun Jan 15 00:14:55 2023
config:


    NAME                                    STATE     READ WRITE CKSUM
    NAS_1                                   ONLINE       0     0     0
      sdc1                                  ONLINE       0     0     0
    cache
      847d7bff-d2b0-4950-ba6d-577579c74a6f  ONLINE       0     0     0




I reassigned the cache disk, and it shows as good now.
I'd like to know if there is a way to solve this, thanks!
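
For reference, the CLI equivalent of pulling and re-adding an L2ARC device is below; cache vdevs can be removed from a pool at any time, though doing it through the UI (Manage Devices) is normally preferable so the middleware stays in sync. The device path is an example and should be adjusted:

Code:
# Remove the cache device (name as shown in zpool status)
zpool remove NAS_1 847d7bff-d2b0-4950-ba6d-577579c74a6f

# Re-add it as a cache vdev (example path; adjust to the actual partition)
zpool add NAS_1 cache /dev/disk/by-partuuid/847d7bff-d2b0-4950-ba6d-577579c74a6f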
 

dAlexis

Dabbler
Joined
Aug 15, 2015
Messages
41
Fingers crossed, I've started upgrading from TrueNAS-SCALE-22.02.4.
I found this bug after looking into the constant disk activity in Angelfish.
However, I had installed vibration-dampening HDD screws before this :)
By the way, my advice to all: never mind whether future Angelfish updates fix the bug; if you don't want to see a graph like this, and you can back up your data just in case, migrate.
vivaldi_gvmXokGHTV.png

My SCALE machine is just for testing now, and I can afford to lose all the data on it; however, installing SSDs while this bug is present is, hmm, not too wise indeed.
 

R1CH

Cadet
Joined
Jun 4, 2021
Messages
2
I can't seem to use the VM display after upgrading; it just sits at "Loading" forever.

The browser console after logging in shows a JS error "[EFAULT] Upgrade can only be run from the Active Controller."

I only have a single node, so this is the active controller. It seems like something in the upgrade didn't finish properly? Is there a way to run this manually?

Code:
{filename: '/usr/lib/python3/dist-packages/middlewared/main.py', lineno: 223, method: 'call_method', line: '            self.send_error(message, e.errno, str(e), sys.exc_info(), extra=e.extra)\n', argspec: Array(4), …}
{filename: '/usr/lib/python3/dist-packages/middlewared/main.py', lineno: 1346, method: '_call', line: '        return await self.run_in_executor(prepared_call.executor, methodobj, *prepared_call.args)\n', argspec: Array(5), …}
{filename: '/usr/lib/python3/dist-packages/middlewared/main.py', lineno: 1249, method: 'run_in_executor', line: '        return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))\n', argspec: Array(3), …}
{filename: '/usr/lib/python3.9/concurrent/futures/thread.py', lineno: 58, method: 'run', line: '            self.future.set_result(result)\n', argspec: Array(1), …}
{filename: '/usr/lib/python3/dist-packages/middlewared/schema.py', lineno: 1322, method: 'nf', line: '                return func(*args, **kwargs)\n', varargspec: 'args', …}
{filename: '/usr/lib/python3/dist-packages/middlewared/schema.py', lineno: 1192, method: 'nf', line: '                res = f(*args, **kwargs)\n', varargspec: 'args', …}
{filename: '/usr/lib/python3/dist-packages/middlewared/plugins/failover.py', lineno: 877, method: 'upgrade_pending', line: "            raise CallError('Upgrade can only be run from the Active Controller.')\n", argspec: Array(1), …}


Inspecting the WebSocket messages, I see the display request was actually successful; going to the resulting URI manually shows a working noVNC instance.
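
For what it's worth, the method in that stack trace can be called directly from the shell to confirm it is the call the UI trips over; midclt ships with SCALE, and the traceback suggests the method takes no arguments:

Code:
# Reproduce the failing middleware call outside the browser
midclt call failover.upgrade_pending
# If this also raises "Upgrade can only be run from the Active Controller.",
# the JS error is coming from this check and not from the display request,
# which matches the noVNC URI still working when opened directly.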
 
Joined
Aug 17, 2022
Messages
1
I have a similar issue: after upgrading to TrueNAS-SCALE-22.12.2, one of my pools shows numerous disks as unavailable in the GUI, while zpool status clearly shows them online and in a healthy state.
1681397535617.png

I'm not quite sure I understand what the remediation is for this.
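
For completeness, these are the read-only commands I'm comparing against the GUI (the pool name is a placeholder):

Code:
# What ZFS reports for the pool
zpool status -v poolname

# What the kernel sees for the member disks; serial numbers help match them to the GUI
lsblk -o NAME,SIZE,SERIAL,MODEL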
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Could just be a display thing. You have a lot of disks on that system. I'd report it as a bug and guess that large numbers of disks might need some additional tweaking.
 

dwjackson

Cadet
Joined
Jul 9, 2023
Messages
1
I just updated to the latest TrueNAS SCALE update and I cannot log in to the web interface.
I have changed my password and it was accepted, but if I try to log in it denies access.
I wash, rinse, and repeat the process over and over; no matter what password I use, the result is the same.
I know this is usually as simple as pressing option 4 in the console menu and changing the password, but that will not work for me today.

Thanks for any help.

By the way, I can ping the address, I can access any file, and I can save to the NAS. I just can't log in to the web UI to check status.
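
For anyone else in the same spot, two read-only checks from the local console that may narrow it down; these are the standard locations on SCALE as far as I know:

Code:
# Is the middleware (which serves the web UI and handles logins) running?
systemctl status middlewared

# Recent middleware log entries often show why a login was rejected
tail -n 100 /var/log/middlewared.log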
 

Mark_the_Red

Dabbler
Joined
May 3, 2017
Messages
28
Dumb question here. I upgraded my server to Bluefin and everything is running smoothly and perfectly.

Can I destroy the old iocage dataset without harm? From my understanding of Bluefin, it is no longer needed and all my apps are in the new software->ix-applications dataset. I am never going back to Core, even if I could.

I am a little gun-shy about doing this, because I have caused a lot of problems in the past doing things like this....
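
If it helps with the gun-shyness: a cautious way to do it from the shell is to look at what is actually in the dataset first, then rename it out of the way (or snapshot it) before destroying anything. These are standard ZFS commands; "poolname" is a placeholder for your pool:

Code:
# See what lives under the old iocage dataset (read-only)
zfs list -r -o name,used,mountpoint poolname/iocage

# Option 1: rename it out of the way and keep it around for a while
zfs rename poolname/iocage poolname/iocage-old

# Option 2: once you're sure, destroy it recursively (irreversible)
zfs destroy -r poolname/iocage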
 

Attachments

  • Capture.PNG