mew
Cadet
Joined: Jun 20, 2022
Messages: 8
I am currently preparing to upgrade my TrueNAS SCALE server from a pool of two 14TB drives (mirrored) to a pool of ten 14TB drives. My friend loaned me two of his 14TB drives to use as a temporary buffer so I can swap out my drives, resilver onto his, and free mine up for the new pool. I swapped out one drive and started the resilvering process with no issues, thinking this would speed things up when I add the new pool using my own drives.
While that was running, I tried to start up my Plex server, only to find that it doesn't seem to be working at all. I'm using TrueCharts apps and asked over there whether this is a known issue while drives are resilvering. On further inspection it seems that the storage backend is (allegedly) not working at all. I restarted the system a little way into the resilvering process to see if that would fix the Kubernetes issue, but my issues still persist.
This is the output from kube-system:
Code:
root@server[~]# k3s kubectl describe pods -n kube-system
Name: openebs-zfs-node-g5mw6
Namespace: kube-system
Priority: 900001000
Priority Class Name: openebs-zfs-csi-node-critical
Node: ix-truenas/192.168.1.49
Start Time: Tue, 21 Jun 2022 17:51:09 -0700
Labels: app=openebs-zfs-node
controller-revision-hash=57f5455f6b
openebs.io/component-name=openebs-zfs-node
openebs.io/version=ci
pod-template-generation=1
role=openebs-zfs
Annotations: <none>
Status: Pending
IP: 192.168.1.49
IPs:
IP: 192.168.1.49
Controlled By: DaemonSet/openebs-zfs-node
Containers:
csi-node-driver-registrar:
Container ID:
Image: k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.3.0
Image ID:
Port: <none>
Host Port: <none>
Args:
--v=5
--csi-address=$(ADDRESS)
--kubelet-registration-path=$(DRIVER_REG_SOCK_PATH)
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment:
ADDRESS: /plugin/csi.sock
DRIVER_REG_SOCK_PATH: /var/lib/kubelet/plugins/zfs-localpv/csi.sock
KUBE_NODE_NAME: (v1:spec.nodeName)
NODE_DRIVER: openebs-zfs
Mounts:
/plugin from plugin-dir (rw)
/registration from registration-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-qvnfn (ro)
openebs-zfs-plugin:
Container ID:
Image: openebs/zfs-driver:2.0.0
Image ID:
Port: <none>
Host Port: <none>
Args:
--nodename=$(OPENEBS_NODE_NAME)
--endpoint=$(OPENEBS_CSI_ENDPOINT)
--plugin=$(OPENEBS_NODE_DRIVER)
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment:
OPENEBS_NODE_NAME: (v1:spec.nodeName)
OPENEBS_CSI_ENDPOINT: unix:///plugin/csi.sock
OPENEBS_NODE_DRIVER: agent
OPENEBS_NAMESPACE: openebs
ALLOWED_TOPOLOGIES: All
Mounts:
/dev from device-dir (rw)
/home/keys from encr-keys (rw)
/host from host-root (ro)
/plugin from plugin-dir (rw)
/sbin/zfs from chroot-zfs (rw,path="zfs")
/var/lib/kubelet/ from pods-mount-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-qvnfn (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
device-dir:
Type: HostPath (bare host directory volume)
Path: /dev
HostPathType: Directory
encr-keys:
Type: HostPath (bare host directory volume)
Path: /home/keys
HostPathType: DirectoryOrCreate
chroot-zfs:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: openebs-zfspv-bin
Optional: false
host-root:
Type: HostPath (bare host directory volume)
Path: /
HostPathType: Directory
registration-dir:
Type: HostPath (bare host directory volume)
Path: /var/lib/kubelet/plugins_registry/
HostPathType: DirectoryOrCreate
plugin-dir:
Type: HostPath (bare host directory volume)
Path: /var/lib/kubelet/plugins/zfs-localpv/
HostPathType: DirectoryOrCreate
pods-mount-dir:
Type: HostPath (bare host directory volume)
Path: /var/lib/kubelet/
HostPathType: Directory
kube-api-access-qvnfn:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/disk-pressure:NoSchedule op=Exists
node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/network-unavailable:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists
node.kubernetes.io/pid-pressure:NoSchedule op=Exists
node.kubernetes.io/unreachable:NoExecute op=Exists
node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 52m default-scheduler Successfully assigned kube-system/openebs-zfs-node-g5mw6 to ix-truenas
Normal SandboxChanged 10m (x12 over 32m) kubelet Pod sandbox changed, it will be killed and re-created.
Warning FailedCreatePodSandBox 1s (x25 over 50m) kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create a sandbox for pod "openebs-zfs-node-g5mw6": operation timeout: context deadline exceeded
Name: coredns-d76bd69b-6h7nj
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: ix-truenas/192.168.1.49
Start Time: Tue, 21 Jun 2022 17:51:09 -0700
Labels: k8s-app=kube-dns
pod-template-hash=d76bd69b
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/coredns-d76bd69b
Containers:
coredns:
Container ID:
Image: rancher/mirrored-coredns-coredns:1.9.1
Image ID:
Ports: 53/UDP, 53/TCP, 9153/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
Args:
-conf
/etc/coredns/Corefile
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:8080/health delay=60s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:8181/ready delay=0s timeout=1s period=2s #success=1 #failure=3
Environment: <none>
Mounts:
/etc/coredns from config-volume (ro)
/etc/coredns/custom from custom-config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-lxmn7 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: coredns
Optional: false
custom-config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: coredns-custom
Optional: true
kube-api-access-lxmn7:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: beta.kubernetes.io/os=linux
Tolerations: CriticalAddonsOnly op=Exists
node-role.kubernetes.io/control-plane:NoSchedule op=Exists
node-role.kubernetes.io/master:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 57m default-scheduler 0/1 nodes are available: 1 node(s) had taint {ix-svc-start: }, that the pod didn't tolerate.
Warning FailedScheduling 54m (x1 over 55m) default-scheduler 0/1 nodes are available: 1 node(s) had taint {ix-svc-start: }, that the pod didn't tolerate.
Warning FailedScheduling 53m default-scheduler 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/unreachable: }, that the pod didn't tolerate.
Normal Scheduled 52m default-scheduler Successfully assigned kube-system/coredns-d76bd69b-6h7nj to ix-truenas
Warning FailedSync 29m (x5 over 30m) kubelet error determining status: rpc error: code = Unknown desc = Error: No such container: 438d145717f95533dc20661f4ca3259e5af73e94521a4ec2a05fbd0ec0c7781a
Normal SandboxChanged 17m (x9 over 34m) kubelet Pod sandbox changed, it will be killed and re-created.
Warning FailedCreatePodSandBox 5m19s (x22 over 50m) kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create a sandbox for pod "coredns-d76bd69b-6h7nj": operation timeout: context deadline exceeded
Name: nvidia-device-plugin-daemonset-n7fwf
Namespace: kube-system
Priority: 2000001000
Priority Class Name: system-node-critical
Node: ix-truenas/192.168.1.49
Start Time: Tue, 21 Jun 2022 17:51:37 -0700
Labels: controller-revision-hash=77f95bfc79
name=nvidia-device-plugin-ds
pod-template-generation=1
Annotations: scheduler.alpha.kubernetes.io/critical-pod:
Status: Pending
IP:
IPs: <none>
Controlled By: DaemonSet/nvidia-device-plugin-daemonset
Containers:
nvidia-device-plugin-ctr:
Container ID:
Image: nvidia/k8s-device-plugin:v0.10.0
Image ID:
Port: <none>
Host Port: <none>
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/lib/kubelet/device-plugins from device-plugin (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-r2vhn (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
device-plugin:
Type: HostPath (bare host directory volume)
Path: /var/lib/kubelet/device-plugins
HostPathType:
kube-api-access-r2vhn:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: CriticalAddonsOnly op=Exists
node.kubernetes.io/disk-pressure:NoSchedule op=Exists
node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists
node.kubernetes.io/pid-pressure:NoSchedule op=Exists
node.kubernetes.io/unreachable:NoExecute op=Exists
node.kubernetes.io/unschedulable:NoSchedule op=Exists
nvidia.com/gpu:NoSchedule op=Exists
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 51m default-scheduler Successfully assigned kube-system/nvidia-device-plugin-daemonset-n7fwf to ix-truenas
Normal SandboxChanged 9m51s (x13 over 34m) kubelet Pod sandbox changed, it will be killed and re-created.
Warning FailedCreatePodSandBox 3m40s (x23 over 49m) kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create a sandbox for pod "nvidia-device-plugin-daemonset-n7fwf": operation timeout: context deadline exceeded
Name: openebs-zfs-controller-0
Namespace: kube-system
Priority: 900000000
Priority Class Name: openebs-zfs-csi-controller-critical
Node: ix-truenas/192.168.1.49
Start Time: Tue, 21 Jun 2022 17:51:36 -0700
Labels: app=openebs-zfs-controller
controller-revision-hash=openebs-zfs-controller-698698d48b
openebs.io/component-name=openebs-zfs-controller
openebs.io/version=ci
role=openebs-zfs
statefulset.kubernetes.io/pod-name=openebs-zfs-controller-0
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: StatefulSet/openebs-zfs-controller
Containers:
csi-resizer:
Container ID:
Image: k8s.gcr.io/sig-storage/csi-resizer:v1.2.0
Image ID:
Port: <none>
Host Port: <none>
Args:
--v=5
--csi-address=$(ADDRESS)
--leader-election
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment:
ADDRESS: /var/lib/csi/sockets/pluginproxy/csi.sock
Mounts:
/var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jwxj2 (ro)
csi-snapshotter:
Container ID:
Image: k8s.gcr.io/sig-storage/csi-snapshotter:v4.0.0
Image ID:
Port: <none>
Host Port: <none>
Args:
--csi-address=$(ADDRESS)
--leader-election
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment:
ADDRESS: /var/lib/csi/sockets/pluginproxy/csi.sock
Mounts:
/var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jwxj2 (ro)
snapshot-controller:
Container ID:
Image: k8s.gcr.io/sig-storage/snapshot-controller:v4.0.0
Image ID:
Port: <none>
Host Port: <none>
Args:
--v=5
--leader-election=true
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jwxj2 (ro)
csi-provisioner:
Container ID:
Image: k8s.gcr.io/sig-storage/csi-provisioner:v3.0.0
Image ID:
Port: <none>
Host Port: <none>
Args:
--csi-address=$(ADDRESS)
--v=5
--feature-gates=Topology=true
--strict-topology
--leader-election
--extra-create-metadata=true
--enable-capacity=true
--default-fstype=ext4
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment:
ADDRESS: /var/lib/csi/sockets/pluginproxy/csi.sock
NAMESPACE: kube-system (v1:metadata.namespace)
POD_NAME: openebs-zfs-controller-0 (v1:metadata.name)
Mounts:
/var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jwxj2 (ro)
openebs-zfs-plugin:
Container ID:
Image: openebs/zfs-driver:2.0.0
Image ID:
Port: <none>
Host Port: <none>
Args:
--endpoint=$(OPENEBS_CSI_ENDPOINT)
--plugin=$(OPENEBS_CONTROLLER_DRIVER)
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment:
OPENEBS_CONTROLLER_DRIVER: controller
OPENEBS_CSI_ENDPOINT: unix:///var/lib/csi/sockets/pluginproxy/csi.sock
OPENEBS_NAMESPACE: openebs
OPENEBS_IO_INSTALLER_TYPE: zfs-operator
OPENEBS_IO_ENABLE_ANALYTICS: true
Mounts:
/var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jwxj2 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
socket-dir:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kube-api-access-jwxj2:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 51m default-scheduler Successfully assigned kube-system/openebs-zfs-controller-0 to ix-truenas
Warning FailedSync 38m (x3 over 38m) kubelet error determining status: rpc error: code = Unknown desc = Error: No such container: 2c97475ed30a8d3dc3c987f44517dad4720751c3a6366dc3869cbc4216141ef5
Warning FailedSync 23m (x3 over 23m) kubelet error determining status: rpc error: code = Unknown desc = Error: No such container: a81e009b4f084f12806bec335c332c6c2b168fd461909906875183083448cb12
Normal SandboxChanged 19m (x10 over 37m) kubelet Pod sandbox changed, it will be killed and re-created.
Warning FailedCreatePodSandBox 4m47s (x22 over 49m) kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create a sandbox for pod "openebs-zfs-controller-0": operation timeout: context deadline exceeded
root@server[~]#
The resilvering process is also taking forever; I assume that's because I've restarted the machine several times, which has probably caused more problems than it solved.
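For what it's worth, this is roughly how I've been checking on the resilver from the shell (the pool name below is just a placeholder for mine):
Code:
# show resilver progress, scan rate, and estimated time remaining
zpool status -v tank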
I also seem to have accumulated over 9,000 snapshots in the past day and a half alone, apparently because Kubernetes keeps trying to create pods?
I've made a ticket on Jira but I'm still waiting for a response, and I'm really not sure what I should do. Should I leave my server running while it is spamming non-stop snapshots and bogging down the system? Is there any way to delete these en masse? I've put a rough sketch of what I was considering below.
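To show what I mean by "en masse", this is the kind of thing I was thinking of, but I haven't run the destroy step, and the dataset path below is only my guess at where the app snapshots live:
Code:
# count the snapshots under the applications dataset (pool/dataset path assumed)
zfs list -H -t snapshot -o name -r tank/ix-applications | wc -l

# review the full list before doing anything destructive
zfs list -H -t snapshot -o name -r tank/ix-applications | less

# if the list looks right, destroy the snapshots one at a time
zfs list -H -t snapshot -o name -r tank/ix-applications | xargs -n 1 zfs destroy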
My current specs are:
Code:
Intel i7 5820k
Gigabyte X99 UD4 (Revision 1)
102GB of DDR4 RAM
Corsair AX850 Gold
Nvidia EVGA 1070
One (formerly two) shucked WD Easystore (WD whites) - 14TB
One WD Purple 14TB drive
Sorry for posting this on the forum as well as on Jira; it's stressing me out, as I've not seen anyone else run into similar issues.
Please let me know if you have any questions about my issue!
Thanks for the help :)