mew
Cadet · Joined: Jun 20, 2022 · Messages: 8
I am currently preparing to upgrade my TrueNAS SCALE server from a pool of two 14 TB drives (mirrored) to a pool of ten 14 TB drives. My friend loaned me two of his 14 TB drives to use as a temporary buffer so I can swap out my own drives, resilver, and then build the new pool from them. I swapped out one drive and started the resilvering process with no issues, figuring this would speed up getting the new pool built from my own drives. While that was running I tried to start up my Plex server, only to find that it doesn't seem to be working at all. I'm using TrueCharts apps and asked over there whether this is a known issue while drives are resilvering. On further inspection, it seems the Kubernetes storage backend is (allegedly) not working at all. I restarted the system a little way into the resilvering process to see if that would fix the Kubernetes issue, but the problem persists.
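For what it's worth, this is the kind of thing I've been running from a shell to check on the cluster (no app pods ever come up):
Code:
# quick overview of pod status across all namespaces
k3s kubectl get pods -A -o wide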
This is the output from the kube-system namespace:
Code:
root@server[~]# k3s kubectl describe pods -n kube-system
Name:                 openebs-zfs-node-g5mw6
Namespace:            kube-system
Priority:             900001000
Priority Class Name:  openebs-zfs-csi-node-critical
Node:                 ix-truenas/192.168.1.49
Start Time:           Tue, 21 Jun 2022 17:51:09 -0700
Labels:               app=openebs-zfs-node
                      controller-revision-hash=57f5455f6b
                      openebs.io/component-name=openebs-zfs-node
                      openebs.io/version=ci
                      pod-template-generation=1
                      role=openebs-zfs
Annotations:          <none>
Status:               Pending
IP:                   192.168.1.49
IPs:
  IP:           192.168.1.49
Controlled By:  DaemonSet/openebs-zfs-node
Containers:
  csi-node-driver-registrar:
    Container ID:
    Image:         k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.3.0
    Image ID:
    Port:          <none>
    Host Port:     <none>
    Args:
      --v=5
      --csi-address=$(ADDRESS)
      --kubelet-registration-path=$(DRIVER_REG_SOCK_PATH)
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:
      ADDRESS:               /plugin/csi.sock
      DRIVER_REG_SOCK_PATH:  /var/lib/kubelet/plugins/zfs-localpv/csi.sock
      KUBE_NODE_NAME:         (v1:spec.nodeName)
      NODE_DRIVER:           openebs-zfs
    Mounts:
      /plugin from plugin-dir (rw)
      /registration from registration-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-qvnfn (ro)
  openebs-zfs-plugin:
    Container ID:
    Image:         openebs/zfs-driver:2.0.0
    Image ID:
    Port:          <none>
    Host Port:     <none>
    Args:
      --nodename=$(OPENEBS_NODE_NAME)
      --endpoint=$(OPENEBS_CSI_ENDPOINT)
      --plugin=$(OPENEBS_NODE_DRIVER)
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:
      OPENEBS_NODE_NAME:      (v1:spec.nodeName)
      OPENEBS_CSI_ENDPOINT:  unix:///plugin/csi.sock
      OPENEBS_NODE_DRIVER:   agent
      OPENEBS_NAMESPACE:     openebs
      ALLOWED_TOPOLOGIES:    All
    Mounts:
      /dev from device-dir (rw)
      /home/keys from encr-keys (rw)
      /host from host-root (ro)
      /plugin from plugin-dir (rw)
      /sbin/zfs from chroot-zfs (rw,path="zfs")
      /var/lib/kubelet/ from pods-mount-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-qvnfn (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  device-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /dev
    HostPathType:  Directory
  encr-keys:
    Type:          HostPath (bare host directory volume)
    Path:          /home/keys
    HostPathType:  DirectoryOrCreate
  chroot-zfs:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      openebs-zfspv-bin
    Optional:  false
  host-root:
    Type:          HostPath (bare host directory volume)
    Path:          /
    HostPathType:  Directory
  registration-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/plugins_registry/
    HostPathType:  DirectoryOrCreate
  plugin-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/plugins/zfs-localpv/
    HostPathType:  DirectoryOrCreate
  pods-mount-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/
    HostPathType:  Directory
  kube-api-access-qvnfn:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/network-unavailable:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason                  Age                 From               Message
  ----     ------                  ----                ----               -------
  Normal   Scheduled               52m                 default-scheduler  Successfully assigned kube-system/openebs-zfs-node-g5mw6 to ix-truenas
  Normal   SandboxChanged          10m (x12 over 32m)  kubelet            Pod sandbox changed, it will be killed and re-created.
  Warning  FailedCreatePodSandBox  1s (x25 over 50m)   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create a sandbox for pod "openebs-zfs-node-g5mw6": operation timeout: context deadline exceeded
Name:                 coredns-d76bd69b-6h7nj
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 ix-truenas/192.168.1.49
Start Time:           Tue, 21 Jun 2022 17:51:09 -0700
Labels:               k8s-app=kube-dns
                      pod-template-hash=d76bd69b
Annotations:          <none>
Status:               Pending
IP:
IPs:                  <none>
Controlled By:        ReplicaSet/coredns-d76bd69b
Containers:
  coredns:
    Container ID:
    Image:         rancher/mirrored-coredns-coredns:1.9.1
    Image ID:
    Ports:         53/UDP, 53/TCP, 9153/TCP
    Host Ports:    0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=1s period=10s #success=1 #failure=3
    Readiness:    http-get http://:8181/ready delay=0s timeout=1s period=2s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
      /etc/coredns/custom from custom-config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-lxmn7 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  custom-config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns-custom
    Optional:  true
  kube-api-access-lxmn7:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              beta.kubernetes.io/os=linux
Tolerations:                 CriticalAddonsOnly op=Exists
                             node-role.kubernetes.io/control-plane:NoSchedule op=Exists
                             node-role.kubernetes.io/master:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                   From               Message
  ----     ------                  ----                  ----               -------
  Warning  FailedScheduling        57m                   default-scheduler  0/1 nodes are available: 1 node(s) had taint {ix-svc-start: }, that the pod didn't tolerate.
  Warning  FailedScheduling        54m (x1 over 55m)     default-scheduler  0/1 nodes are available: 1 node(s) had taint {ix-svc-start: }, that the pod didn't tolerate.
  Warning  FailedScheduling        53m                   default-scheduler  0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/unreachable: }, that the pod didn't tolerate.
  Normal   Scheduled               52m                   default-scheduler  Successfully assigned kube-system/coredns-d76bd69b-6h7nj to ix-truenas
  Warning  FailedSync              29m (x5 over 30m)     kubelet            error determining status: rpc error: code = Unknown desc = Error: No such container: 438d145717f95533dc20661f4ca3259e5af73e94521a4ec2a05fbd0ec0c7781a
  Normal   SandboxChanged          17m (x9 over 34m)     kubelet            Pod sandbox changed, it will be killed and re-created.
  Warning  FailedCreatePodSandBox  5m19s (x22 over 50m)  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create a sandbox for pod "coredns-d76bd69b-6h7nj": operation timeout: context deadline exceeded
Name:                 nvidia-device-plugin-daemonset-n7fwf
Namespace:            kube-system
Priority:             2000001000
Priority Class Name:  system-node-critical
Node:                 ix-truenas/192.168.1.49
Start Time:           Tue, 21 Jun 2022 17:51:37 -0700
Labels:               controller-revision-hash=77f95bfc79
                      name=nvidia-device-plugin-ds
                      pod-template-generation=1
Annotations:          scheduler.alpha.kubernetes.io/critical-pod:
Status:               Pending
IP:
IPs:                  <none>
Controlled By:        DaemonSet/nvidia-device-plugin-daemonset
Containers:
  nvidia-device-plugin-ctr:
    Container ID:
    Image:          nvidia/k8s-device-plugin:v0.10.0
    Image ID:
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/lib/kubelet/device-plugins from device-plugin (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-r2vhn (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  device-plugin:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/device-plugins
    HostPathType:
  kube-api-access-r2vhn:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 CriticalAddonsOnly op=Exists
                             node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
                             nvidia.com/gpu:NoSchedule op=Exists
Events:
  Type     Reason                  Age                   From               Message
  ----     ------                  ----                  ----               -------
  Normal   Scheduled               51m                   default-scheduler  Successfully assigned kube-system/nvidia-device-plugin-daemonset-n7fwf to ix-truenas
  Normal   SandboxChanged          9m51s (x13 over 34m)  kubelet            Pod sandbox changed, it will be killed and re-created.
  Warning  FailedCreatePodSandBox  3m40s (x23 over 49m)  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create a sandbox for pod "nvidia-device-plugin-daemonset-n7fwf": operation timeout: context deadline exceeded
Name:                 openebs-zfs-controller-0
Namespace:            kube-system
Priority:             900000000
Priority Class Name:  openebs-zfs-csi-controller-critical
Node:                 ix-truenas/192.168.1.49
Start Time:           Tue, 21 Jun 2022 17:51:36 -0700
Labels:               app=openebs-zfs-controller
                      controller-revision-hash=openebs-zfs-controller-698698d48b
                      openebs.io/component-name=openebs-zfs-controller
                      openebs.io/version=ci
                      role=openebs-zfs
                      statefulset.kubernetes.io/pod-name=openebs-zfs-controller-0
Annotations:          <none>
Status:               Pending
IP:
IPs:                  <none>
Controlled By:        StatefulSet/openebs-zfs-controller
Containers:
  csi-resizer:
    Container ID:
    Image:         k8s.gcr.io/sig-storage/csi-resizer:v1.2.0
    Image ID:
    Port:          <none>
    Host Port:     <none>
    Args:
      --v=5
      --csi-address=$(ADDRESS)
      --leader-election
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:
      ADDRESS:  /var/lib/csi/sockets/pluginproxy/csi.sock
    Mounts:
      /var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jwxj2 (ro)
  csi-snapshotter:
    Container ID:
    Image:         k8s.gcr.io/sig-storage/csi-snapshotter:v4.0.0
    Image ID:
    Port:          <none>
    Host Port:     <none>
    Args:
      --csi-address=$(ADDRESS)
      --leader-election
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:
      ADDRESS:  /var/lib/csi/sockets/pluginproxy/csi.sock
    Mounts:
      /var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jwxj2 (ro)
  snapshot-controller:
    Container ID:
    Image:         k8s.gcr.io/sig-storage/snapshot-controller:v4.0.0
    Image ID:
    Port:          <none>
    Host Port:     <none>
    Args:
      --v=5
      --leader-election=true
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jwxj2 (ro)
  csi-provisioner:
    Container ID:
    Image:         k8s.gcr.io/sig-storage/csi-provisioner:v3.0.0
    Image ID:
    Port:          <none>
    Host Port:     <none>
    Args:
      --csi-address=$(ADDRESS)
      --v=5
      --feature-gates=Topology=true
      --strict-topology
      --leader-election
      --extra-create-metadata=true
      --enable-capacity=true
      --default-fstype=ext4
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:
      ADDRESS:    /var/lib/csi/sockets/pluginproxy/csi.sock
      NAMESPACE:  kube-system (v1:metadata.namespace)
      POD_NAME:   openebs-zfs-controller-0 (v1:metadata.name)
    Mounts:
      /var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jwxj2 (ro)
  openebs-zfs-plugin:
    Container ID:
    Image:         openebs/zfs-driver:2.0.0
    Image ID:
    Port:          <none>
    Host Port:     <none>
    Args:
      --endpoint=$(OPENEBS_CSI_ENDPOINT)
      --plugin=$(OPENEBS_CONTROLLER_DRIVER)
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:
      OPENEBS_CONTROLLER_DRIVER:    controller
      OPENEBS_CSI_ENDPOINT:         unix:///var/lib/csi/sockets/pluginproxy/csi.sock
      OPENEBS_NAMESPACE:            openebs
      OPENEBS_IO_INSTALLER_TYPE:    zfs-operator
      OPENEBS_IO_ENABLE_ANALYTICS:  true
    Mounts:
      /var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jwxj2 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  socket-dir:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  kube-api-access-jwxj2:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                   From               Message
  ----     ------                  ----                  ----               -------
  Normal   Scheduled               51m                   default-scheduler  Successfully assigned kube-system/openebs-zfs-controller-0 to ix-truenas
  Warning  FailedSync              38m (x3 over 38m)     kubelet            error determining status: rpc error: code = Unknown desc = Error: No such container: 2c97475ed30a8d3dc3c987f44517dad4720751c3a6366dc3869cbc4216141ef5
  Warning  FailedSync              23m (x3 over 23m)     kubelet            error determining status: rpc error: code = Unknown desc = Error: No such container: a81e009b4f084f12806bec335c332c6c2b168fd461909906875183083448cb12
  Normal   SandboxChanged          19m (x10 over 37m)    kubelet            Pod sandbox changed, it will be killed and re-created.
  Warning  FailedCreatePodSandBox  4m47s (x22 over 49m)  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create a sandbox for pod "openebs-zfs-controller-0": operation timeout: context deadline exceeded
root@server[~]#
The resilvering process is also taking forever, and I'm assuming that's because I've restarted the machine several times, which has probably caused more problems than it solved.
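I've been keeping an eye on the resilver with plain zpool status; "tank" here is just a placeholder for my actual pool name:
Code:
# shows resilver progress, scan rate, and an estimated time to completion
zpool status -v tank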
I also seem to have managed to create over 9,000 snapshots in the past day and a half alone, apparently because Kubernetes keeps trying to recreate its pods?
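That 9,000 figure is just from counting snapshot names from the shell, something like:
Code:
# count every snapshot on the system
zfs list -H -t snapshot -o name | wc -l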
I've made a ticket on Jira, but I'm still waiting for a response and I'm really not sure what to do in the meantime. Should I leave the server running while it spams non-stop snapshots and bogs down the system? Is there any way to delete these en masse?
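For the mass delete, this is what I was thinking of trying, but I haven't run it because I'm not sure it's safe; tank/ix-applications is just my guess at where the app snapshots live, and I'd obviously double-check the list before piping it into destroy:
Code:
# list snapshot names only, keep the ones under the (assumed) apps dataset,
# and destroy them one at a time
zfs list -H -t snapshot -o name \
  | grep '^tank/ix-applications' \
  | xargs -n 1 zfs destroy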
My current specs are:
Code:
Intel i7 5820k
Gigabyte X99 UD4 (Revision 1)
102 GB of DDR4 RAM
Corsair AX850 Gold
Nvidia EVGA 1070
One (formerly two) shucked 14 TB WD Easystore (WD whites)
One 14 TB WD Purple drive
Sorry for posting this on the forum as well as on the Jira; this is stressing me out, as I haven't seen anyone else run into similar issues.
Please let me know if you have any questions about my issue!
Thanks for the help :)