1.8 KiB

KubeZero 1.23

What's new - Major themes

  • Cilium added as second CNI to prepare full migration to Cilium with 1.24 upgrade
  • support for Nvidia g5 instances incl. pre-installed kernel drivers, cudo toolchain and CRI intergration
  • updated inf1 neuron drivers
  • ExtendedResourceToleration AdmissionController and auto-taints allowing Neuron and Nvidia pods ONLY to be scheduled on dedicated workers
  • full Cluster-Autoscaler integration

Version upgrades

  • Istio to 1.14.4
  • Logging: ECK operator to 2.4, fluent-bit 1.9.8
  • Metrics: Prometheus and all Grafana charts to latest to match V1.23
  • ArgoCD to V2.4 ( access to pod via shell disabled by default )
  • AWS EBS/EFS CSI drivers to latest versions
  • cert-manager to V1.9.1


(No, really, you MUST read this before you upgrade)

  • Ensure your Kube context points to the correct cluster !
  1. Enable containerProxy for NAT instances and upgrade NAT instance using the new V2 Pulumi stacks

  2. Review CFN config for controller and workers ( enable containerProxy, remove legacy version settings etc )

  3. Upgrade CFN stacks for the control plane and all worker groups

  4. Trigger fully-automated cluster upgrade:
    ./admin/upgrade_cluster.sh <path to the argocd app kubezero yaml for THIS cluster>

  5. Reboot controller(s) one by one
    Wait each time for controller to join and all pods running. Might take a while ...

  6. Launch new set of workers eg. by doubling desired for each worker ASG
    once new workers are ready, cordon and drain all old workers
    The cluster-autoscaler will remove the old workers automatically after about 10min !

  7. If all looks good, commit the ArgoApp resouce for Kubezero, before re-enabling ArgoCD itself.
    git add / commit / push <cluster/env/kubezero/application.yaml>