# KubeZero 1.23

## What's new - Major themes

- Cilium added as a second CNI to prepare the full migration to Cilium with the 1.24 upgrade
- Support for Nvidia g5 instances incl. pre-installed kernel drivers, CUDA toolchain and CRI integration
- Updated inf1 Neuron drivers
- ExtendedResourceToleration AdmissionController and auto-taints allowing Neuron and Nvidia pods ONLY to be scheduled on dedicated workers
- Full Cluster-Autoscaler integration

## Version upgrades

- Istio to 1.14.4
- Logging: ECK operator to 2.4, fluent-bit to 1.9.8
- Metrics: Prometheus and all Grafana charts to latest to match V1.23
- ArgoCD to V2.4 (access to pods via shell disabled by default)
- AWS EBS/EFS CSI drivers to latest versions
- cert-manager to V1.9.1

# Upgrade
`(No, really, you MUST read this before you upgrade)`

- Ensure your Kube context points to the correct cluster! (A quick check is sketched below.)

1. Enable `containerProxy` for NAT instances and upgrade the NAT instances using the new V2 Pulumi stacks
2. Review the CFN config for the controller and workers (enable `containerProxy`, remove legacy version settings, etc.)
3. Upgrade the CFN stacks for the control plane and all worker groups
4. Trigger the fully automated cluster upgrade:
   `./admin/upgrade_cluster.sh`
5. Reboot the controller(s) one by one.
   Wait each time for the controller to re-join and for all pods to be running (see the readiness check sketched below). This might take a while ...
6. Launch a new set of workers, e.g. by doubling `desired` for each worker ASG.
   Once the new workers are ready, cordon and drain all old workers (see the rotation sketch below).
   The cluster-autoscaler will remove the old workers automatically after about 10 min!
7. If all looks good, commit the ArgoApp resource for KubeZero before re-enabling ArgoCD itself:
   git add / commit / push
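
Before starting, a minimal sketch of how you might confirm the current Kube context points at the cluster you intend to upgrade (the expected context name is yours to verify):

```bash
# Show the currently selected context and the cluster it points to
kubectl config current-context
kubectl cluster-info
```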
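
For step 5, a sketch of the kind of check you might run after each controller reboot before moving on to the next one; the label selector assumes the standard kubeadm control-plane node label:

```bash
# Confirm the rebooted controller has re-joined and reports Ready
kubectl get nodes -l node-role.kubernetes.io/control-plane

# Watch until no pods remain in a non-Running / non-Completed state
watch 'kubectl get pods -A | grep -Ev "Running|Completed"'
```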
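
For step 6, a sketch of rotating the workers, assuming you scale the ASG via the AWS CLI; the ASG name, desired capacity and node names are placeholders:

```bash
# Double the desired capacity of a worker ASG (values are placeholders)
aws autoscaling set-desired-capacity \
  --auto-scaling-group-name <worker-asg-name> \
  --desired-capacity 6

# Once the new workers are Ready, cordon and drain each OLD worker
for node in <old-worker-node-1> <old-worker-node-2>; do
  kubectl cordon "$node"
  kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data
done
```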