# KubeZero 1.23
## What's new - Major themes
- Cilium added as a second CNI, preparing the full migration to Cilium with the 1.24 upgrade
- Support for Nvidia g5 instances incl. pre-installed kernel drivers, CUDA toolchain and CRI integration
- Updated inf1 Neuron drivers
- ExtendedResourceToleration AdmissionController and auto-taints, allowing Neuron and Nvidia pods ONLY to be scheduled on the dedicated workers (see the sketch after this list)
- Full Cluster-Autoscaler integration
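
To see how this plays out, here is a minimal sketch of the taint/toleration mechanics, assuming the dedicated workers carry a taint keyed by the extended resource name; the node name, taint value and image are illustrative assumptions, not KubeZero defaults:

```bash
# Inspect the auto-applied taint on a dedicated GPU worker (node name is an example)
kubectl describe node ip-10-0-1-23.ec2.internal | grep -A1 Taints

# A pod that only requests the extended resource; the ExtendedResourceToleration
# admission controller injects the matching toleration, so no tolerations block
# is needed, while pods that do not request the resource stay off these workers
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:11.7.1-base-ubuntu22.04
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
EOF
```
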
## Version upgrades
- Istio to 1.14.4
- Logging: ECK operator to 2.4, fluent-bit 1.9.8
- Metrics: Prometheus and all Grafana charts to the latest versions matching V1.23
- ArgoCD to V2.4 (shell access to pods disabled by default)
- AWS EBS/EFS CSI drivers to latest versions
- cert-manager to V1.9.1
# Upgrade
`(No, really, you MUST read this before you upgrade)`
- Ensure your Kube context points to the correct cluster!
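
  For example, a quick sanity check with plain `kubectl` (nothing KubeZero-specific):

  ```bash
  # Show which context and cluster the following steps will act on
  kubectl config current-context
  kubectl cluster-info
  ```
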
1. Enable `containerProxy` for NAT instances and upgrade the NAT instance using the new V2 Pulumi stacks.
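
   A hedged sketch using the plain Pulumi CLI; the stack name and the assumption that `containerProxy` is exposed as a stack config value should be verified against the V2 stacks:

   ```bash
   pulumi stack select <nat-instance-stack>   # placeholder stack name
   pulumi config set containerProxy true      # assumption: plain stack config value
   pulumi up
   ```
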
2. Review the CFN config for controller and workers (enable `containerProxy`, remove legacy version settings, etc.)
3. Upgrade CFN stacks for the control plane and all worker groups
4. Trigger fully-automated cluster upgrade:

   `./admin/upgrade_cluster.sh <path to the argocd app kubezero yaml for THIS cluster>`
5. Reboot the controller(s) one by one.
   Wait each time for the controller to rejoin and for all pods to be running again.
   This might take a while ...
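
   A couple of read-only checks that help while waiting (plain `kubectl`, illustrative):

   ```bash
   # All nodes, including the rebooted controller, should report Ready
   kubectl get nodes
   # Nothing should be stuck in a non-Running, non-Completed phase
   kubectl get pods -A --field-selector=status.phase!=Running,status.phase!=Succeeded
   ```
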
6. Launch a new set of workers, e.g. by doubling `desired` for each worker ASG.
   Once the new workers are ready, cordon and drain all old workers.
   The cluster-autoscaler will remove the old workers automatically after about 10 min!
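
   A hedged sketch of the manual part of this step; the ASG name, node name and capacity are placeholders:

   ```bash
   # Double the desired capacity of a worker ASG (example values)
   aws autoscaling set-desired-capacity \
     --auto-scaling-group-name <worker-asg> --desired-capacity 6

   # Once the new workers are Ready, cordon and drain each old worker
   kubectl cordon <old-worker-node>
   kubectl drain <old-worker-node> --ignore-daemonsets --delete-emptydir-data
   ```
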
7. If all looks good, commit the ArgoApp resource for KubeZero before re-enabling ArgoCD itself:
   git add / commit / push `<cluster/env/kubezero/application.yaml>`