Minor tweaks and doc updates

This commit is contained in:
Stefan Reimer 2023-01-15 14:31:17 +00:00
parent 99c1d54790
commit 8790101d32
1 changed files with 20 additions and 6 deletions


@@ -7,7 +7,7 @@
- cluster-autoscaler is enabled by default on AWS
- worker nodes are now automatically updated to the latest AMI and config in a rolling fashion
- integrated Bitnami Sealed Secrets controller
- reduced avg. CPU load on controller nodes well below the 20% threshold to prevent extra costs from CPU credits
## Version upgrades
- cilium
@@ -37,15 +37,29 @@ Ensure your Kube context points to the correct cluster!
3. Trigger cluster upgrade:
`./admin/upgrade_cluster.sh <path to the argocd app kubezero yaml for THIS cluster>`
4. Reboot controller(s) one by one
4. Review the kubezero-config and, if all looks good, commit the ArgoApp resource for KubeZero via regular git:
git add / commit / push `<cluster/env/kubezero/application.yaml>`
* DO NOT re-enable ArgoCD yet, before all pre-v1.24 workers have been replaced! *
5. Reboot controller(s) one by one
Wait each time for the controller to rejoin and for all pods to be running.
This might take a while ...
5. Upgrade CFN stacks for the workers.
6. Upgrade CFN stacks for the workers.
This in turn will trigger automated worker updates by evicting pods and launching new workers in a rolling fashion.
Grab a coffee and keep an eye on the cluster to be safe ...
Depending on your cluster size it might take a while to roll over all workers!
6. If all looks good, commit the ArgoApp resource for KubeZero before re-enabling ArgoCD itself.
git add / commit / push `<cluster/env/kubezero/application.yaml>`
7. Re-enable ArgoCD by hitting <return> on the still waiting upgrade script
7. Head over to ArgoCD and sync all KubeZero modules with `pruning` enabled to remove e.g. Calico
8. Head over to ArgoCD and sync the KubeZero main module as soon as possible, to reduce potential back and forth in case ArgoCD still holds legacy state
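The git steps referenced above can be sketched as follows. This is a minimal sketch only; the manifest path and commit message are hypothetical placeholders, use your cluster's actual repository layout:

```shell
# Hypothetical path -- substitute the real application manifest for THIS cluster
git add cluster/env/kubezero/application.yaml
git commit -m "Upgrade KubeZero cluster to v1.24"
git push
```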
## Known issues
### existing EFS volumes
If pods are getting stuck in `Pending` during the worker upgrade, check the status of any EFS PVC.
In case any PVC is in status `Lost`, edit the PVC and remove the following annotation:
```
pv.kubernetes.io/bind-completed: "yes"
```
This will instantly rebind the PVC to its PV and allow the pods to migrate.
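Instead of editing the PVC manually, the annotation can also be removed with `kubectl annotate` (a trailing `-` on the key deletes it). The PVC name and namespace below are hypothetical examples:

```shell
# Remove the stale binding annotation from an affected PVC
# (trailing '-' after the key tells kubectl to delete the annotation)
kubectl -n my-namespace annotate pvc my-efs-pvc pv.kubernetes.io/bind-completed-
```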
This is going to be fixed during the v1.25 cycle by a planned rework of the EFS storage module.