From 8790101d329f5fa7e5fea1cf33aa3fe75a3760ec Mon Sep 17 00:00:00 2001
From: Stefan Reimer
Date: Sun, 15 Jan 2023 14:31:17 +0000
Subject: [PATCH] Minor tweaks and doc updates

---
 docs/v1.24.md | 26 ++++++++++++++++++++------
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/docs/v1.24.md b/docs/v1.24.md
index e76d08e..0173c2c 100644
--- a/docs/v1.24.md
+++ b/docs/v1.24.md
@@ -7,7 +7,7 @@
 - cluster-autoscaler is enabled by default on AWS
 - worker nodes are now automatically updated to the latest AMI and config in a rolling fashion
 - integrated Bitnami Sealed Secrets controller
-
+- reduced avg. CPU load on controller nodes to well below the 20% threshold, to prevent extra costs from CPU credits
 
 ## Version upgrades
 - cilium
@@ -37,15 +37,29 @@
 Ensure your Kube context points to the correct cluster !
 
 3. Trigger cluster upgrade: `./admin/upgrade_cluster.sh `
-4. Reboot controller(s) one by one
+4. Review the kubezero-config and, if all looks good, commit the ArgoApp resource for Kubezero via regular git:
+   git add / commit / push ``
+   *DO NOT yet re-enable ArgoCD before all pre-v1.24 workers have been replaced!*
+
+5. Reboot controller(s) one by one.
 Wait each time for controller to join and all pods running.
 Might take a while ...
 
-5. Upgrade CFN stacks for the workers.
+6. Upgrade CFN stacks for the workers.
 This in turn will trigger automated worker updates by evicting pods and launching new workers in a rolling fashion.
 Grab a coffee and keep an eye on the cluster to be safe ...
+Depending on your cluster size, it might take a while to roll over all workers!
 
-6. If all looks good, commit the ArgoApp resouce for Kubezero, before re-enabling ArgoCD itself.
-   git add / commit / push ``
+7. Re-enable ArgoCD by hitting `<return>` in the still-waiting upgrade script.
 
-7. Head over to ArgoCD and sync all KubeZero modules incl. `pruning` enabled to remove eg. Calico
+8. Head over to ArgoCD and sync the KubeZero main module as soon as possible, to reduce potential back and forth in case ArgoCD still holds legacy state.
+
+
+## Known issues
+
+### Existing EFS volumes
+If pods are getting stuck in `Pending` during the worker upgrade, check the status of any EFS PVC.
+In case any PVC is in status `Lost`, edit the PVC and remove the following annotation:
+``` pv.kubernetes.io/bind-completed: "yes" ```
+This will instantly rebind the PVC to its PV and allow the pods to migrate.
+This will be fixed during the v1.25 cycle by a planned rework of the EFS storage module.
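
The `git add / commit / push` in step 4 elides the actual ArgoApp file for the cluster; a minimal sketch of that step, using a purely hypothetical path for the application manifest:

```
# Hypothetical path - substitute the ArgoApp resource for YOUR cluster/environment
git add clusters/example-env/kubezero/application.yaml
git commit -m "Upgrade KubeZero cluster config to v1.24"
git push
```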
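
Step 8 can also be driven from the ArgoCD CLI instead of the UI; a sketch assuming you are logged in and the root application is named `kubezero` (the name is an assumption, not taken from the docs):

```
# Application name "kubezero" is an assumption - verify with `argocd app list`
argocd app sync kubezero
```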
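
For the EFS known issue, the annotation can also be removed non-interactively rather than by editing the PVC by hand; a sketch with placeholder PVC name and namespace, assuming your kube context already points at the upgraded cluster:

```
# Placeholder PVC name/namespace; the trailing '-' tells kubectl to remove the annotation
kubectl annotate pvc my-efs-claim -n my-namespace pv.kubernetes.io/bind-completed-
```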