From 8790101d329f5fa7e5fea1cf33aa3fe75a3760ec Mon Sep 17 00:00:00 2001
From: Stefan Reimer
Date: Sun, 15 Jan 2023 14:31:17 +0000
Subject: [PATCH] Minor tweaks and doc updates

---
 docs/v1.24.md | 26 ++++++++++++++++++++------
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/docs/v1.24.md b/docs/v1.24.md
index e76d08e..0173c2c 100644
--- a/docs/v1.24.md
+++ b/docs/v1.24.md
@@ -7,7 +7,7 @@
 - cluster-autoscaler is enabled by default on AWS
 - worker nodes are now automatically updated to the latest AMI and config in a rolling fashion
 - integrated Bitnami Sealed Secrets controller
-
+- reduced avg. CPU load on controller nodes to well below the 20% threshold, to prevent extra costs from CPU credits
 
 ## Version upgrades
 - cilium
@@ -37,15 +37,29 @@
 Ensure your Kube context points to the correct cluster !
 
 3. Trigger cluster upgrade: `./admin/upgrade_cluster.sh `
-4. Reboot controller(s) one by one
+4. Review the kubezero-config and, if all looks good, commit the ArgoApp resource for Kubezero via regular git:
+   git add / commit / push ``
+   *DO NOT yet re-enable ArgoCD before all pre-v1.24 workers have been replaced!*
+
+5. Reboot controller(s) one by one.
 Wait each time for controller to join and all pods running.
 Might take a while ...
 
-5. Upgrade CFN stacks for the workers.
+6. Upgrade CFN stacks for the workers.
 This in turn will trigger automated worker updates by evicting pods and launching new workers in a rolling fashion.
 Grab a coffee and keep an eye on the cluster to be safe ...
+Depending on your cluster size, it might take a while to roll over all workers!
 
-6. If all looks good, commit the ArgoApp resouce for Kubezero, before re-enabling ArgoCD itself.
-   git add / commit / push ``
+7. Re-enable ArgoCD by hitting `<return>` in the still-waiting upgrade script.
 
-7. Head over to ArgoCD and sync all KubeZero modules incl. `pruning` enabled to remove eg. Calico
+8. Head over to ArgoCD and sync the KubeZero main module as soon as possible, to reduce potential back and forth in case ArgoCD still holds legacy state.
+
+
+## Known issues
+
+### Existing EFS volumes
+If pods are getting stuck in `Pending` during the worker upgrade, check the status of any EFS PVC.
+In case any PVC is in status `Lost`, edit the PVC and remove the following annotation:
+``` pv.kubernetes.io/bind-completed: "yes" ```
+This will instantly rebind the PVC to its PV and allow the pods to migrate.
+This will be fixed during the v1.25 cycle by a planned rework of the EFS storage module.
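
The `git add / commit / push` in step 4 elides the actual ArgoApp file for the cluster; a minimal sketch of that step, using a purely hypothetical path for the application manifest:

```
# Hypothetical path - substitute the ArgoApp resource for YOUR cluster/environment
git add clusters/example-env/kubezero/application.yaml
git commit -m "Upgrade KubeZero cluster config to v1.24"
git push
```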
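
Step 8 can also be driven from the ArgoCD CLI instead of the UI; a sketch assuming you are logged in and the root application is named `kubezero` (the name is an assumption, not taken from the docs):

```
# Application name "kubezero" is an assumption - verify with `argocd app list`
argocd app sync kubezero
```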
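
For the EFS known issue, the annotation can also be removed non-interactively rather than by editing the PVC by hand; a sketch with placeholder PVC name and namespace, assuming your kube context already points at the upgraded cluster:

```
# Placeholder PVC name/namespace; the trailing '-' tells kubectl to remove the annotation
kubectl annotate pvc my-efs-claim -n my-namespace pv.kubernetes.io/bind-completed-
```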