AWS Node Termination Handler

AWS Node Termination Handler Helm chart for Kubernetes. For more information on this project see the project repo at github.com/aws/aws-node-termination-handler.

Prerequisites

  • Kubernetes >= v1.16

Installing the Chart

Before you can install the chart, you will need to authenticate your Helm client.

aws ecr-public get-login-password \
     --region us-east-1 | helm registry login \
     --username AWS \
     --password-stdin public.ecr.aws

Once the Helm registry login succeeds, use the following command to install the chart with the release name aws-node-termination-handler and the default configuration into the kube-system namespace. In the command below, set CHART_VERSION to the chart version that you want to install.

helm upgrade --install --namespace kube-system aws-node-termination-handler oci://public.ecr.aws/aws-ec2/helm/aws-node-termination-handler --version $CHART_VERSION

To install the chart on an EKS cluster where the AWS Node Termination Handler is already installed, you can run the following command.

helm upgrade --install --namespace kube-system aws-node-termination-handler oci://public.ecr.aws/aws-ec2/helm/aws-node-termination-handler --version $CHART_VERSION --recreate-pods --force

If you receive an error similar to the one below, simply rerun the above command.

Error: release aws-node-termination-handler failed: "aws-node-termination-handler" already exists

To uninstall the aws-node-termination-handler chart installation from the kube-system namespace run the following command.

helm uninstall --namespace kube-system aws-node-termination-handler

Configuration

The following tables list the configurable parameters of the chart and their default values. These values are split into the common configuration shared by all AWS Node Termination Handler modes, the queue configuration used when AWS Node Termination Handler is in queue-processor mode, and the IMDS configuration used when AWS Node Termination Handler is in IMDS mode. For more information about the different modes, see the project README.

Common Configuration

The configuration in this table applies to all AWS Node Termination Handler modes.

| Parameter | Description | Default |
| --- | --- | --- |
| image.repository | Image repository. | public.ecr.aws/aws-ec2/aws-node-termination-handler |
| image.tag | Image tag. | v{{ .Chart.AppVersion}} |
| image.pullPolicy | Image pull policy. | IfNotPresent |
| image.pullSecrets | Image pull secrets. | [] |
| nameOverride | Override the name of the chart. | "" |
| fullnameOverride | Override the fullname of the chart. | "" |
| serviceAccount.create | If true, create a new service account. | true |
| serviceAccount.name | Service account to be used. If not set and serviceAccount.create is true, a name is generated using the full name template. | nil |
| serviceAccount.annotations | Annotations to add to the service account. | {} |
| rbac.create | If true, create the RBAC resources. | true |
| rbac.pspEnabled | If true, create a pod security policy resource. Note: PodSecurityPolicies will not be created when the Kubernetes version is 1.25 or later. | true |
| customLabels | Labels to add to all resource metadata. | {} |
| podLabels | Labels to add to the pod. | {} |
| podAnnotations | Annotations to add to the pod. | {} |
| podSecurityContext | Security context for the pod. | See values.yaml |
| securityContext | Security context for the aws-node-termination-handler container. | See values.yaml |
| terminationGracePeriodSeconds | The termination grace period for the pod. | nil |
| resources | Resource requests and limits for the aws-node-termination-handler container. | {} |
| nodeSelector | Expressions to select a node by its labels for pod assignment. In IMDS mode this has a higher priority than daemonsetNodeSelector (for backwards compatibility) but shouldn't be used. | {} |
| affinity | Affinity settings for pod assignment. In IMDS mode this has a higher priority than daemonsetAffinity (for backwards compatibility) but shouldn't be used. | {} |
| tolerations | Tolerations for pod assignment. In IMDS mode this has a higher priority than daemonsetTolerations (for backwards compatibility) but shouldn't be used. | [] |
| extraEnv | Additional environment variables for the aws-node-termination-handler container. | [] |
| probes | The Kubernetes liveness probe configuration. | See values.yaml |
| logLevel | Sets the log level (info, debug, or error). | info |
| logFormatVersion | Sets the log format version. Available versions: 1, 2. Version 1 refers to the format that has been used through v1.17.3. Version 2 offers more detail for the "event kind" and "reason", especially when operating in Queue Processor mode. | 1 |
| jsonLogging | If true, use JSON-formatted logs instead of human readable logs. | false |
| enablePrometheusServer | If true, start an HTTP server exposing the /metrics endpoint for Prometheus. | false |
| prometheusServerPort | Replaces the default HTTP port for exposing Prometheus metrics. | 9092 |
| dryRun | If true, only log if a node would be drained. | false |
| cordonOnly | If true, nodes will be cordoned but not drained when an interruption event occurs. | false |
| taintNode | If true, nodes will be tainted when an interruption event occurs. Currently used taint keys are aws-node-termination-handler/scheduled-maintenance, aws-node-termination-handler/spot-itn, aws-node-termination-handler/asg-lifecycle-termination and aws-node-termination-handler/rebalance-recommendation. | false |
| excludeFromLoadBalancers | If true, nodes will be marked for exclusion from load balancers before they are cordoned. This applies the node.kubernetes.io/exclude-from-external-load-balancers label to enable the ServiceNodeExclusion feature gate. The label will not be modified or removed for nodes that already have it. | false |
| deleteLocalData | If true, continue even if there are pods using local data that will be deleted when the node is drained. | true |
| ignoreDaemonSets | If true, skip terminating daemon set managed pods. | true |
| podTerminationGracePeriod | The time in seconds given to each pod to terminate gracefully. If negative, the default value specified in the pod will be used, which defaults to 30 seconds if not specified for the pod. | -1 |
| nodeTerminationGracePeriod | Period of time in seconds given to each node to terminate gracefully. Node draining will be scheduled based on this value to optimize the amount of compute time, but still safely drain the node before an event. | 120 |
| emitKubernetesEvents | If true, Kubernetes events will be emitted when interruption events are received and when actions are taken on Kubernetes nodes. In IMDS Processor mode a default set of annotations with all the node metadata gathered from IMDS will be attached to each event. More information here. | false |
| completeLifecycleActionDelaySeconds | Pause after draining the node before completing the EC2 Autoscaling lifecycle action. This may be helpful if Pods on the node have Persistent Volume Claims. | -1 |
| kubernetesEventsExtraAnnotations | A comma-separated list of key=value extra annotations to attach to all emitted Kubernetes events (e.g. first=annotation,sample.annotation/number=two). | "" |
| webhookURL | Posts event data to URL upon instance interruption action. | "" |
| webhookURLSecretName | Pass the webhook URL as a Secret using the key webhookurl. | "" |
| webhookHeaders | Replace the default webhook headers (e.g. {"Content-type":"application/json"}). | "" |
| webhookProxy | Uses the specified HTTP(S) proxy for sending webhook data. | "" |
| webhookTemplate | Replaces the default webhook message template (e.g. {"text":"[NTH][Instance Interruption] EventID: {{ .EventID }} - Kind: {{ .Kind }} - Instance: {{ .InstanceID }} - Node: {{ .NodeName }} - Description: {{ .Description }} - Start Time: {{ .StartTime }}"}). | "" |
| webhookTemplateConfigMapName | Pass the webhook template file as a ConfigMap. | "" |
| webhookTemplateConfigMapKey | Name of the ConfigMap key storing the template file. | "" |
| enableSqsTerminationDraining | If true, this turns on queue-processor mode which drains nodes when an SQS termination event is received. | false |
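
As a rough illustration of how the common parameters above fit together, the following values fragment turns on JSON debug logging, node tainting, Kubernetes events, and a webhook. It is only a sketch: the parameter names come from the table, but the webhook URL is a placeholder and the combination of settings is an assumption, not a recommended configuration.

```yaml
# Sketch of a values override for the common settings (illustrative only).
logLevel: debug              # one of info, debug, or error
jsonLogging: true            # JSON-formatted logs instead of human readable logs
taintNode: true              # taint nodes when an interruption event occurs
emitKubernetesEvents: true   # emit Kubernetes events for interruptions and actions
# Placeholder URL; replace with your own endpoint.
webhookURL: "https://example.com/nth-webhook"
```

Saved under a file name of your choice (for example values-common.yaml, a hypothetical name), the fragment could be passed to the install command with Helm's --values flag.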

Queue-Processor Mode Configuration

The configuration in this table applies to AWS Node Termination Handler in queue-processor mode.

| Parameter | Description | Default |
| --- | --- | --- |
| replicas | The number of replicas in the deployment when using queue-processor mode (NOTE: increasing replicas may cause duplicate webhooks since pods are stateless). | 1 |
| strategy | Specify the update strategy for the deployment. | {} |
| podDisruptionBudget | Limit the disruption for controller pods, requires at least 2 controller replicas. | {} |
| serviceMonitor.create | If true, create a ServiceMonitor. This requires enablePrometheusServer: true. | false |
| serviceMonitor.namespace | Override ServiceMonitor Helm release namespace. | nil |
| serviceMonitor.labels | Additional ServiceMonitor metadata labels. | {} |
| serviceMonitor.interval | Prometheus scrape interval. | 30s |
| serviceMonitor.sampleLimit | Number of scraped samples accepted. | 5000 |
| priorityClassName | Name of the PriorityClass to use for the Deployment. | system-cluster-critical |
| awsRegion | If specified, use the AWS region for AWS API calls, else NTH will try to find the region through the AWS_REGION environment variable, IMDS, or the specified queue URL. | "" |
| queueURL | Listens for messages on the specified SQS queue URL. | "" |
| workers | The maximum number of parallel event processors to handle concurrent events. | 10 |
| checkTagBeforeDraining | If true, check that the instance is tagged with the managedTag before draining the node. | true |
| managedTag | The node tag to check if checkTagBeforeDraining is true. | aws-node-termination-handler/managed |
| checkASGTagBeforeDraining | [DEPRECATED] (Use checkTagBeforeDraining instead) If true, check that the instance is tagged with the managedAsgTag before draining the node. If false, disables calls to the ASG API. | true |
| managedAsgTag | [DEPRECATED] (Use managedTag instead) The node tag to check if checkASGTagBeforeDraining is true. | |
| useProviderId | If true, fetch node name through Kubernetes node spec ProviderID instead of AWS event PrivateDnsHostname. | false |
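
A minimal sketch of a queue-processor values fragment might look like the following. The parameter names are taken from the tables above; the region, account ID, and queue name in the URL are placeholders, and the sketch assumes the SQS queue and the IAM permissions NTH needs already exist.

```yaml
# Sketch of a values override for queue-processor mode (illustrative only).
enableSqsTerminationDraining: true   # switches NTH to queue-processor mode
# Placeholder queue URL; region, account ID, and queue name are made up.
queueURL: "https://sqs.us-east-1.amazonaws.com/123456789012/nth-queue"
awsRegion: "us-east-1"               # optional; otherwise NTH tries to discover it
workers: 10                          # chart default, shown for visibility
checkTagBeforeDraining: true         # only drain instances carrying managedTag
managedTag: "aws-node-termination-handler/managed"
```

The replica count is left at its default of 1 here because, as the table notes, increasing replicas may cause duplicate webhooks.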

IMDS Mode Configuration

The configuration in this table applies to AWS Node Termination Handler in IMDS mode.

| Parameter | Description | Default |
| --- | --- | --- |
| targetNodeOs | Space-separated list of node OSes to target (e.g. "linux", "windows", "linux windows"). Windows support is EXPERIMENTAL. | "linux" |
| linuxPodLabels | Labels to add to each Linux pod. | {} |
| windowsPodLabels | Labels to add to each Windows pod. | {} |
| linuxPodAnnotations | Annotations to add to each Linux pod. | {} |
| windowsPodAnnotations | Annotations to add to each Windows pod. | {} |
| updateStrategy | Update strategy for all DaemonSets. | See values.yaml |
| daemonsetPriorityClassName | Name of the PriorityClass to use for all DaemonSets. | system-node-critical |
| podMonitor.create | If true, create a PodMonitor. This requires enablePrometheusServer: true. | false |
| podMonitor.namespace | Override PodMonitor Helm release namespace. | nil |
| podMonitor.labels | Additional PodMonitor metadata labels. | {} |
| podMonitor.interval | Prometheus scrape interval. | 30s |
| podMonitor.sampleLimit | Number of scraped samples accepted. | 5000 |
| useHostNetwork | If true, enables hostNetwork for the Linux DaemonSet. NOTE: setting this to false may cause issues accessing IMDSv2 if your account is not configured with an IP hop count of 2; see Metrics Endpoint Considerations. | true |
| dnsPolicy | If specified, this overrides linuxDnsPolicy and windowsDnsPolicy with a single policy. | "" |
| dnsConfig | If specified, this sets the dnsConfig (see https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pod-dns-config). | {} |
| linuxDnsPolicy | DNS policy for the Linux DaemonSet. | "" |
| windowsDnsPolicy | DNS policy for the Windows DaemonSet. | "" |
| daemonsetNodeSelector | Expressions to select a node by its labels for DaemonSet pod assignment. For backwards compatibility the nodeSelector value has priority over this but shouldn't be used. | {} |
| linuxNodeSelector | Override daemonsetNodeSelector for the Linux DaemonSet. | {} |
| windowsNodeSelector | Override daemonsetNodeSelector for the Windows DaemonSet. | {} |
| daemonsetAffinity | Affinity settings for DaemonSet pod assignment. For backwards compatibility the affinity value has priority over this but shouldn't be used. | {} |
| linuxAffinity | Override daemonsetAffinity for the Linux DaemonSet. | {} |
| windowsAffinity | Override daemonsetAffinity for the Windows DaemonSet. | {} |
| daemonsetTolerations | Tolerations for DaemonSet pod assignment. For backwards compatibility the tolerations value has priority over this but shouldn't be used. | [] |
| linuxTolerations | Override daemonsetTolerations for the Linux DaemonSet. | [] |
| windowsTolerations | Override daemonsetTolerations for the Windows DaemonSet. | [] |
| enableProbesServer | If true, start an HTTP server exposing the /healthz endpoint for probes. | false |
| metadataTries | The number of times to try requesting metadata. | 3 |
| enableSpotInterruptionDraining | If true, drain nodes when the spot interruption termination notice is received. Only used in IMDS mode. | true |
| enableScheduledEventDraining | If true, drain nodes before the maintenance window starts for an EC2 instance scheduled event. Only used in IMDS mode. | true |
| enableRebalanceMonitoring | If true, cordon nodes when the rebalance recommendation notice is received. If you'd like to drain the node in addition to cordoning, then also set enableRebalanceDraining. Only used in IMDS mode. | false |
| enableRebalanceDraining | If true, drain nodes when the rebalance recommendation notice is received. Only used in IMDS mode. | false |
| deleteSqsMsgIfNodeNotFound | If true, delete the SQS Message from the SQS Queue if the targeted node is not found. Only used in Queue Processor mode. | false |
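
To show how the IMDS-mode parameters above might be combined, here is a hedged sketch of a values fragment that keeps the default draining behaviour, additionally cordons nodes on rebalance recommendations, and restricts the DaemonSet to a subset of nodes. The node label used in the selector is hypothetical; every parameter name comes from the tables.

```yaml
# Sketch of a values override for IMDS mode (illustrative only).
enableSqsTerminationDraining: false    # false keeps NTH in IMDS mode (the default)
targetNodeOs: "linux"
enableSpotInterruptionDraining: true   # drain on spot interruption notices
enableScheduledEventDraining: true     # drain before scheduled maintenance
enableRebalanceMonitoring: true        # cordon on rebalance recommendations
enableRebalanceDraining: false         # cordon only; do not drain on rebalance
daemonsetNodeSelector:
  # Hypothetical label; use whatever label marks your target nodes.
  example.com/lifecycle: spot
```

The sketch uses daemonsetNodeSelector rather than nodeSelector because the tables describe the latter as kept only for backwards compatibility.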

Testing Configuration

The configuration in this table applies to AWS Node Termination Handler testing and is NOT RECOMMENDED FOR PRODUCTION DEPLOYMENTS.

| Parameter | Description | Default |
| --- | --- | --- |
| awsEndpoint | (Used for testing) If specified, use the provided AWS endpoint to make API calls. | "" |
| awsSecretAccessKey | (Used for testing) Pass-thru environment variable. | nil |
| awsAccessKeyID | (Used for testing) Pass-thru environment variable. | nil |
| instanceMetadataURL | (Used for testing) If specified, use the provided metadata URL. | "" |
| procUptimeFile | (Used for testing) Specify the uptime file. | /proc/uptime |

Metrics Endpoint Considerations

AWS Node Termination Handler in IMDS mode runs as a DaemonSet with useHostNetwork: true by default. If the Prometheus server is enabled with enablePrometheusServer: true, nothing else will be able to bind to the configured port (by default prometheusServerPort: 9092) in the root network namespace. Because the endpoint is then served from the node's own network namespace, you will need a firewall or security group configured on the nodes to block access to the /metrics endpoint.

You can switch NTH in IMDS mode to run with useHostNetwork: false, but you will need to make sure that either IMDSv1 is enabled or the IMDSv2 IP hop count is increased to 2 (see the IMDSv2 documentation).
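
The trade-off described in this section could be expressed as a values fragment along the following lines. This is a sketch under stated assumptions: it presumes the instances already allow IMDSv1 or have an IMDSv2 hop count of 2, and that the PodMonitor custom resource definition (provided by the Prometheus Operator) is installed in the cluster.

```yaml
# Sketch: expose metrics without host networking in IMDS mode (illustrative only).
enablePrometheusServer: true   # serve /metrics
prometheusServerPort: 9092     # chart default port
useHostNetwork: false          # assumes IMDSv1 is enabled or the IMDSv2 hop count is 2
podMonitor:
  create: true                 # requires enablePrometheusServer: true
  interval: 30s                # chart default scrape interval
```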