AWS Node Termination Handler
AWS Node Termination Handler Helm chart for Kubernetes. For more information on this project see the project repo at github.com/aws/aws-node-termination-handler.
Prerequisites
- Kubernetes >= v1.16
Installing the Chart
Before you can install the chart you will need to authenticate your Helm client.
aws ecr-public get-login-password \
--region us-east-1 | helm registry login \
--username AWS \
--password-stdin public.ecr.aws
Once the helm registry login succeeds, use the following command to install the chart with the release name aws-node-termination-handler and the default configuration to the kube-system namespace. In the command below, set CHART_VERSION to the chart version that you want to install.
helm upgrade --install --namespace kube-system aws-node-termination-handler oci://public.ecr.aws/aws-ec2/helm/aws-node-termination-handler --version $CHART_VERSION
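Because the install command depends on CHART_VERSION being set, a small wrapper can catch an empty value before Helm is invoked. The following is a minimal sketch, not part of the chart: install_nth is a hypothetical helper name, and any extra arguments (for example --set or --values flags) are passed through to helm unchanged.

```shell
# Hypothetical helper (not part of the chart): wraps the install command above.
install_nth() {
  # Stop early with a clear message if CHART_VERSION is unset or empty.
  if [ -z "${CHART_VERSION:-}" ]; then
    echo "CHART_VERSION is not set; export the chart version you want to install" >&2
    return 1
  fi
  # Same command as above; "$@" forwards any extra flags to helm.
  helm upgrade --install --namespace kube-system aws-node-termination-handler \
    oci://public.ecr.aws/aws-ec2/helm/aws-node-termination-handler \
    --version "$CHART_VERSION" "$@"
}
```

With CHART_VERSION exported, calling install_nth --set dryRun=true would run the install with the dryRun parameter from the configuration tables overridden.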
To install the chart on an EKS cluster where the AWS Node Termination Handler is already installed, you can run the following command.
helm upgrade --install --namespace kube-system aws-node-termination-handler oci://public.ecr.aws/aws-ec2/helm/aws-node-termination-handler --version $CHART_VERSION --recreate-pods --force
If you receive an error similar to the one below, simply rerun the above command.
Error: release aws-node-termination-handler failed: "aws-node-termination-handler" already exists
To uninstall the aws-node-termination-handler chart installation from the kube-system namespace, run the following command.
helm uninstall --namespace kube-system aws-node-termination-handler
Configuration
The following tables list the configurable parameters of the chart and their default values. These values are split into the common configuration shared by all AWS Node Termination Handler modes, the queue configuration used when AWS Node Termination Handler is in queue-processor mode, and the IMDS configuration used when AWS Node Termination Handler is in IMDS mode. For more information about the different modes, see the project README.
Common Configuration
The configuration in this table applies to all AWS Node Termination Handler modes.
Parameter | Description | Default |
---|---|---|
image.repository | Image repository. | public.ecr.aws/aws-ec2/aws-node-termination-handler |
image.tag | Image tag. | v{{ .Chart.AppVersion}} |
image.pullPolicy | Image pull policy. | IfNotPresent |
image.pullSecrets | Image pull secrets. | [] |
nameOverride | Override the name of the chart. | "" |
fullnameOverride | Override the fullname of the chart. | "" |
serviceAccount.create | If true, create a new service account. | true |
serviceAccount.name | Service account to be used. If not set and serviceAccount.create is true, a name is generated using the full name template. | nil |
serviceAccount.annotations | Annotations to add to the service account. | {} |
rbac.create | If true, create the RBAC resources. | true |
rbac.pspEnabled | If true, create a pod security policy resource. Note: PodSecurityPolicies will not be created when the Kubernetes version is 1.25 or later. | true |
customLabels | Labels to add to all resource metadata. | {} |
podLabels | Labels to add to the pod. | {} |
podAnnotations | Annotations to add to the pod. | {} |
podSecurityContext | Security context for the pod. | See values.yaml |
securityContext | Security context for the aws-node-termination-handler container. | See values.yaml |
terminationGracePeriodSeconds | The termination grace period for the pod. | nil |
resources | Resource requests and limits for the aws-node-termination-handler container. | {} |
nodeSelector | Expressions to select a node by its labels for pod assignment. In IMDS mode this has a higher priority than daemonsetNodeSelector (for backwards compatibility) but shouldn't be used. | {} |
affinity | Affinity settings for pod assignment. In IMDS mode this has a higher priority than daemonsetAffinity (for backwards compatibility) but shouldn't be used. | {} |
tolerations | Tolerations for pod assignment. In IMDS mode this has a higher priority than daemonsetTolerations (for backwards compatibility) but shouldn't be used. | [] |
extraEnv | Additional environment variables for the aws-node-termination-handler container. | [] |
probes | The Kubernetes liveness probe configuration. | See values.yaml |
logLevel | Sets the log level (info, debug, or error). | info |
logFormatVersion | Sets the log format version. Available versions: 1, 2. Version 1 refers to the format that has been used through v1.17.3. Version 2 offers more detail for the "event kind" and "reason", especially when operating in Queue Processor mode. | 1 |
jsonLogging | If true, use JSON-formatted logs instead of human-readable logs. | false |
enablePrometheusServer | If true, start an HTTP server exposing the /metrics endpoint for Prometheus. | false |
prometheusServerPort | Replaces the default HTTP port for exposing Prometheus metrics. | 9092 |
dryRun | If true, only log if a node would be drained. | false |
cordonOnly | If true, nodes will be cordoned but not drained when an interruption event occurs. | false |
taintNode | If true, nodes will be tainted when an interruption event occurs. Currently used taint keys are aws-node-termination-handler/scheduled-maintenance, aws-node-termination-handler/spot-itn, aws-node-termination-handler/asg-lifecycle-termination and aws-node-termination-handler/rebalance-recommendation. | false |
excludeFromLoadBalancers | If true, nodes will be marked for exclusion from load balancers before they are cordoned. This applies the node.kubernetes.io/exclude-from-external-load-balancers label to enable the ServiceNodeExclusion feature gate. The label will not be modified or removed for nodes that already have it. | false |
deleteLocalData | If true, continue even if there are pods using local data that will be deleted when the node is drained. | true |
ignoreDaemonSets | If true, skip terminating daemon set managed pods. | true |
podTerminationGracePeriod | The time in seconds given to each pod to terminate gracefully. If negative, the default value specified in the pod will be used, which defaults to 30 seconds if not specified for the pod. | -1 |
nodeTerminationGracePeriod | Period of time in seconds given to each node to terminate gracefully. Node draining will be scheduled based on this value to optimize the amount of compute time, but still safely drain the node before an event. | 120 |
emitKubernetesEvents | If true, Kubernetes events will be emitted when interruption events are received and when actions are taken on Kubernetes nodes. In IMDS Processor mode a default set of annotations with all the node metadata gathered from IMDS will be attached to each event. More information here. | false |
completeLifecycleActionDelaySeconds | Pause after draining the node before completing the EC2 Autoscaling lifecycle action. This may be helpful if Pods on the node have Persistent Volume Claims. | -1 |
kubernetesEventsExtraAnnotations | A comma-separated list of key=value extra annotations to attach to all emitted Kubernetes events (e.g. "first=annotation,sample.annotation/number=two"). | "" |
webhookURL | Posts event data to URL upon instance interruption action. | "" |
webhookURLSecretName | Pass the webhook URL as a Secret using the key webhookurl. | "" |
webhookHeaders | Replace the default webhook headers (e.g. {"Content-type":"application/json"}). | "" |
webhookProxy | Uses the specified HTTP(S) proxy for sending webhook data. | "" |
webhookTemplate | Replaces the default webhook message template (e.g. {"text":"[NTH][Instance Interruption] EventID: {{ .EventID }} - Kind: {{ .Kind }} - Instance: {{ .InstanceID }} - Node: {{ .NodeName }} - Description: {{ .Description }} - Start Time: {{ .StartTime }}"}). | "" |
webhookTemplateConfigMapName | Pass the webhook template file as a ConfigMap. | "" |
webhookTemplateConfigMapKey | Name of the ConfigMap key storing the template file. | "" |
enableSqsTerminationDraining | If true, this turns on queue-processor mode which drains nodes when an SQS termination event is received. | false |
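As an illustration of how the common parameters combine, the following sketch writes a values file that turns on JSON logging, the Prometheus endpoint, node tainting, and dry-run mode. The file name nth-common-values.yaml is an assumption chosen for this example; the keys and their meanings come from the table above.

```shell
# Write a values file overriding a few common parameters.
# The file name is hypothetical; any path works.
cat > nth-common-values.yaml <<'EOF'
logLevel: debug
jsonLogging: true             # JSON-formatted logs instead of human-readable logs
enablePrometheusServer: true  # HTTP server exposing /metrics
prometheusServerPort: 9092    # chart default, repeated here for visibility
taintNode: true               # taint nodes when an interruption event occurs
dryRun: true                  # only log if a node would be drained
EOF

# Print the file so it can be reviewed before installing.
cat nth-common-values.yaml
```

The file can be passed to the install command shown earlier by appending --values nth-common-values.yaml. Starting with dryRun: true is one way to observe what the handler would do before it is allowed to drain nodes.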
Queue-Processor Mode Configuration
The configuration in this table applies to AWS Node Termination Handler in queue-processor mode.
Parameter | Description | Default |
---|---|---|
replicas | The number of replicas in the deployment when using queue-processor mode (NOTE: increasing replicas may cause duplicate webhooks since pods are stateless). | 1 |
strategy | Specify the update strategy for the deployment. | {} |
podDisruptionBudget | Limit the disruption for controller pods; requires at least 2 controller replicas. | {} |
serviceMonitor.create | If true, create a ServiceMonitor. This requires enablePrometheusServer: true. | false |
serviceMonitor.namespace | Override ServiceMonitor Helm release namespace. | nil |
serviceMonitor.labels | Additional ServiceMonitor metadata labels. | {} |
serviceMonitor.interval | Prometheus scrape interval. | 30s |
serviceMonitor.sampleLimit | Number of scraped samples accepted. | 5000 |
priorityClassName | Name of the PriorityClass to use for the Deployment. | system-cluster-critical |
awsRegion | If specified, use the AWS region for AWS API calls, else NTH will try to find the region through the AWS_REGION environment variable, IMDS, or the specified queue URL. | "" |
queueURL | Listens for messages on the specified SQS queue URL. | "" |
workers | The maximum number of parallel event processors to handle concurrent events. | 10 |
checkTagBeforeDraining | If true, check that the instance is tagged with the managedTag before draining the node. | true |
managedTag | The node tag to check if checkTagBeforeDraining is true. | aws-node-termination-handler/managed |
checkASGTagBeforeDraining | [DEPRECATED] (Use checkTagBeforeDraining instead) If true, check that the instance is tagged with the managedAsgTag before draining the node. If false, disables calls to the ASG API. | true |
managedAsgTag | [DEPRECATED] (Use managedTag instead) The node tag to check if checkASGTagBeforeDraining is true. | |
useProviderId | If true, fetch node name through Kubernetes node spec ProviderID instead of AWS event PrivateDnsHostname. | false |
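A queue-processor installation needs at least enableSqsTerminationDraining and queueURL from the tables above. The sketch below writes such a values file; the file name, region, account ID, and queue name are placeholders invented for this example, and creating the SQS queue and its permissions is outside the scope of this snippet.

```shell
# Write a values file for queue-processor mode.
# File name, region, account ID, and queue name are placeholders.
cat > nth-queue-values.yaml <<'EOF'
enableSqsTerminationDraining: true   # turns on queue-processor mode
queueURL: "https://sqs.us-east-1.amazonaws.com/123456789012/nth-queue"
awsRegion: "us-east-1"               # optional; NTH can also derive it from the queue URL
checkTagBeforeDraining: true         # chart default
managedTag: "aws-node-termination-handler/managed"  # chart default
replicas: 1                          # more replicas may cause duplicate webhooks
EOF

# Print the file so it can be reviewed before installing.
cat nth-queue-values.yaml
```

Passing the file with --values nth-queue-values.yaml to the install command would then deploy the handler in queue-processor mode instead of the default IMDS mode.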
IMDS Mode Configuration
The configuration in this table applies to AWS Node Termination Handler in IMDS mode.
Parameter | Description | Default |
---|---|---|
targetNodeOs | Space-separated list of node OSes to target (e.g. "linux", "windows", "linux windows"). Windows support is EXPERIMENTAL. | "linux" |
linuxPodLabels | Labels to add to each Linux pod. | {} |
windowsPodLabels | Labels to add to each Windows pod. | {} |
linuxPodAnnotations | Annotations to add to each Linux pod. | {} |
windowsPodAnnotations | Annotations to add to each Windows pod. | {} |
updateStrategy | Update strategy for all DaemonSets. | See values.yaml |
daemonsetPriorityClassName | Name of the PriorityClass to use for all DaemonSets. | system-node-critical |
podMonitor.create | If true, create a PodMonitor. This requires enablePrometheusServer: true. | false |
podMonitor.namespace | Override PodMonitor Helm release namespace. | nil |
podMonitor.labels | Additional PodMonitor metadata labels. | {} |
podMonitor.interval | Prometheus scrape interval. | 30s |
podMonitor.sampleLimit | Number of scraped samples accepted. | 5000 |
useHostNetwork | If true, enables hostNetwork for the Linux DaemonSet. NOTE: setting this to false may cause issues accessing IMDSv2 if your account is not configured with an IP hop count of 2; see Metrics Endpoint Considerations. | true |
dnsPolicy | If specified, this overrides linuxDnsPolicy and windowsDnsPolicy with a single policy. | "" |
dnsConfig | If specified, this sets the dnsConfig: https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pod-dns-config | {} |
linuxDnsPolicy | DNS policy for the Linux DaemonSet. | "" |
windowsDnsPolicy | DNS policy for the Windows DaemonSet. | "" |
daemonsetNodeSelector | Expressions to select a node by its labels for DaemonSet pod assignment. For backwards compatibility the nodeSelector value has priority over this but shouldn't be used. | {} |
linuxNodeSelector | Override daemonsetNodeSelector for the Linux DaemonSet. | {} |
windowsNodeSelector | Override daemonsetNodeSelector for the Windows DaemonSet. | {} |
daemonsetAffinity | Affinity settings for DaemonSet pod assignment. For backwards compatibility the affinity value has priority over this but shouldn't be used. | {} |
linuxAffinity | Override daemonsetAffinity for the Linux DaemonSet. | {} |
windowsAffinity | Override daemonsetAffinity for the Windows DaemonSet. | {} |
daemonsetTolerations | Tolerations for DaemonSet pod assignment. For backwards compatibility the tolerations value has priority over this but shouldn't be used. | [] |
linuxTolerations | Override daemonsetTolerations for the Linux DaemonSet. | [] |
windowsTolerations | Override daemonsetTolerations for the Windows DaemonSet. | [] |
enableProbesServer | If true, start an HTTP server exposing the /healthz endpoint for probes. | false |
metadataTries | The number of times to try requesting metadata. | 3 |
enableSpotInterruptionDraining | If true, drain nodes when the spot interruption termination notice is received. Only used in IMDS mode. | true |
enableScheduledEventDraining | If true, drain nodes before the maintenance window starts for an EC2 instance scheduled event. Only used in IMDS mode. | true |
enableRebalanceMonitoring | If true, cordon nodes when the rebalance recommendation notice is received. If you'd like to drain the node in addition to cordoning, then also set enableRebalanceDraining. Only used in IMDS mode. | false |
enableRebalanceDraining | If true, drain nodes when the rebalance recommendation notice is received. Only used in IMDS mode. | false |
deleteSqsMsgIfNodeNotFound | If true, delete the SQS Message from the SQS Queue if the targeted node is not found. Only used in Queue Processor mode. | false |
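The IMDS-mode parameters above can be combined in a values file as in the following sketch, which keeps the default spot and scheduled-event draining and additionally cordons and drains on rebalance recommendations. The file name and the example.com/lifecycle node label are assumptions made up for this example; substitute a label that actually exists on your nodes.

```shell
# Write a values file for IMDS mode.
# The file name and the node label below are hypothetical.
cat > nth-imds-values.yaml <<'EOF'
enableSqsTerminationDraining: false   # chart default; leaves queue-processor mode off
targetNodeOs: "linux"                 # chart default
enableSpotInterruptionDraining: true  # chart default
enableScheduledEventDraining: true    # chart default
enableRebalanceMonitoring: true       # cordon on a rebalance recommendation
enableRebalanceDraining: true         # also drain on a rebalance recommendation
daemonsetNodeSelector:
  example.com/lifecycle: spot         # hypothetical label; use one set on your nodes
EOF

# Print the file so it can be reviewed before installing.
cat nth-imds-values.yaml
```

The node selector is set through daemonsetNodeSelector rather than nodeSelector because the table describes nodeSelector as retained only for backwards compatibility.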
Testing Configuration
The configuration in this table applies to AWS Node Termination Handler testing and is NOT RECOMMENDED FOR PRODUCTION DEPLOYMENTS.
Parameter | Description | Default |
---|---|---|
awsEndpoint | (Used for testing) If specified, use the provided AWS endpoint to make API calls. | "" |
awsSecretAccessKey | (Used for testing) Pass-thru environment variable. | nil |
awsAccessKeyID | (Used for testing) Pass-thru environment variable. | nil |
instanceMetadataURL | (Used for testing) If specified, use the provided metadata URL. | "" |
procUptimeFile | (Used for testing) Specify the uptime file. | /proc/uptime |
Metrics Endpoint Considerations
AWS Node Termination Handler in IMDS mode runs as a DaemonSet with useHostNetwork: true by default. If the Prometheus server is enabled with enablePrometheusServer: true, nothing else will be able to bind to the configured port (by default prometheusServerPort: 9092) in the root network namespace. Because the endpoint is then exposed on the node's own network, you will need a firewall or security group configured on the nodes to block access to the /metrics endpoint.
You can switch NTH in IMDS mode to run with useHostNetwork: false, but you will then need to make sure that IMDSv1 is enabled or that the IMDSv2 IP hop count is increased to 2 (see the IMDSv2 documentation).
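One way to apply the useHostNetwork: false option is sketched below. It writes a values file and defines a helper that raises the IMDSv2 hop limit on a single instance with the AWS CLI. The file name and the helper name set_imds_hop_limit are assumptions for this example, and the helper is only defined here, not called. Instances launched from a launch template or Auto Scaling group would more likely need the hop limit set there instead.

```shell
# Values for running the IMDS-mode DaemonSet without host networking.
# The file name is hypothetical.
cat > nth-no-hostnetwork-values.yaml <<'EOF'
useHostNetwork: false
enablePrometheusServer: true  # /metrics is then served from the pod network
EOF

# Hypothetical helper: set the IMDSv2 PUT response hop limit to 2 for one instance.
# Usage: set_imds_hop_limit <instance-id>
set_imds_hop_limit() {
  aws ec2 modify-instance-metadata-options \
    --instance-id "$1" \
    --http-put-response-hop-limit 2 \
    --http-endpoint enabled
}
```

After raising the hop limit on the relevant instances, the values file can be passed to the install command with --values nth-no-hostnetwork-values.yaml.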