AWS Node Termination Handler

AWS Node Termination Handler Helm chart for Kubernetes. For more information on this project see the project repo at github.com/aws/aws-node-termination-handler.

Prerequisites

Kubernetes >= v1.16

Installing the Chart

Before you can install the chart, you need to add the EKS charts repo to Helm.

helm repo add eks https://aws.github.io/eks-charts/

After you've added the repo, you can install the chart. The following command installs the chart with the release name aws-node-termination-handler and the default configuration into the kube-system namespace.

helm upgrade --install --namespace kube-system aws-node-termination-handler eks/aws-node-termination-handler

To install the chart on an EKS cluster where the AWS Node Termination Handler is already installed, you can run the following command.

helm upgrade --install --namespace kube-system aws-node-termination-handler eks/aws-node-termination-handler --recreate-pods --force

If you receive an error similar to the one below, simply rerun the above command.

Error: release aws-node-termination-handler failed: "aws-node-termination-handler" already exists

To uninstall the aws-node-termination-handler chart installation from the kube-system namespace run the following command.

helm delete --namespace kube-system aws-node-termination-handler

Configuration

The following tables list the configurable parameters of the chart and their default values. These values are split into the common configuration shared by all AWS Node Termination Handler modes, the queue configuration used when AWS Node Termination Handler is in queue-processor mode, and the IMDS configuration used when AWS Node Termination Handler is in IMDS mode. For more information about the different modes, see the project README.

Common Configuration

The configuration in this table applies to all AWS Node Termination Handler modes.

Parameter	Description	Default
`image.repository`	Image repository.	`public.ecr.aws/aws-ec2/aws-node-termination-handler`
`image.tag`	Image tag.	`v{{ .Chart.AppVersion}}`
`image.pullPolicy`	Image pull policy.	`IfNotPresent`
`image.pullSecrets`	Image pull secrets.	`[]`
`nameOverride`	Override the `name` of the chart.	`""`
`fullnameOverride`	Override the `fullname` of the chart.	`""`
`serviceAccount.create`	If `true`, create a new service account.	`true`
`serviceAccount.name`	Service account to be used. If not set and `serviceAccount.create` is `true`, a name is generated using the full name template.	`nil`
`serviceAccount.annotations`	Annotations to add to the service account.	`{}`
`rbac.create`	If `true`, create the RBAC resources.	`true`
`rbac.pspEnabled`	If `true`, create a pod security policy resource.	`true`
`customLabels`	Labels to add to all resource metadata.	`{}`
`podLabels`	Labels to add to the pod.	`{}`
`podAnnotations`	Annotations to add to the pod.	`{}`
`podSecurityContext`	Security context for the pod.	See values.yaml
`securityContext`	Security context for the aws-node-termination-handler container.	See values.yaml
`terminationGracePeriodSeconds`	The termination grace period for the pod.	`nil`
`resources`	Resource requests and limits for the aws-node-termination-handler container.	`{}`
`nodeSelector`	Expressions to select a node by its labels for pod assignment. In IMDS mode this has a higher priority than `daemonsetNodeSelector` (for backwards compatibility) but shouldn't be used.	`{}`
`affinity`	Affinity settings for pod assignment. In IMDS mode this has a higher priority than `daemonsetAffinity` (for backwards compatibility) but shouldn't be used.	`{}`
`tolerations`	Tolerations for pod assignment. In IMDS mode this has a higher priority than `daemonsetTolerations` (for backwards compatibility) but shouldn't be used.	`[]`
`extraEnv`	Additional environment variables for the aws-node-termination-handler container.	`[]`
`probes`	The Kubernetes liveness probe configuration.	See values.yaml
`logLevel`	Sets the log level (`info`, `debug`, or `error`).	`info`
`logFormatVersion`	Sets the log format version. Available versions: 1, 2. Version 1 refers to the format that has been used through v1.17.3. Version 2 offers more detail for the "event kind" and "reason", especially when operating in Queue Processor mode.	`1`
`jsonLogging`	If `true`, use JSON-formatted logs instead of human readable logs.	`false`
`enablePrometheusServer`	If `true`, start an http server exposing `/metrics` endpoint for Prometheus.	`false`
`prometheusServerPort`	Replaces the default HTTP port for exposing Prometheus metrics.	`9092`
`dryRun`	If `true`, only log if a node would be drained.	`false`
`cordonOnly`	If `true`, nodes will be cordoned but not drained when an interruption event occurs.	`false`
`taintNode`	If `true`, nodes will be tainted when an interruption event occurs. Currently used taint keys are `aws-node-termination-handler/scheduled-maintenance`, `aws-node-termination-handler/spot-itn`, `aws-node-termination-handler/asg-lifecycle-termination` and `aws-node-termination-handler/rebalance-recommendation`.	`false`
`excludeFromLoadBalancers`	If `true`, nodes will be marked for exclusion from load balancers before they are cordoned. This applies the `node.kubernetes.io/exclude-from-external-load-balancers` label to enable the ServiceNodeExclusion feature gate. The label will not be modified or removed for nodes that already have it.	`false`
`deleteLocalData`	If `true`, continue even if there are pods using local data that will be deleted when the node is drained.	`true`
`ignoreDaemonSets`	If `true`, skip terminating daemon set managed pods.	`true`
`podTerminationGracePeriod`	The time in seconds given to each pod to terminate gracefully. If negative, the default value specified in the pod will be used, which defaults to 30 seconds if not specified for the pod.	`-1`
`nodeTerminationGracePeriod`	Period of time in seconds given to each node to terminate gracefully. Node draining will be scheduled based on this value to optimize the amount of compute time, but still safely drain the node before an event.	`120`
`emitKubernetesEvents`	If `true`, Kubernetes events will be emitted when interruption events are received and when actions are taken on Kubernetes nodes. In IMDS Processor mode a default set of annotations with all the node metadata gathered from IMDS will be attached to each event. More information here.	`false`
`completeLifecycleActionDelaySeconds`	Pause after draining the node before completing the EC2 Autoscaling lifecycle action. This may be helpful if Pods on the node have Persistent Volume Claims.	`-1`
`kubernetesEventsExtraAnnotations`	A comma-separated list of `key=value` extra annotations to attach to all emitted Kubernetes events (e.g. `first=annotation,sample.annotation/number=two`).	`""`
`webhookURL`	Posts event data to URL upon instance interruption action.	`""`
`webhookURLSecretName`	Pass the webhook URL as a Secret using the key `webhookurl`.	`""`
`webhookHeaders`	Replace the default webhook headers (e.g. `{"Content-type":"application/json"}`).	`""`
`webhookProxy`	Uses the specified HTTP(S) proxy for sending webhook data.	`""`
`webhookTemplate`	Replaces the default webhook message template (e.g. `{"text":"[NTH][Instance Interruption] EventID: {{ .EventID }} - Kind: {{ .Kind }} - Instance: {{ .InstanceID }} - Node: {{ .NodeName }} - Description: {{ .Description }} - Start Time: {{ .StartTime }}"}`).	`""`
`webhookTemplateConfigMapName`	Pass the webhook template file as a ConfigMap.	`""`
`webhookTemplateConfigMapKey`	Name of the ConfigMap key storing the template file.	`""`
`enableSqsTerminationDraining`	If `true`, this turns on queue-processor mode which drains nodes when an SQS termination event is received.	`false`
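
As a rough illustration of how the common parameters above might be combined, the following values file is a minimal sketch, not a recommended configuration. Every key comes from the table above; the webhook URL is a hypothetical placeholder. A file like this can be passed to the install command with Helm's `-f` flag.

```yaml
# values-common.yaml (illustrative sketch; adjust to your environment)

# Logging: more verbose, machine-readable output.
logLevel: debug
jsonLogging: true

# Expose /metrics for Prometheus on the default port.
enablePrometheusServer: true
prometheusServerPort: 9092

# Taint nodes on interruption events in addition to cordoning and draining.
taintNode: true
cordonOnly: false

# Drain timing: -1 defers to each pod's own termination grace period.
podTerminationGracePeriod: -1
nodeTerminationGracePeriod: 120

# Emit Kubernetes events and attach extra annotations to them.
emitKubernetesEvents: true
kubernetesEventsExtraAnnotations: "first=annotation,sample.annotation/number=two"

# Hypothetical webhook endpoint; replace with your own.
webhookURL: "https://example.com/nth-webhook"
```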

Queue-Processor Mode Configuration

The configuration in this table applies to AWS Node Termination Handler in queue-processor mode.

Parameter	Description	Default
`replicas`	The number of replicas in the deployment when using queue-processor mode (NOTE: increasing replicas may cause duplicate webhooks since pods are stateless).	`1`
`strategy`	Specify the update strategy for the deployment.	`{}`
`podDisruptionBudget`	Limit the disruption for controller pods, requires at least 2 controller replicas.	`{}`
`serviceMonitor.create`	If `true`, create a ServiceMonitor. This requires `enablePrometheusServer: true`.	`false`
`serviceMonitor.namespace`	Override ServiceMonitor Helm release namespace.	`nil`
`serviceMonitor.labels`	Additional ServiceMonitor metadata labels.	`{}`
`serviceMonitor.interval`	Prometheus scrape interval.	`30s`
`serviceMonitor.sampleLimit`	Number of scraped samples accepted.	`5000`
`priorityClassName`	Name of the PriorityClass to use for the Deployment.	`system-cluster-critical`
`awsRegion`	If specified, use the AWS region for AWS API calls, else NTH will try to find the region through the `AWS_REGION` environment variable, IMDS, or the specified queue URL.	`""`
`queueURL`	Listens for messages on the specified SQS queue URL.	`""`
`workers`	The maximum amount of parallel event processors to handle concurrent events.	`10`
`checkTagBeforeDraining`	If `true`, check that the instance is tagged with the `managedTag` before draining the node.	`true`
`managedTag`	The node tag to check if `checkTagBeforeDraining` is `true`.	`aws-node-termination-handler/managed`
`checkASGTagBeforeDraining`	[DEPRECATED](Use `checkTagBeforeDraining` instead) If `true`, check that the instance is tagged with the `managedAsgTag` before draining the node. If `false`, disables calls to the ASG API.	`true`
`managedAsgTag`	[DEPRECATED](Use `managedTag` instead) The node tag to check if `checkASGTagBeforeDraining` is `true`.
`useProviderId`	If `true`, fetch node name through Kubernetes node spec ProviderID instead of AWS event PrivateDnsHostname.	`false`
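
The queue-processor parameters above could be combined along the following lines. This is a minimal sketch under stated assumptions: the account ID, region, queue name, and role ARN are hypothetical placeholders, and the `eks.amazonaws.com/role-arn` annotation assumes the cluster uses IAM roles for service accounts, which this document does not describe.

```yaml
# values-queue-processor.yaml (illustrative sketch)

# Turn on queue-processor mode (see the common configuration table).
enableSqsTerminationDraining: true

# Hypothetical queue URL and region; replace with your own.
queueURL: "https://sqs.us-west-2.amazonaws.com/123456789012/nth-queue"
awsRegion: "us-west-2"

# Deployment sizing. More than one replica may cause duplicate webhooks.
replicas: 1
workers: 10

# Only drain instances that carry the managed tag.
checkTagBeforeDraining: true
managedTag: "aws-node-termination-handler/managed"

serviceAccount:
  create: true
  annotations:
    # Assumption: IAM roles for service accounts; the role ARN is hypothetical.
    eks.amazonaws.com/role-arn: "arn:aws:iam::123456789012:role/nth-role"
```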

IMDS Mode Configuration

The configuration in this table applies to AWS Node Termination Handler in IMDS mode.

Parameter	Description	Default
`targetNodeOs`	Space-separated list of node OSes to target (e.g. `"linux"`, `"windows"`, `"linux windows"`). Windows support is EXPERIMENTAL.	`"linux"`
`linuxPodLabels`	Labels to add to each Linux pod.	`{}`
`windowsPodLabels`	Labels to add to each Windows pod.	`{}`
`linuxPodAnnotations`	Annotations to add to each Linux pod.	`{}`
`windowsPodAnnotations`	Annotations to add to each Windows pod.	`{}`
`updateStrategy`	Update strategy for all DaemonSets.	See values.yaml
`daemonsetPriorityClassName`	Name of the PriorityClass to use for all DaemonSets.	`system-node-critical`
`podMonitor.create`	If `true`, create a PodMonitor. This requires `enablePrometheusServer: true`.	`false`
`podMonitor.namespace`	Override PodMonitor Helm release namespace.	`nil`
`podMonitor.labels`	Additional PodMonitor metadata labels.	`{}`
`podMonitor.interval`	Prometheus scrape interval.	`30s`
`podMonitor.sampleLimit`	Number of scraped samples accepted.	`5000`
`useHostNetwork`	If `true`, enables `hostNetwork` for the Linux DaemonSet. NOTE: setting this to `false` may cause issues accessing IMDSv2 if your account is not configured with an IP hop count of 2; see Metrics Endpoint Considerations.	`true`
`dnsPolicy`	If specified, this overrides `linuxDnsPolicy` and `windowsDnsPolicy` with a single policy.	`""`
`dnsConfig`	If specified, this sets the dnsConfig: https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pod-dns-config	`{}`
`linuxDnsPolicy`	DNS policy for the Linux DaemonSet.	`""`
`windowsDnsPolicy`	DNS policy for the Windows DaemonSet.	`""`
`daemonsetNodeSelector`	Expressions to select a node by its labels for DaemonSet pod assignment. For backwards compatibility the `nodeSelector` value has priority over this but shouldn't be used.	`{}`
`linuxNodeSelector`	Override `daemonsetNodeSelector` for the Linux DaemonSet.	`{}`
`windowsNodeSelector`	Override `daemonsetNodeSelector` for the Windows DaemonSet.	`{}`
`daemonsetAffinity`	Affinity settings for DaemonSet pod assignment. For backwards compatibility the `affinity` has priority over this but shouldn't be used.	`{}`
`linuxAffinity`	Override `daemonsetAffinity` for the Linux DaemonSet.	`{}`
`windowsAffinity`	Override `daemonsetAffinity` for the Windows DaemonSet.	`{}`
`daemonsetTolerations`	Tolerations for DaemonSet pod assignment. For backwards compatibility the `tolerations` has priority over this but shouldn't be used.	`[]`
`linuxTolerations`	Override `daemonsetTolerations` for the Linux DaemonSet.	`[]`
`windowsTolerations`	Override `daemonsetTolerations` for the Windows DaemonSet.	`[]`
`enableProbesServer`	If `true`, start an http server exposing `/healthz` endpoint for probes.	`false`
`metadataTries`	The number of times to try requesting metadata.	`3`
`enableSpotInterruptionDraining`	If `true`, drain nodes when the spot interruption termination notice is received.	`true`
`enableScheduledEventDraining`	If `true`, drain nodes before the maintenance window starts for an EC2 instance scheduled event. This is EXPERIMENTAL.	`false`
`enableRebalanceMonitoring`	If `true`, cordon nodes when the rebalance recommendation notice is received. If you'd like to drain the node in addition to cordoning, then also set `enableRebalanceDraining`.	`false`
`enableRebalanceDraining`	If `true`, drain nodes when the rebalance recommendation notice is received.	`false`
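
For IMDS mode, a values file might look like the sketch below. It uses only keys from the tables in this document; the node label in `daemonsetNodeSelector` is a hypothetical example, and the broad toleration is one possible choice, not a requirement.

```yaml
# values-imds.yaml (illustrative sketch)

# Leave queue-processor mode off so the handler runs in IMDS mode.
enableSqsTerminationDraining: false

# Target Linux nodes only (Windows support is experimental).
targetNodeOs: "linux"

# Which interruption signals to act on.
enableSpotInterruptionDraining: true
enableScheduledEventDraining: false   # experimental
enableRebalanceMonitoring: true       # cordon on rebalance recommendation
enableRebalanceDraining: false        # set to true to also drain

# Number of attempts when requesting instance metadata.
metadataTries: 3

# Hypothetical label; schedule the DaemonSet only on matching nodes.
daemonsetNodeSelector:
  node-lifecycle: spot

# Tolerate all taints so the handler can run on every selected node.
daemonsetTolerations:
  - operator: Exists
```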

Testing Configuration

The configuration in this table applies to AWS Node Termination Handler testing and is NOT RECOMMENDED FOR PRODUCTION DEPLOYMENTS.

Parameter	Description	Default
`awsEndpoint`	(Used for testing) If specified, use the provided AWS endpoint to make API calls.	`""`
`awsSecretAccessKey`	(Used for testing) Pass-thru environment variable.	`nil`
`awsAccessKeyID`	(Used for testing) Pass-thru environment variable.	`nil`
`instanceMetadataURL`	(Used for testing) If specified, use the provided metadata URL.	`""`
`procUptimeFile`	(Used for testing) Specify the uptime file.	`/proc/uptime`

Metrics Endpoint Considerations

AWS Node Termination Handler in IMDS mode runs as a DaemonSet with useHostNetwork: true by default. If the Prometheus server is enabled with enablePrometheusServer: true, nothing else will be able to bind to the configured port (by default prometheusServerPort: 9092) in the root network namespace. Therefore, you will need to configure a firewall or security group on the nodes to block access to the /metrics endpoint.

You can switch NTH in IMDS mode to run with useHostNetwork: false, but you will need to make sure that either IMDSv1 is enabled or the IMDSv2 IP hop count is increased to 2 (see the IMDSv2 documentation).
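
The trade-off described here could be expressed in a values file roughly as follows. This is a sketch, not a tested configuration: it assumes the nodes already allow IMDS access from pods without host networking, and it assumes a Prometheus Operator installation that understands PodMonitor resources.

```yaml
# values-metrics.yaml (illustrative sketch for IMDS mode)

# Serve /metrics on the default port.
enablePrometheusServer: true
prometheusServerPort: 9092

# Keep the metrics port out of the root network namespace.
# Requires IMDSv1 or an IMDSv2 IP hop count of 2 on the nodes.
useHostNetwork: false

# Scrape the DaemonSet pods through a PodMonitor.
podMonitor:
  create: true
  interval: 30s
  sampleLimit: 5000
```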
