# AWS Node Termination Handler

AWS Node Termination Handler Helm chart for Kubernetes. For more information on this project, see the project repo at [github.com/aws/aws-node-termination-handler](https://github.com/aws/aws-node-termination-handler).

## Prerequisites

- _Kubernetes_ >= v1.16

## Installing the Chart

Before you can install the chart, you need to add the `eks` repo to [Helm](https://helm.sh/).

```shell
helm repo add eks https://aws.github.io/eks-charts/
```

After adding the repo you can install the chart. The following command installs the chart with the release name `aws-node-termination-handler` and the default configuration to the `kube-system` namespace.

```shell
helm upgrade --install --namespace kube-system aws-node-termination-handler eks/aws-node-termination-handler
```

To install the chart on an EKS cluster where the AWS Node Termination Handler is already installed, you can run the following command.

```shell
helm upgrade --install --namespace kube-system aws-node-termination-handler eks/aws-node-termination-handler --recreate-pods --force
```

If you receive an error similar to the one below, simply rerun the above command.

> Error: release aws-node-termination-handler failed: <resource> "aws-node-termination-handler" already exists

To uninstall the `aws-node-termination-handler` chart installation from the `kube-system` namespace, run the following command.

```shell
helm delete --namespace kube-system aws-node-termination-handler
```

## Configuration

The following tables list the configurable parameters of the chart and their default values. These values are split up into the [common configuration](#common-configuration) shared by all AWS Node Termination Handler modes, [queue configuration](#queue-processor-mode-configuration) used when AWS Node Termination Handler is in queue-processor mode, and [IMDS configuration](#imds-mode-configuration) used when AWS Node Termination Handler is in IMDS mode. For more information about the different modes, see the project [README](https://github.com/aws/aws-node-termination-handler/blob/main/README.md).

### Common Configuration

The configuration in this table applies to all AWS Node Termination Handler modes.

| Parameter                          | Description                                                                                                                                                                                                                                                                                                                                                                            | Default                                               |
| ---------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------- |
| `image.repository`                 | Image repository.                                                                                                                                                                                                                                                                                                                                                                      | `public.ecr.aws/aws-ec2/aws-node-termination-handler` |
| `image.tag`                        | Image tag.                                                                                                                                                                                                                                                                                                                                                                             | `v{{ .Chart.AppVersion}}`                             |
| `image.pullPolicy`                 | Image pull policy.                                                                                                                                                                                                                                                                                                                                                                     | `IfNotPresent`                                        |
| `image.pullSecrets`                | Image pull secrets.                                                                                                                                                                                                                                                                                                                                                                    | `[]`                                                  |
| `nameOverride`                     | Override the `name` of the chart.                                                                                                                                                                                                                                                                                                                                                      | `""`                                                  |
| `fullnameOverride`                 | Override the `fullname` of the chart.                                                                                                                                                                                                                                                                                                                                                  | `""`                                                  |
| `serviceAccount.create`            | If `true`, create a new service account.                                                                                                                                                                                                                                                                                                                                               | `true`                                                |
| `serviceAccount.name`              | Service account to be used. If not set and `serviceAccount.create` is `true`, a name is generated using the full name template.                                                                                                                                                                                                                                                        | `nil`                                                 |
| `serviceAccount.annotations`       | Annotations to add to the service account.                                                                                                                                                                                                                                                                                                                                             | `{}`                                                  |
| `rbac.create`                      | If `true`, create the RBAC resources.                                                                                                                                                                                                                                                                                                                                                  | `true`                                                |
| `rbac.pspEnabled`                  | If `true`, create a pod security policy resource.                                                                                                                                                                                                                                                                                                                                      | `true`                                                |
| `customLabels`                     | Labels to add to all resource metadata.                                                                                                                                                                                                                                                                                                                                                | `{}`                                                  |
| `podLabels`                        | Labels to add to the pod.                                                                                                                                                                                                                                                                                                                                                              | `{}`                                                  |
| `podAnnotations`                   | Annotations to add to the pod.                                                                                                                                                                                                                                                                                                                                                         | `{}`                                                  |
| `podSecurityContext`               | Security context for the pod.                                                                                                                                                                                                                                                                                                                                                          | _See values.yaml_                                     |
| `securityContext`                  | Security context for the _aws-node-termination-handler_ container.                                                                                                                                                                                                                                                                                                                     | _See values.yaml_                                     |
| `terminationGracePeriodSeconds`    | The termination grace period for the pod.                                                                                                                                                                                                                                                                                                                                              | `nil`                                                 |
| `resources`                        | Resource requests and limits for the _aws-node-termination-handler_ container.                                                                                                                                                                                                                                                                                                         | `{}`                                                  |
| `nodeSelector`                     | Expressions to select a node by it's labels for pod assignment. In IMDS mode this has a higher priority than `daemonsetNodeSelector` (for backwards compatibility) but shouldn't be used.                                                                                                                                                                                              | `{}`                                                  |
| `affinity`                         | Affinity settings for pod assignment. In IMDS mode this has a higher priority than `daemonsetAffinity` (for backwards compatibility) but shouldn't be used.                                                                                                                                                                                                                            | `{}`                                                  |
| `tolerations`                      | Tolerations for pod assignment. In IMDS mode this has a higher priority than `daemonsetTolerations` (for backwards compatibility) but shouldn't be used.                                                                                                                                                                                                                               | `[]`                                                  |
| `extraEnv`                         | Additional environment variables for the _aws-node-termination-handler_ container.                                                                                                                                                                                                                                                                                                     | `[]`                                                  |
| `probes`                           | The Kubernetes liveness probe configuration.                                                                                                                                                                                                                                                                                                                                           | _See values.yaml_                                     |
| `logLevel`                         | Sets the log level (`info`,`debug`, or `error`)                                                                                                                                                                                                                                                                                                                                        | `info`                                                |
| `jsonLogging`                      | If `true`, use JSON-formatted logs instead of human readable logs.                                                                                                                                                                                                                                                                                                                     | `false`                                               |
| `enablePrometheusServer`           | If `true`, start an http server exposing `/metrics` endpoint for _Prometheus_.                                                                                                                                                                                                                                                                                                         | `false`                                               |
| `prometheusServerPort`             | Replaces the default HTTP port for exposing _Prometheus_ metrics.                                                                                                                                                                                                                                                                                                                      | `9092`                                                |
| `dryRun`                           | If `true`, only log if a node would be drained.                                                                                                                                                                                                                                                                                                                                        | `false`                                               |
| `cordonOnly`                       | If `true`, nodes will be cordoned but not drained when an interruption event occurs.                                                                                                                                                                                                                                                                                                   | `false`                                               |
| `taintNode`                        | If `true`, nodes will be tainted when an interruption event occurs. Currently used taint keys are `aws-node-termination-handler/scheduled-maintenance`, `aws-node-termination-handler/spot-itn`, `aws-node-termination-handler/asg-lifecycle-termination` and `aws-node-termination-handler/rebalance-recommendation`.                                                                 | `false`                                               |
| `excludeFromLoadBalancers`         | If `true`, nodes will be marked for exclusion from load balancers before they are cordoned. This applies the `node.kubernetes.io/exclude-from-external-load-balancers` label to enable the ServiceNodeExclusion feature gate. The label will not be modified or removed for nodes that already have it.                                                                                | `false`                                               |
| `deleteLocalData`                  | If `true`, continue even if there are pods using local data that will be deleted when the node is drained.                                                                                                                                                                                                                                                                             | `true`                                                |
| `ignoreDaemonSets`                 | If `true`, skip terminating daemon set managed pods.                                                                                                                                                                                                                                                                                                                                   | `true`                                                |
| `podTerminationGracePeriod`        | The time in seconds given to each pod to terminate gracefully. If negative, the default value specified in the pod will be used, which defaults to 30 seconds if not specified for the pod.                                                                                                                                                                                            | `-1`                                                  |
| `nodeTerminationGracePeriod`       | Period of time in seconds given to each node to terminate gracefully. Node draining will be scheduled based on this value to optimize the amount of compute time, but still safely drain the node before an event.                                                                                                                                                                     | `120`                                                 |
| `emitKubernetesEvents`             | If `true`, Kubernetes events will be emitted when interruption events are received and when actions are taken on Kubernetes nodes. In IMDS Processor mode a default set of annotations with all the node metadata gathered from IMDS will be attached to each event. More information [here](https://github.com/aws/aws-node-termination-handler/blob/main/docs/kubernetes_events.md). | `false`                                               |
| `completeLifecycleActionDelaySeconds` | Pause after draining the node before completing the EC2 Auto Scaling lifecycle action. This may be helpful if Pods on the node have Persistent Volume Claims. | `-1` |
| `kubernetesEventsExtraAnnotations` | A comma-separated list of `key=value` extra annotations to attach to all emitted Kubernetes events (e.g. `first=annotation,sample.annotation/number=two"`).                                                                                                                                                                                                                            | `""`                                                  |
| `webhookURL`                       | Posts event data to URL upon instance interruption action.                                                                                                                                                                                                                                                                                                                             | `""`                                                  |
| `webhookURLSecretName`             | Pass the webhook URL as a Secret using the key `webhookurl`.                                                                                                                                                                                                                                                                                                                           | `""`                                                  |
| `webhookHeaders`                   | Replace the default webhook headers (e.g. `{"Content-type":"application/json"}`).                                                                                                                                                                                                                                                                                                      | `""`                                                  |
| `webhookProxy`                     | Uses the specified HTTP(S) proxy for sending webhook data.                                                                                                                                                                                                                                                                                                                             | `""`                                                  |
| `webhookTemplate`                  | Replaces the default webhook message template (e.g. `{"text":"[NTH][Instance Interruption] EventID: {{ .EventID }} - Kind: {{ .Kind }} - Instance: {{ .InstanceID }} - Node: {{ .NodeName }} - Description: {{ .Description }} - Start Time: {{ .StartTime }}"}`).                                                                                                                     | `""`                                                  |
| `webhookTemplateConfigMapName`     | Pass the webhook template file as a configmap.                                                                                                                                                                                                                                                                                                                                         | "``"                                                  |
| `webhookTemplateConfigMapKey`      | Name of the Configmap key storing the template file.                                                                                                                                                                                                                                                                                                                                   | `""`                                                  |
| `enableSqsTerminationDraining`     | If `true`, this turns on queue-processor mode which drains nodes when an SQS termination event is received.                                                                                                                                                                                                                                                                            | `false`                                               |
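
As an illustration of overriding the common parameters above, the sketch below writes a small values file. The file name `nth-common-values.yaml` and the webhook URL are placeholders chosen for this example, not chart defaults. Every key used comes from the table above.

```shell
# Write an example values file that overrides a few common parameters.
# The file name and the webhook URL are placeholders for illustration only.
cat > nth-common-values.yaml <<'EOF'
logLevel: debug
jsonLogging: true
enablePrometheusServer: true
prometheusServerPort: 9092
taintNode: true
webhookURL: "https://example.com/nth-webhook"
EOF
```

The file can then be passed to the install command from [Installing the Chart](#installing-the-chart) by appending `--values nth-common-values.yaml`.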

### Queue-Processor Mode Configuration

The configuration in this table applies to AWS Node Termination Handler in queue-processor mode.

| Parameter                    | Description                                                                                                                                                               | Default                                |
| ---------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------- |
| `replicas`                   | The number of replicas in the deployment when using queue-processor mode (NOTE: increasing replicas may cause duplicate webhooks since pods are stateless).               | `1`                                    |
| `strategy`                   | Specify the update strategy for the deployment.                                                                                                                           | `{}`                                   |
| `podDisruptionBudget`        | Limit the disruption for controller pods. Requires at least 2 controller replicas.                                                                                        | `{}`                                   |
| `serviceMonitor.create`      | If `true`, create a ServiceMonitor. This requires `enablePrometheusServer: true`.                                                                                         | `false`                                |
| `serviceMonitor.namespace`   | Override ServiceMonitor _Helm_ release namespace.                                                                                                                         | `nil`                                  |
| `serviceMonitor.labels`      | Additional ServiceMonitor metadata labels.                                                                                                                                | `{}`                                   |
| `serviceMonitor.interval`    | _Prometheus_ scrape interval.                                                                                                                                             | `30s`                                  |
| `serviceMonitor.sampleLimit` | Number of scraped samples accepted.                                                                                                                                       | `5000`                                 |
| `priorityClassName`          | Name of the PriorityClass to use for the Deployment.                                                                                                                      | `system-cluster-critical`              |
| `awsRegion`                  | If specified, use the AWS region for AWS API calls, else NTH will try to find the region through the `AWS_REGION` environment variable, IMDS, or the specified queue URL. | `""`                                   |
| `queueURL`                   | Listens for messages on the specified SQS queue URL.                                                                                                                      | `""`                                   |
| `workers`                    | The maximum number of parallel event processors to handle concurrent events.                                                                                              | `10`                                   |
| `checkTagBeforeDraining`     | If `true`, check that the instance is tagged with the `managedTag` before draining the node.                                                                              | `true`                                 |
| `managedTag`                 | The node tag to check if `checkTagBeforeDraining` is `true`.                                                                                                              | `aws-node-termination-handler/managed` |
| `checkASGTagBeforeDraining`  | **DEPRECATED** (use `checkTagBeforeDraining` instead). If `true`, check that the instance is tagged with the `managedAsgTag` before draining the node. If `false`, disables calls to the ASG API. | `true`                                 |
| `managedAsgTag`              | **DEPRECATED** (use `managedTag` instead). The node tag to check if `checkASGTagBeforeDraining` is `true`.                                                                | _See values.yaml_                      |
| `useProviderId`              | If `true`, fetch node name through Kubernetes node spec ProviderID instead of AWS event PrivateDnsHostname.                                                               | `false`                                |
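
A minimal sketch of a values file for queue-processor mode, using only keys from the common and queue-processor tables, might look like the following. The file name, queue URL, account ID, and region are placeholders for this example and need to be replaced with values from your own environment.

```shell
# Write an example values file for queue-processor mode.
# The queue URL, account ID, and region below are placeholders, not real resources.
cat > nth-queue-values.yaml <<'EOF'
enableSqsTerminationDraining: true
queueURL: "https://sqs.us-west-2.amazonaws.com/123456789012/example-nth-queue"
awsRegion: "us-west-2"
checkTagBeforeDraining: true
managedTag: "aws-node-termination-handler/managed"
workers: 10
EOF
```

Passing `--values nth-queue-values.yaml` to the install command applies these settings. `replicas` is left at its default of `1` here because, as noted in the table, increasing it may cause duplicate webhooks.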

### IMDS Mode Configuration

The configuration in this table applies to AWS Node Termination Handler in IMDS mode.

| Parameter                        | Description                                                                                                                                                                                                                                                   | Default                |
| -------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------- |
| `targetNodeOs`                   | Space-separated list of node operating systems to target (e.g. `"linux"`, `"windows"`, `"linux windows"`). Windows support is **EXPERIMENTAL**.                                                                                                               | `"linux"`              |
| `linuxPodLabels`                 | Labels to add to each Linux pod.                                                                                                                                                                                                                              | `{}`                   |
| `windowsPodLabels`               | Labels to add to each Windows pod.                                                                                                                                                                                                                            | `{}`                   |
| `linuxPodAnnotations`            | Annotations to add to each Linux pod.                                                                                                                                                                                                                         | `{}`                   |
| `windowsPodAnnotations`          | Annotations to add to each Windows pod.                                                                                                                                                                                                                       | `{}`                   |
| `updateStrategy`                 | Update strategy for the all DaemonSets.                                                                                                                                                                                                                       | _See values.yaml_      |
| `daemonsetPriorityClassName`     | Name of the PriorityClass to use for all DaemonSets.                                                                                                                                                                                                          | `system-node-critical` |
| `podMonitor.create`              | If `true`, create a PodMonitor. This requires `enablePrometheusServer: true`.                                                                                                                                                                                 | `false`                |
| `podMonitor.namespace`           | Override PodMonitor _Helm_ release namespace.                                                                                                                                                                                                                 | `nil`                  |
| `podMonitor.labels`              | Additional PodMonitor metadata labels                                                                                                                                                                                                                         | `{}`                   |
| `podMonitor.interval`            | _Prometheus_ scrape interval.                                                                                                                                                                                                                                 | `30s`                  |
| `podMonitor.sampleLimit`         | Number of scraped samples accepted.                                                                                                                                                                                                                           | `5000`                 |
| `useHostNetwork`                 | If `true`, enable `hostNetwork` for the Linux DaemonSet. NOTE: setting this to `false` may cause issues accessing IMDSv2 if your account isn't configured with an IP hop count of 2; see [Metrics Endpoint Considerations](#metrics-endpoint-considerations). | `true`                 |
| `dnsPolicy`                      | If specified, this overrides `linuxDnsPolicy` and `windowsDnsPolicy` with a single policy.                                                                                                                                                                    | `""`                   |
| `dnsConfig`                      | If specified, this sets the pod's [`dnsConfig`](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pod-dns-config).                                                                                                                     | `{}`                   |
| `linuxDnsPolicy`                 | DNS policy for the Linux DaemonSet.                                                                                                                                                                                                                           | `""`                   |
| `windowsDnsPolicy`               | DNS policy for the Windows DaemonSet.                                                                                                                                                                                                                         | `""`                   |
| `daemonsetNodeSelector`          | Expressions to select a node by its labels for DaemonSet pod assignment. For backwards compatibility the `nodeSelector` value has priority over this but shouldn't be used.                                                                                   | `{}`                   |
| `linuxNodeSelector`              | Override `daemonsetNodeSelector` for the Linux DaemonSet.                                                                                                                                                                                                     | `{}`                   |
| `windowsNodeSelector`            | Override `daemonsetNodeSelector` for the Windows DaemonSet.                                                                                                                                                                                                   | `{}`                   |
| `daemonsetAffinity`              | Affinity settings for DaemonSet pod assignment. For backwards compatibility the `affinity` has priority over this but shouldn't be used.                                                                                                                      | `{}`                   |
| `linuxAffinity`                  | Override `daemonsetAffinity` for the Linux DaemonSet.                                                                                                                                                                                                         | `{}`                   |
| `windowsAffinity`                | Override `daemonsetAffinity` for the Windows DaemonSet.                                                                                                                                                                                                       | `{}`                   |
| `daemonsetTolerations`           | Tolerations for DaemonSet pod assignment. For backwards compatibility the `tolerations` has priority over this but shouldn't be used.                                                                                                                         | `[]`                   |
| `linuxTolerations`               | Override `daemonsetTolerations` for the Linux DaemonSet.                                                                                                                                                                                                      | `[]`                   |
| `windowsTolerations`             | Override `daemonsetTolerations` for the Linux DaemonSet.                                                                                                                                                                                                      | `[]`                   |
| `enableProbesServer`             | If `true`, start an http server exposing `/healthz` endpoint for probes.                                                                                                                                                                                      | `false`                |
| `metadataTries`                  | The number of times to try requesting metadata.                                                                                                                                                                                                               | `3`                    |
| `enableSpotInterruptionDraining` | If `true`, drain nodes when the spot interruption termination notice is received.                                                                                                                                                                             | `true`                 |
| `enableScheduledEventDraining`   | If `true`, drain nodes before the maintenance window starts for an EC2 instance scheduled event. This is **EXPERIMENTAL**.                                                                                                                                    | `false`                |
| `enableRebalanceMonitoring`      | If `true`, cordon nodes when the rebalance recommendation notice is received. If you'd like to drain the node in addition to cordoning, then also set `enableRebalanceDraining`.                                                                              | `false`                |
| `enableRebalanceDraining`        | If `true`, drain nodes when the rebalance recommendation notice is received.                                                                                                                                                                                  | `false`                |
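
As a rough illustration of how the parameters in the table above combine, the following values fragment is a minimal sketch, not a recommended configuration. All keys come from the table; the `example.com/lifecycle` node label is hypothetical and stands in for whatever label your nodes actually carry.

```yaml
# values.yaml (illustrative overrides; keys are taken from the table above)
targetNodeOs: "linux"

# Start the HTTP server that exposes /healthz for probes.
enableProbesServer: true

# Drain on Spot interruption notices (this is already the chart default).
enableSpotInterruptionDraining: true

# Cordon nodes on rebalance recommendations, and drain them as well.
enableRebalanceMonitoring: true
enableRebalanceDraining: true

# Overrides daemonsetNodeSelector for the Linux DaemonSet.
linuxNodeSelector:
  example.com/lifecycle: spot   # hypothetical label, replace with your own

# Tolerate every taint so the DaemonSet pods can run on all selected nodes.
daemonsetTolerations:
  - operator: Exists
```

A file like this can be passed to the install command shown earlier by appending `-f values.yaml`.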

### Testing Configuration

The configuration in this table applies to AWS Node Termination Handler testing and is **NOT RECOMMENDED** for production deployments.

| Parameter             | Description                                                                       | Default        |
| --------------------- | --------------------------------------------------------------------------------- | -------------- |
| `awsEndpoint`         | (Used for testing) If specified, use the provided AWS endpoint to make API calls. | `""`           |
| `awsSecretAccessKey`  | (Used for testing) Pass-thru environment variable.                                | `nil`          |
| `awsAccessKeyID`      | (Used for testing) Pass-thru environment variable.                                | `nil`          |
| `instanceMetadataURL` | (Used for testing) If specified, use the provided metadata URL.                   | `""`           |
| `procUptimeFile`      | (Used for testing) Specify the uptime file.                                       | `/proc/uptime` |

## Metrics Endpoint Considerations

AWS Node Termination Handler in IMDS mode runs as a DaemonSet with `useHostNetwork: true` by default. If the Prometheus server is enabled with `enablePrometheusServer: true`, it binds to the configured port (by default `prometheusServerPort: 9092`) in the root network namespace, so nothing else on the node will be able to bind to that port. Because the `/metrics` endpoint is then served on the node's own network, you will need to configure a firewall or security group on the nodes to block access to it.
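
For reference, a values fragment that enables the metrics endpoint might look like the sketch below. It uses only parameters named in this document; the `release: prometheus` label is a hypothetical example of a label a Prometheus Operator installation might select on, and creating a PodMonitor assumes the Prometheus Operator CRDs are already installed in the cluster.

```yaml
# values.yaml (illustrative)
useHostNetwork: true           # chart default for IMDS mode
enablePrometheusServer: true   # serve the /metrics endpoint
prometheusServerPort: 9092     # default port, bound in the root network namespace

podMonitor:
  create: true                 # requires enablePrometheusServer: true
  interval: 30s
  sampleLimit: 5000
  labels:
    release: prometheus        # hypothetical label, match your Prometheus selector
```

With this configuration the port is reachable on the node itself, so the firewall or security group rule described above still applies.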

You can switch NTH in IMDS mode to run with `useHostNetwork: false`, but you will then need to make sure that either IMDSv1 is enabled or the IMDSv2 IP hop count is increased to 2 (see the [IMDSv2 documentation](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-instance-metadata-service.html)).
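
A minimal sketch of the corresponding values override is shown below. It only covers the chart side; the hop count is a property of each EC2 instance's metadata options and has to be changed outside of this chart.

```yaml
# values.yaml (illustrative)
# Run the Linux DaemonSet pods in their own network namespace.
# Assumes IMDSv1 is enabled or the instances' IMDSv2 hop count is already 2.
useHostNetwork: false
```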