KubeZero - Zero Down Time Kubernetes platform
========================
KubeZero is a Kubernetes distribution providing an integrated container platform so you can focus on your applications.
# Design philosophy
- Focus on security and simplicity over feature creep
- No vendor lock-in; most components are optional and can easily be swapped out as needed
- No premium services / subscriptions required
- Staying up to date and contributing back to upstream projects, like alpine-cloud-images and others
- Cloud provider agnostic, bare-metal/self-hosted
- Organic Open Source / open and permissive licenses over closed-source solutions
- Corgi approved :dog:
# Architecture
![aws_architecture](docs/aws_architecture.png)
# Version / Support Matrix
KubeZero releases track the same *minor* version of Kubernetes.
For example, any 1.26.X-Y release of KubeZero supports any Kubernetes cluster running 1.26.X.
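
The version rule above can be sketched as a small check. This is only an illustration: the helper names are hypothetical and not part of KubeZero, and the parser assumes release strings of the form `MAJOR.MINOR.PATCH` with an optional numeric `-Y` suffix. Under those assumptions, `is_supported("1.26.7-2", "1.26.3")` is true and `is_supported("1.26.7-2", "1.27.0")` is false.

```python
import re
from typing import Tuple

# Hypothetical helpers for illustration only; not part of KubeZero.
# Assumed format: "1.26.7" or "1.26.7-2", optionally prefixed with "v".
_VERSION = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)(?:-(\d+))?$")


def minor_of(version: str) -> Tuple[int, int]:
    """Return the (major, minor) pair of a version string."""
    match = _VERSION.match(version)
    if match is None:
        raise ValueError(f"unrecognised version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_supported(kubezero_release: str, kubernetes_version: str) -> bool:
    """True when both versions share the same major.minor pair."""
    return minor_of(kubezero_release) == minor_of(kubernetes_version)
```
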
KubeZero is distributed as a collection of versioned Helm charts, allowing custom upgrade schedules and module versions as needed.
```mermaid
%%{init: {'theme':'dark'}}%%
gantt
    title KubeZero Support Timeline
    dateFormat YYYY-MM-DD
    section 1.27
    beta     :127b, 2023-09-01, 2023-09-30
    release  :after 127b, 2024-04-30
    section 1.28
    beta     :128b, 2024-03-01, 2024-04-30
    release  :after 128b, 2024-08-31
    section 1.29
    beta     :129b, 2024-07-01, 2024-08-30
    release  :after 129b, 2024-11-30
```
[Upstream release policy](https://kubernetes.io/releases/)
# Components
## OS
- all compute nodes run Alpine Linux v3.18
- 2 GB encrypted root file system
- no external dependencies at boot time, apart from container registries
- minimal attack surface
- extremely small memory footprint / overhead
- cri-o container runtime incl. AppArmor support
## GitOps
- CLI / command-line install
- optional full ArgoCD support and integration
## Featured workloads
- rootless CI/CD build platform to build containers as part of a CI pipeline, using podman / fuse device plugin support
- containerized AI models via integrated, out-of-the-box support for NVIDIA GPU workers as well as AWS Neuron
## Control plane
- all Kubernetes components compiled against Alpine OS using `buildmode=pie`
- support for single node control plane for small clusters / test environments to reduce costs
- access to the control plane from within the VPC only by default (VPN access required for admin tasks)
- controller nodes are used for various platform admin controllers / operators to reduce costs and noise on worker nodes
## AWS integrations
- IAM roles for service accounts allowing each pod to assume individual IAM roles
- access to the instance metadata service is blocked for all workload containers on all nodes
- all IAM roles are maintained via CloudBender automation
- aws-node-termination-handler integrated
- support for spot instances per worker group incl. early draining etc.
- support for [Inf1 instances](https://aws.amazon.com/ec2/instance-types/inf1/) as part of [AWS Neuron](https://aws.amazon.com/machine-learning/neuron/)
## Network
- Cilium using Geneve encapsulation, incl. increased MTU, allowing a more flexible and higher number of containers per worker node compared to e.g. the AWS VPC CNI
- Multus support for multiple network interfaces per pod, e.g. an additional AWS CNI
- no restrictions on IP space / sizing from the underlying VPC architecture
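
To illustrate why an overlay network can allow more pods per node than the AWS VPC CNI, here is a rough arithmetic sketch. It assumes the VPC CNI runs in its secondary-IP mode (no prefix delegation), where each pod consumes a VPC address attached to an ENI, and it uses the commonly cited limits for an `m5.large` (3 ENIs, 10 IPv4 addresses each) as an example input. The helper name is hypothetical, and the overlay figure is simply the upstream kubelet default for `maxPods`, not a KubeZero setting.

```python
# Illustrative arithmetic only; the helper name is hypothetical and not part of KubeZero.

KUBELET_DEFAULT_MAX_PODS = 110  # upstream kubelet default for maxPods


def vpc_cni_max_pods(enis: int, ipv4_per_eni: int) -> int:
    """Pod limit when every pod needs a secondary VPC IP on an ENI."""
    # One address per ENI is its primary IP and is not assigned to pods.
    # The +2 covers pods on the host network, which need no ENI address.
    return enis * (ipv4_per_eni - 1) + 2


if __name__ == "__main__":
    # m5.large: 3 ENIs with 10 IPv4 addresses each (assumed instance limits).
    print("VPC CNI limit:", vpc_cni_max_pods(3, 10))
    print("Overlay limit (kubelet default):", KUBELET_DEFAULT_MAX_PODS)
```

With an encapsulated overlay, pod addresses come from a cluster-internal range rather than from the VPC, so the per-instance ENI limits no longer cap the pod count.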
## Storage
- flexible EBS support incl. zone awareness
- EFS support with automated EFS provisioning for worker groups via CloudBender automation
- local storage provider (OpenEBS LVM) for latency sensitive high performance workloads
- CSI snapshot controller, plus Gemini for snapshot groups and retention
## Ingress
- AWS Network Load Balancer and Istio Ingress controllers
- no additional costs per exposed service
- real client source IP available to workloads via HTTP header and access logs
- ACME SSL Certificate handling via cert-manager incl. renewal etc.
- support for TCP services
- optional rate limiting support
- optional full service mesh
## Metrics
- Prometheus support for all components, incl. out of cluster EC2 instances (node_exporter)
- automated service discovery allowing instant access to common workload metrics
- pre-configured Grafana dashboards and alerts
- Alertmanager events via SNSAlertHub to Slack, Google, Matrix, etc.
## Logging
- all container logs are enhanced with Kubernetes and AWS metadata to provide context for each message
- flexible Elasticsearch setup leveraging the ECK operator, for easy maintenance with minimal admin knowledge required, incl. automated backups to S3
- Kibana allowing easy search and dashboards for all logs, incl. pre-configured index templates and index management
- [fluentd-concenter](https://git.zero-downtime.net/ZeroDownTime/container-park/src/branch/master/fluentd-concenter) service providing queuing during high load as well as additional parsing options
- lightweight fluent-bit agents on each node, requiring minimal resources and forwarding logs securely via TLS to fluentd-concenter