Appendix E: Architecture Evolution Timeline

Kubernetes and its ecosystem have evolved rapidly since 2014. This timeline shows the major architectural shifts — each one driven by real problems with the previous approach. Understanding this evolution explains why the current ecosystem looks the way it does.


Visual Timeline (2013-2026)

Container Runtimes, Orchestration, Networking, and Package Management

flowchart TD
    subgraph y2013 ["2013"]
        docker13["Docker released<br>(monolithic daemon)"]
    end

    subgraph y2015 ["2015"]
        oci["OCI founded<br>runc extracted"]
        k8s10["Kubernetes 1.0<br>CNCF founded"]
        flannel["Flannel (overlay)<br>kube-proxy + iptables"]
        yaml15["Raw YAML<br>kubectl apply"]
    end

    subgraph y2016 ["2016"]
        containerd16["containerd extracted<br>from Docker"]
        calico["Calico (BGP)<br>Canal"]
        helm2["Helm v2<br>(with Tiller)"]
    end

    subgraph y2017 ["2017"]
        cri17["CRI interface defined"]
        swarm17["Docker Swarm<br>embedded in Docker"]
        cni17["CNI spec matures"]
    end

    subgraph y2018 ["2018-2019"]
        kust["Kustomize<br>(patch-based)"]
        cilium["Cilium (eBPF-based)"]
        helm3["Helm v3 (no Tiller!)"]
        mesos["Docker Enterprise<br>sold to Mirantis"]
    end

    subgraph y2020 ["2020-2022"]
        deprec["K8s 1.20: dockershim<br>DEPRECATED"]
        removed["K8s 1.24: dockershim<br>REMOVED"]
        mesos21["Apache Mesos<br>RETIRED"]
        helmkust["Helm + Kustomize<br>combined pattern"]
    end

    subgraph y2023 ["2023-2024"]
        std24["containerd + CRI-O<br>are the standards"]
        k8sstd["Kubernetes is<br>THE standard"]
        gw["Gateway API GA"]
        ciliumdef["Cilium = default CNI<br>for many platforms"]
        cdk["cdk8s, Timoni<br>(CUE-based)"]
    end

    docker13 --> oci --> containerd16 --> cri17 --> deprec --> removed --> std24
    docker13 --> k8s10 --> swarm17 --> mesos --> mesos21 --> k8sstd
    flannel --> calico --> cni17 --> cilium --> gw --> ciliumdef
    yaml15 --> helm2 --> kust --> helm3 --> helmkust --> cdk

Four parallel evolutions that shaped the infrastructure layer: Docker’s monolith was decomposed into containerd and CRI-O. The orchestration wars ended with Kubernetes as the universal standard. Networking shifted from overlays and iptables to eBPF-native with Cilium. And YAML management evolved from raw manifests through Helm’s Tiller era to today’s Helm v3 + Kustomize hybrid.
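
As a concrete illustration of the "Helm + Kustomize combined" pattern, here is a minimal sketch of a kustomization.yaml that renders a chart and then patches the output. It assumes kustomize 4.1+ invoked with the --enable-helm flag; the chart, repo, and label values are illustrative placeholders, not a recommendation.

```yaml
# kustomization.yaml — sketch of the Helm-render-then-Kustomize-patch pattern.
# Assumes: kustomize build --enable-helm; chart/repo/label values are placeholders.
helmCharts:
  - name: podinfo
    repo: https://stefanprodan.github.io/podinfo
    version: "6.5.4"
    releaseName: podinfo
    valuesInline:
      replicaCount: 2        # values that would otherwise live in values.yaml
patches:
  - target:
      kind: Deployment
      name: podinfo
    patch: |-
      # JSON6902 patch applied after the chart is rendered (label path is illustrative)
      - op: add
        path: /spec/template/metadata/labels/team
        value: platform
```

The point of the pattern is separation of concerns: Helm handles packaging and defaults, while Kustomize applies environment-specific overrides without forking the chart.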

Security, GitOps, Scaling, and GPU/ML

flowchart TD
    subgraph y2017b ["2016-2017"]
        rbac["RBAC GA (K8s 1.8)"]
        ca["Cluster Autoscaler"]
        hpa["HPA v2"]
    end

    subgraph y2018b ["2018"]
        psp["PodSecurityPolicy"]
        argo["ArgoCD, Flux v1<br>GitOps begins"]
        devplugin["Device plugins<br>for GPUs"]
    end

    subgraph y2020b ["2020-2021"]
        opa["OPA / Gatekeeper"]
        sig["Sigstore, Cosign<br>Kyverno matures"]
        flux2["Flux v2 rewrite"]
        crossplane["Crossplane<br>Backstage joins CNCF"]
        karpenter["Karpenter (AWS)"]
        kubeflow["Kubeflow, KubeRay"]
    end

    subgraph y2022b ["2022-2023"]
        pspdep["PSP DEPRECATED"]
        pss["Pod Security Standards<br>replace PSP"]
        plateng["Platform Engineering<br>as a discipline"]
        gpuop["NVIDIA GPU<br>Operator mature"]
        dra["DRA alpha<br>(Dynamic Resource Allocation)"]
    end

    subgraph y2024b ["2024-2025"]
        supply["Supply chain security:<br>SBOM, SLSA standard"]
        idp["Internal Developer<br>Platforms go mainstream"]
        karpga["Karpenter GA<br>+ Azure support"]
        llm["LLM serving explosion:<br>vLLM, TGI, KServe"]
        llmd["llm-d, LeaderWorkerSet<br>multi-node inference"]
    end

    rbac --> psp --> opa --> pspdep --> pss --> supply
    argo --> flux2 --> crossplane --> plateng --> idp
    ca --> hpa --> karpenter --> karpga
    sig ~~~ pss
    devplugin --> kubeflow --> gpuop --> dra --> llm --> llmd

Security moved from the flawed PodSecurityPolicy to the simpler Pod Security Standards, while policy engines like OPA and Kyverno filled the gap. GitOps went from manual kubectl to ArgoCD/Flux, then broadened into full Internal Developer Platforms. Scaling evolved from the slow, group-based Cluster Autoscaler to Karpenter’s per-pod provisioning. And GPU/ML infrastructure exploded from basic device plugins to DRA, vLLM, and disaggregated serving with llm-d.
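
To make the PSP-to-PSS shift concrete, here is a minimal sketch of how Pod Security Standards are applied today: no policy objects at all, just labels on a namespace that the built-in admission controller reads. The namespace name is illustrative; enforce/audit/warn modes and the privileged/baseline/restricted levels are the standard knobs.

```yaml
# Pod Security Standards sketch: the admission controller is configured
# entirely via namespace labels (namespace name is a placeholder).
apiVersion: v1
kind: Namespace
metadata:
  name: payments
  labels:
    pod-security.kubernetes.io/enforce: restricted   # reject non-compliant pods
    pod-security.kubernetes.io/audit: restricted     # record violations in audit logs
    pod-security.kubernetes.io/warn: baseline        # warn clients at a looser level
```

Anything PSS cannot express (image registries, label conventions, custom rules) is where Kyverno or OPA/Gatekeeper picks up.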

Observability

timeline
    title Observability Evolution
    2016 : Prometheus joins CNCF
    2018 : Prometheus graduates CNCF
    2019 : OpenTelemetry formed
         : (OpenTracing + OpenCensus merger)
    2021 : Grafana Loki, Tempo mature
    2023 : OpenTelemetry GA
         : (traces, metrics)
    2024 : OpenTelemetry logging matures

Observability converged from three fragmented signals — Prometheus for metrics, various tools for logs, and Jaeger/Zipkin for traces — into a unified standard with OpenTelemetry. The Grafana LGTM stack (Loki, Grafana, Tempo, Mimir) emerged as the dominant open-source backend.
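
A rough sketch of what that convergence looks like in practice: one OpenTelemetry Collector config fanning a single OTLP receiver out to separate trace, metric, and log pipelines. The exporter names assume the collector's contrib distribution and a Tempo/Mimir/Loki backend; endpoints are placeholders.

```yaml
# OpenTelemetry Collector sketch: one OTLP ingest point, three signal pipelines.
# Assumes otelcol-contrib; service endpoints below are illustrative.
receivers:
  otlp:
    protocols:
      grpc:
      http:
processors:
  batch: {}
exporters:
  otlp/tempo:
    endpoint: tempo.monitoring.svc:4317
    tls:
      insecure: true
  prometheusremotewrite:
    endpoint: http://mimir.monitoring.svc/api/v1/push
  otlphttp/loki:
    endpoint: http://loki.monitoring.svc:3100/otlp
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/tempo]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheusremotewrite]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/loki]
```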


Node Autoscaling: The CA-to-Karpenter Transition

|                | Cluster Autoscaler (2016) | Karpenter (2021+) |
|----------------|---------------------------|-------------------|
| Abstraction    | Node-group based          | Groupless provisioning |
| Scaling unit   | Scale by group min/max    | Per-pod scheduling |
| Speed          | Slow (minutes)            | Fast (seconds) |
| Bin-packing    | No                        | Cross-instance-type optimization |
| Consolidation  | Reactive only             | Active consolidation |
| Instance types | Fixed per group           | Works across all types |

Why it changed: Cluster Autoscaler couldn’t keep up with diverse GPU/ML workloads that needed fast, flexible provisioning across many instance types. Karpenter eliminated the node group abstraction entirely.
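
A minimal NodePool sketch (Karpenter's karpenter.sh/v1 API with the AWS provider) shows what "groupless" means in practice: the spec only constrains what capacity is acceptable, and Karpenter picks concrete instance types per pending pod. Names, requirements, and limits here are illustrative.

```yaml
# Karpenter NodePool sketch: no node groups, only constraints on acceptable capacity.
# Assumes the AWS provider and an existing EC2NodeClass named "default" (placeholder).
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general-purpose
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]      # allow both; Karpenter decides per pod
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized   # actively repack and remove nodes
  limits:
    cpu: "1000"                                     # cap total provisioned capacity
```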


Summary: Architectural Shifts by Domain

| Domain | Old Way | New Way | Why It Changed |
|--------|---------|---------|----------------|
| Container Runtime | Docker (monolithic daemon) | containerd / CRI-O via CRI | Docker included too much (build, swarm, CLI). K8s only needs a runtime. CRI allows pluggable runtimes. |
| Orchestration | Docker Swarm, Mesos, multiple options | Kubernetes (universal standard) | K8s won on extensibility (CRDs, operators) and ecosystem. Swarm was too simple, Mesos too complex. |
| Networking | Flannel overlay + iptables kube-proxy | Cilium (eBPF) + Gateway API | iptables doesn’t scale. Overlay adds latency. eBPF gives kernel-level networking without kube-proxy. |
| Package Management | Raw YAML / Helm v2 with Tiller | Helm v3 + Kustomize (or combined) | Tiller was a security risk (cluster-admin in-cluster). Raw YAML doesn’t compose. Kustomize avoids templating. |
| Security | PodSecurityPolicy (PSP) | Pod Security Standards (PSS) + Kyverno/OPA | PSP was confusing, hard to audit, and couldn’t be extended. PSS is simpler; policy engines are more flexible. |
| GitOps & Platform | Manual kubectl apply / CI pipelines | ArgoCD/Flux + Internal Developer Platforms | Imperative deploys are fragile and unauditable. GitOps makes the desired state declarative and versioned. |
| Scaling | Cluster Autoscaler (node-group based) | Karpenter (groupless, per-pod) | CA was slow and inflexible with diverse workloads. Karpenter provisions exactly what’s needed, fast. |
| GPU/ML | Basic device plugins | GPU Operator + DRA + specialized serving (vLLM, llm-d) | LLM explosion demands multi-node GPU scheduling, fractional GPUs, and inference-optimized runtimes. |
| Observability | Prometheus + ad-hoc logging/tracing | OpenTelemetry (unified) + Grafana stack | Three separate telemetry signals (metrics, logs, traces) needed a unified collection and correlation standard. |

Back to Table of Contents