Chapter 1: The Road to Kubernetes
```mermaid
---
config:
  flowchart:
    nodeSpacing: 15
    rankSpacing: 30
---
flowchart LR
  subgraph industry ["Industry Timeline"]
    bm["Bare Metal<br>1990s"] --> vm["Virtualization<br>VMware, Xen, KVM<br>2000s"]
    vm --> cloud["AWS EC2<br>Cloud era<br>2006"]
    cloud --> cm["Chef / Puppet<br>2009"]
    cm --> docker["Docker<br>2013"]
    docker --> k8s["Kubernetes<br>2014"]
    k8s --> cncf["CNCF + Managed K8s<br>2015+"]
  end
  subgraph google ["Inside Google"]
    borg["Borg<br>2003"] --> cg["cgroups +<br>namespaces<br>2006–08"]
    cg --> omega["Omega<br>2011"]
    omega --> k8sg["Kubernetes<br>2014"]
  end
  k8sg -.->|"open-sourced<br>as"| k8s
```
The Bare Metal Era: One Application, One Server
In the earliest days of server computing, one application ran on one physical server — simple, isolated, but catastrophically wasteful. Most servers ran at 5-15% average CPU utilization. Organizations maintained vast fleets of underutilized machines, each dedicated to a single workload, each requiring its own power, cooling, network connectivity, and physical maintenance.
The fundamental problem was resource fragmentation. You could not easily share a physical machine between two applications because there was no reliable mechanism to prevent one application from consuming all available CPU, memory, or disk I/O and starving the other. Operating system process isolation was insufficient: processes could interfere with each other through shared filesystems, port conflicts, library version conflicts, and resource exhaustion. The result was an era of enormous waste, where the primary cost driver was not compute but rather the operational overhead of managing vast numbers of barely-utilized machines.
The Virtualization Revolution: Abstracting Hardware
Virtualization, pioneered commercially by VMware in the late 1990s and later commoditized by Xen, KVM, and cloud providers like Amazon Web Services, represented the first fundamental shift. By inserting a hypervisor between the hardware and the operating system, virtualization allowed multiple isolated virtual machines to share a single physical host. Each VM got its own kernel, its own filesystem, its own network stack — complete isolation without dedicated hardware.
This solved the resource fragmentation problem at a macro level. You could now pack multiple workloads onto a single physical machine with strong isolation guarantees. Cloud computing emerged from this capability: Amazon Web Services launched EC2 in 2006, offering on-demand virtual machines that could be provisioned in minutes rather than the weeks required to procure and rack physical servers.
But virtualization introduced its own problems. Virtual machines were heavy: each carried a full operating system kernel, consuming hundreds of megabytes of RAM just for the OS overhead. Boot times were measured in minutes. VM images were large and slow to transfer. The hypervisor itself consumed resources. And while VMs solved the isolation problem, they did not solve the management problem. With hundreds or thousands of VMs, organizations still needed to answer fundamental questions: which workload runs where? How do you update an application across fifty VMs without downtime? How do you recover when a VM’s host machine fails? How do you ensure that a critical application always has enough resources?
The Configuration Management Interlude: Puppet, Chef, Ansible
The late 2000s and early 2010s saw the rise of configuration management tools — Puppet (2005), Chef (2009), SaltStack (2011), Ansible (2012). These tools addressed the management problem by allowing operators to describe the desired state of a server (which packages should be installed, which services should be running, which configuration files should be present) and then converge the actual state toward that desired state.
This was a crucial intellectual contribution that directly influenced Kubernetes: the desired state model. Instead of writing imperative scripts that said “install package X, then start service Y, then modify file Z,” configuration management tools let you declare “package X should be present, service Y should be running, file Z should contain these contents” and let the tool figure out how to get there. This declarative approach was more robust because it was idempotent — you could run the tool multiple times and get the same result, regardless of the starting state.
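To make the idea concrete, here is a minimal sketch of a converge loop in Go. It is not any real tool's API; the DesiredState type, the converge function, and the in-memory maps standing in for the package and service managers are all hypothetical. The shape is the point: the function acts only on the difference between desired and actual state, so running it repeatedly is harmless.

```go
package main

import "fmt"

// DesiredState is a hypothetical, tool-agnostic description of what a
// server should look like: which packages are present and which
// services are running. Real tools (Puppet, Chef, Ansible) use their
// own DSLs, but the underlying idea is the same.
type DesiredState struct {
	Packages []string
	Services []string
}

// converge compares desired state with actual state and performs only
// the actions needed to close the gap. Because it acts on differences,
// a second run in a row does nothing: idempotence.
func converge(desired DesiredState, installed, running map[string]bool) {
	for _, pkg := range desired.Packages {
		if !installed[pkg] {
			fmt.Println("installing package", pkg) // stand-in for a real package manager call
			installed[pkg] = true
		}
	}
	for _, svc := range desired.Services {
		if !running[svc] {
			fmt.Println("starting service", svc) // stand-in for a real service manager call
			running[svc] = true
		}
	}
}

func main() {
	desired := DesiredState{Packages: []string{"nginx"}, Services: []string{"nginx"}}
	installed := map[string]bool{}
	running := map[string]bool{}

	converge(desired, installed, running) // first run: installs and starts nginx
	converge(desired, installed, running) // second run: nothing left to do
}
```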
But configuration management tools operated at the wrong level of abstraction for the emerging world of containerized microservices. They could ensure that a particular server had the right software installed, but they could not easily reason about a distributed application that spanned dozens of servers, needed to be updated without downtime, and had to automatically recover from server failures. The unit of management was the machine, not the application.
The Container Revolution: Docker and the Shipping Container Metaphor
Containers were not new when Docker launched in 2013. The underlying kernel primitives — cgroups (for resource limits) and namespaces (for isolation) — had been available in mainline Linux since the late 2000s (cgroups merged in v2.6.24, released in 2008; the various namespaces arrived incrementally over several releases). Google had been using containers internally since at least 2004, running everything from web search to Gmail inside Linux containers managed by its Borg system. FreeBSD had jails since 2000. Solaris had zones since 2004.
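The kernel primitives themselves are surprisingly small. The Linux-only Go sketch below, in the spirit of the well-known "container from scratch" demos, launches a shell inside new UTS, PID, and mount namespaces. It needs root privileges (or an additional user namespace) to run, and it deliberately omits everything else a real runtime such as runc adds: cgroup limits, a pivoted root filesystem, networking, and image handling.

```go
//go:build linux

package main

import (
	"os"
	"os/exec"
	"syscall"
)

// A minimal sketch of the namespace half of containerization: run
// /bin/sh in its own hostname (UTS), PID, and mount namespaces.
// Resource limits would be applied separately through the cgroup
// filesystem; neither is done here.
func main() {
	cmd := exec.Command("/bin/sh")
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	cmd.SysProcAttr = &syscall.SysProcAttr{
		// Give the child process fresh UTS, PID, and mount namespaces.
		Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS,
	}
	if err := cmd.Run(); err != nil {
		panic(err) // typically "operation not permitted" when not run as root
	}
}
```

Inside that shell, `ps` sees only the new PID namespace's processes and `hostname` can be changed without affecting the host, which is exactly the isolation containers rely on.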
What Docker did was make containers accessible. It provided a simple command-line interface, a standardized image format (the Dockerfile and layered filesystem), and a distribution mechanism (Docker Hub). For the first time, a developer could package an application and all its dependencies into a single artifact, push it to a registry, and run it identically on any Linux machine. The shipping container metaphor was apt: just as standardized shipping containers revolutionized global trade by providing a uniform interface between ships, trains, and trucks, Docker containers provided a uniform interface between development, testing, and production.
Containers had profound advantages over VMs for application deployment:
- Lightweight: Containers shared the host kernel, eliminating the OS overhead of VMs. A container image might be tens of megabytes instead of gigabytes.
- Fast startup: Containers started in milliseconds to seconds, not minutes.
- Density: You could run dozens or hundreds of containers on a single host, compared to perhaps a dozen VMs.
- Reproducibility: The container image was immutable. The same image ran identically everywhere.
- Composability: Complex applications could be decomposed into multiple containers, each with a single responsibility.
But Docker, by itself, solved only the packaging and isolation problem. It told you nothing about how to run containers at scale across a fleet of machines. If you had one hundred machines and one thousand containers, Docker could not tell you which container should run on which machine, what to do when a machine failed, how to route network traffic to the right container, or how to update a running application without downtime. This was the orchestration problem.
Google’s Borg: The Secret Precursor
To understand why Kubernetes looks the way it does, you must understand Google’s Borg system. Published in a landmark 2015 EuroSys paper (“Large-scale cluster management at Google with Borg” by Verma et al.), Borg had been running inside Google since at least 2003-2004. It managed virtually everything Google ran: web search, Gmail, YouTube, Maps, BigTable, MapReduce — hundreds of thousands of jobs across tens of thousands of machines in each of dozens of clusters.
Borg introduced several concepts that directly shaped Kubernetes:
1. The declarative job specification. In Borg, users did not tell the system to “start a process on machine X.” They declared a job specification: “I need 100 instances of this binary, each with 2 GB of RAM and 0.5 CPU cores, and they should be spread across failure domains.” Borg figured out where to place them, and if instances died, Borg automatically restarted them. This declarative model — describe what you want, not how to get it — became the philosophical foundation of Kubernetes.
2. Bin packing and resource management. Borg treated a cluster of machines as a single pool of resources. Its scheduler solved a variant of the bin packing problem: given a set of tasks with resource requirements and a set of machines with resource capacities, place tasks on machines to maximize utilization while respecting constraints (failure domain isolation, hardware requirements, etc.). Borg achieved remarkably high utilization — published figures suggest 60-70% average CPU utilization across Google's fleet, compared to the 5-15% typical of enterprise data centers. (A toy version of this placement problem is sketched just after this list.)
3. Service discovery via naming. Borg provided a built-in naming service (BNS) that allowed tasks to find each other by name rather than by IP address and port. This was essential in an environment where tasks were constantly being started, stopped, and moved between machines.
4. Allocs and resource reservations. Borg introduced the concept of “allocs” — reserved resources on a machine that could be filled with tasks. This concept directly inspired the Kubernetes Pod: a group of containers that share resources and are co-scheduled on the same machine.
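As a rough illustration of points 1 and 2, the sketch below takes a declarative list of identical tasks, each stating only its CPU and memory request, and places them onto machines with a toy first-fit bin packer. The Task and Machine types and the first-fit strategy are illustrative only; Borg's real scheduler scores feasible machines, preempts lower-priority work, and spreads replicas across failure domains.

```go
package main

import "fmt"

// Task is a hypothetical declarative job instance: the user states how
// much CPU and memory each replica needs, not where it should run.
type Task struct {
	Name     string
	CPU, Mem float64 // requested resources
}

// Machine tracks remaining capacity on one node.
type Machine struct {
	Name             string
	FreeCPU, FreeMem float64
}

// schedule is a toy first-fit bin packer: place each task on the first
// machine with enough spare capacity. Tasks that fit nowhere are left
// unplaced (they simply do not appear in the returned map).
func schedule(tasks []Task, machines []*Machine) map[string]string {
	placement := map[string]string{}
	for _, t := range tasks {
		for _, m := range machines {
			if m.FreeCPU >= t.CPU && m.FreeMem >= t.Mem {
				m.FreeCPU -= t.CPU
				m.FreeMem -= t.Mem
				placement[t.Name] = m.Name
				break
			}
		}
	}
	return placement
}

func main() {
	machines := []*Machine{
		{Name: "node-1", FreeCPU: 4, FreeMem: 8},
		{Name: "node-2", FreeCPU: 4, FreeMem: 8},
	}
	// "N instances of this binary, 0.5 CPU and 2 GB each" becomes a
	// list of identical tasks; ten keeps the output short.
	var tasks []Task
	for i := 0; i < 10; i++ {
		tasks = append(tasks, Task{Name: fmt.Sprintf("web-%d", i), CPU: 0.5, Mem: 2})
	}
	fmt.Println(schedule(tasks, machines))
}
```

Running it places eight of the ten tasks before both nodes run out of memory, which is exactly the kind of capacity arithmetic a cluster scheduler performs continuously, just at a vastly larger scale and with many more constraints.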
Google’s Omega: The Research System
Borg was a production system, evolved over a decade, carrying enormous technical debt. In 2011-2013, Google built Omega as a research project to explore alternative cluster management architectures. Omega’s key contribution was its approach to scheduling: instead of Borg’s monolithic scheduler, Omega used optimistic concurrency control with a shared state model. Multiple schedulers could operate in parallel, each reading the full cluster state, making scheduling decisions, and then atomically committing those decisions. If two schedulers made conflicting decisions, one would detect the conflict and retry.
This shared-state, optimistic-concurrency approach influenced Kubernetes’ design in a critical way: it demonstrated that you could have multiple independent controllers operating on shared state, each making progress independently, with conflicts resolved through mechanisms like resource versioning. This is exactly the model that Kubernetes uses for its controllers.
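A toy in-memory version of that pattern looks like the sketch below: writers read the current object, compute a new one, and commit it only if the resource version they read is still the latest, retrying otherwise. The store type here is hypothetical; in Kubernetes the same role is played by the API server backed by etcd, with conflicts surfacing as HTTP 409 responses keyed on each object's resourceVersion field.

```go
package main

import (
	"fmt"
	"sync"
)

// versioned is a toy stand-in for an object in shared cluster state.
// Every successful write bumps ResourceVersion, which is how conflicts
// between concurrent writers are detected.
type versioned struct {
	ResourceVersion int
	Replicas        int
}

// store is a minimal shared-state store with compare-and-swap updates.
type store struct {
	mu  sync.Mutex
	obj versioned
}

// get returns a copy of the current object.
func (s *store) get() versioned {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.obj
}

// update commits newObj only if the caller read the latest version;
// otherwise it reports a conflict and the caller must re-read and
// retry. No locks are held while the caller decides what to write:
// optimistic concurrency.
func (s *store) update(newObj versioned) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if newObj.ResourceVersion != s.obj.ResourceVersion {
		return false // conflict: someone else committed first
	}
	newObj.ResourceVersion++
	s.obj = newObj
	return true
}

// scaleUp is a tiny "controller": read, modify, try to commit, retry on conflict.
func scaleUp(s *store, delta int) {
	for {
		obj := s.get()
		obj.Replicas += delta
		if s.update(obj) {
			return
		}
		// Conflict: another controller changed the object; retry with fresh state.
	}
}

func main() {
	s := &store{}
	var wg sync.WaitGroup
	for i := 0; i < 5; i++ { // five "controllers" racing on the same object
		wg.Add(1)
		go func() {
			defer wg.Done()
			scaleUp(s, 1)
		}()
	}
	wg.Wait()
	fmt.Printf("replicas=%d resourceVersion=%d\n", s.obj.Replicas, s.obj.ResourceVersion)
}
```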
The Birth of Kubernetes: 2014
Kubernetes was born in mid-2014 at Google, created by Joe Beda, Brendan Burns, and Craig McLuckie, with significant contributions from Brian Grant, Tim Hockin, and many others. It was explicitly designed to be an open-source, vendor-neutral system that embodied the lessons of Borg and Omega without carrying their technical debt.
The founders made a crucial strategic decision: rather than open-sourcing Borg, they built a clean-room redesign intended to run anywhere. This meant:
- Using standard open-source components (etcd for storage, instead of Google’s proprietary Chubby/Colossus)
- Supporting multiple container runtimes (not just Google’s internal runtime)
- Designing for extensibility from the start (custom controllers, pluggable networking, and API extension points that later appeared as ThirdPartyResources and then CustomResourceDefinitions, or CRDs)
- Making the system portable across cloud providers and on-premises environments
Kubernetes was donated to the newly formed Cloud Native Computing Foundation (CNCF) in 2015, ensuring its governance was independent of any single company. This was a masterstroke of ecosystem building: by making Kubernetes vendor-neutral, Google ensured that every major cloud provider (AWS, Azure, GCP) would offer managed Kubernetes services, creating a de facto standard that benefited everyone — including Google, whose cloud platform was smaller than AWS but whose expertise in running Kubernetes was unmatched.
The Borg Lineage: Kubernetes (Greek for steersman or pilot) was reportedly codenamed “Project Seven” — a reference to Seven of Nine from Star Trek, a Borg drone who regained her individuality. The codename is a deliberate allusion to Kubernetes’ origins in Google’s Borg system, while signaling that it had been liberated from Google’s proprietary infrastructure to become something independent.
Common Mistakes and Misconceptions
- “Kubernetes is just Docker orchestration.” Kubernetes is a general-purpose container orchestrator. Docker is one of many supported container runtimes (and modern Kubernetes clusters typically use containerd, not Docker). Kubernetes predates most Docker-native orchestration features and was designed to be runtime-agnostic from the start.
- “Google open-sourced Borg.” Borg is still an internal Google system and has never been released. Kubernetes is a clean-room redesign inspired by Borg’s lessons and design principles, not a port or fork of Borg. The two systems share no code.
- “Kubernetes was the first container orchestrator.” Apache Mesos with Marathon, CoreOS Fleet, and Docker Swarm all predated or emerged alongside Kubernetes. Kubernetes won the orchestration wars not by being first but through superior API design, extensibility, and community governance.
For a visual overview of how Part 1’s concepts connect, see Appendix B: Mental Models.
Further Reading
- Large-scale cluster management at Google with Borg – The original 2015 Borg paper from EuroSys, detailing how Google manages billions of containers across its fleet.
- Omega: flexible, scalable schedulers for large compute clusters – The 2013 Omega paper introducing shared-state scheduling with optimistic concurrency control.
- Borg, Omega, and Kubernetes: Lessons learned from three container-management systems over a decade – A 2016 ACM Queue retrospective by Burns, Grant, Oppenheimer, Brewer, and Wilkes tracing the lineage from Borg through Omega to Kubernetes.
- Kubernetes first commit on GitHub (June 2014) – The initial commit that started the open-source project, useful for understanding the original scope and design intent.
- The Illustrated Children’s Guide to Kubernetes (CNCF) – A surprisingly effective introduction to Kubernetes concepts through visual storytelling.
- Borg: The Predecessor to Kubernetes (Google Cloud Blog) – Google’s own account of how internal cluster management evolved into a public project.
- Solomon Hykes announces Docker (PyCon 2013 lightning talk) – The five-minute lightning talk that introduced Docker to the world and catalyzed the container ecosystem that Kubernetes was built to orchestrate.