Chapter 16: Managed Kubernetes: EKS, GKE, and AKS
Running your own control plane is an excellent way to learn Kubernetes, but for most teams a managed service significantly reduces operational overhead. The control plane (etcd, the API server, the controller manager, and the scheduler) requires careful backup, monitoring, upgrade orchestration, and high-availability configuration. Managed Kubernetes services take this burden off your team so you can focus on what runs on the cluster rather than what runs the cluster.
But “managed” does not mean “fully operated.” Every cloud provider draws the line differently between what they manage and what remains your responsibility. Understanding exactly where that line falls is essential for making an informed choice.
The Shared Responsibility Model
MANAGED KUBERNETES: WHO MANAGES WHAT?
| Cloud Provider Manages | You Manage |
|---|---|
| Control plane: API server (HA, TLS), controller manager, scheduler, etcd (with backups) | Worker nodes: kubelet, kube-proxy, CNI agent, your pods |
| Upgrades, patches, HA, etcd backups, API certificate rotation | OS patching, scaling, node upgrades, app deploys, networking config, storage, security policies, RBAC |
| GKE Autopilot: Google also manages the worker nodes and their sizing | With node auto-upgrade enabled, the provider patches the node OS |
What remains your responsibility in all cases: your application workloads, your RBAC policies, your network policies, your storage configuration, your monitoring, your cost management.
GKE: Google Kubernetes Engine
GKE is the most mature managed Kubernetes service. It is typically the first to adopt new Kubernetes features, and it is the most opinionated about best practices.
Networking. GKE uses a VPC-native networking model with Alias IPs. The node subnet carries a secondary IP range reserved for pods, and each node is assigned a slice of that range as an alias IP range. By default this slice is a /24, which supports the default maximum of 110 pods per node. Pods receive IPs from their node's slice. These are real VPC IPs: they are routable within the VPC without overlay networks or encapsulation, so VPC firewall rules, routes, and VPC peering work natively with pod IPs.
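Because every node reserves a fixed slice of the pod range, the size of that range caps the number of nodes in the cluster. The sketch below assumes GKE's documented rule that each node's slice must hold at least twice as many addresses as the node's maximum pod count. The /14 pod range in the example is a hypothetical value. The sketch computes the per-node prefix and the resulting node ceiling.

```python
def per_node_pod_prefix(max_pods_per_node: int) -> int:
    """Smallest prefix whose block holds at least 2 x max_pods_per_node addresses."""
    needed = 2 * max_pods_per_node
    # (needed - 1).bit_length() equals ceil(log2(needed)) for needed >= 2
    host_bits = (needed - 1).bit_length()
    return 32 - host_bits


def max_nodes(pod_range_prefix: int, max_pods_per_node: int) -> int:
    """How many per-node slices fit in the cluster's pod secondary range."""
    return 2 ** (per_node_pod_prefix(max_pods_per_node) - pod_range_prefix)


# Hypothetical cluster whose pod secondary range is a /14.
for pods in (8, 32, 64, 110):
    print(f"max {pods} pods/node -> /{per_node_pod_prefix(pods)} per node, "
          f"{max_nodes(14, pods)} nodes max")
```

Lowering the maximum pods per node shrinks each slice, which is the usual lever when a pod range turns out to be too small for the planned node count.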
Autopilot mode. GKE offers two modes: Standard, where you manage node pools, and Autopilot, where Google also manages the nodes, including provisioning and sizing. In Autopilot mode, you submit workloads and Google provisions the right amount of compute. You pay per pod resource request, not per node. Autopilot also enforces security best practices by default. For example, it blocks privileged containers and restricts host path mounts.
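The billing difference comes down to what is being measured. Standard mode bills whole node VMs, whether pods fill them or not. Autopilot bills the sum of pod requests. The minimal sketch below compares billed vCPU and memory for a hypothetical workload on three hypothetical 4 vCPU / 16 GiB nodes. It deliberately ignores Autopilot's minimum request sizes and CPU:memory ratio rules, and it uses no real prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PodRequest:
    cpu: float      # vCPU requested
    mem_gib: float  # memory requested, GiB


def standard_billed(node_count: int, node_cpu: float, node_mem_gib: float) -> tuple:
    # Standard mode: node VMs are billed whole, whether or not pods fill them.
    return node_count * node_cpu, node_count * node_mem_gib


def autopilot_billed(pods: list) -> tuple:
    # Autopilot: only the resources the pods request are billed.
    # (Ignores Autopilot's minimum request sizes and CPU:memory ratio rules.)
    return sum(p.cpu for p in pods), sum(p.mem_gib for p in pods)


# Hypothetical workload: six small pods and two larger ones.
pods = [PodRequest(0.5, 1.0) for _ in range(6)] + [PodRequest(1.0, 4.0) for _ in range(2)]
print("Standard, 3 x (4 vCPU, 16 GiB):", standard_billed(3, 4, 16))
print("Autopilot, sum of requests:   ", autopilot_billed(pods))
```

The gap between the two results is idle node capacity. Autopilot charges a higher unit rate for requested resources, so whether it is cheaper depends on how well your Standard nodes would be packed.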
Upgrades. GKE is typically the fastest to support new Kubernetes versions. It offers release channels (Rapid, Regular, Stable) that automatically upgrade the control plane and node pools on a schedule. Surge upgrades create extra nodes to maintain capacity during rolling node upgrades.
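Surge upgrades are tuned per node pool with two settings, `--max-surge-upgrade` and `--max-unavailable-upgrade`, which default to 1 and 0. GKE upgrades up to their sum in parallel. The rough sketch below estimates how many upgrade batches a pool needs, how many extra VMs exist at the peak, and the capacity floor. It is a simplification that ignores PodDisruptionBudgets and drain time.

```python
import math


def surge_upgrade_plan(nodes: int, max_surge: int = 1, max_unavailable: int = 0) -> dict:
    """Rough shape of a surge upgrade for one node pool (ignores PDBs and drain time)."""
    parallel = max_surge + max_unavailable  # nodes upgraded at the same time
    if parallel < 1:
        raise ValueError("max_surge + max_unavailable must be at least 1")
    return {
        "batches": math.ceil(nodes / parallel),
        "peak_nodes": nodes + max_surge,             # extra VMs you briefly pay for
        "min_ready_nodes": nodes - max_unavailable,  # capacity floor during the upgrade
    }


print(surge_upgrade_plan(10))                                 # defaults: surge 1, unavailable 0
print(surge_upgrade_plan(10, max_surge=3, max_unavailable=1))
```

Raising surge speeds up the upgrade at the cost of temporary extra VMs. Raising unavailable speeds it up at the cost of capacity.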
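Pricing. GKE charges a $0.10/hr cluster management fee in both Standard and Autopilot modes. A monthly free-tier credit covers the fee for one zonal or Autopilot cluster per billing account. On top of the fee, Standard mode bills for the worker node VMs, while Autopilot bills per pod resource request.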
GKE Strengths
- Fastest Kubernetes version adoption
- Autopilot removes node management entirely
- VPC-native networking eliminates overlay complexity
- Tight integration with Google Cloud networking (Cloud NAT, Cloud Armor, Internal Load Balancers)
- Binary Authorization for supply chain security
GKE Weaknesses
- Smaller ecosystem of third-party integrations compared to AWS
- Autopilot restrictions may be too opinionated for some workloads
- Vendor lock-in to GCP networking model
EKS: Amazon Elastic Kubernetes Service
EKS is the most widely used managed Kubernetes service, reflecting AWS’s dominant market position. It is also the most “assembly required” of the three — AWS provides the control plane and expects you to configure everything else.
Networking. EKS uses the AWS VPC CNI plugin, which assigns pods real VPC IP addresses from Elastic Network Interfaces (ENIs). Each EC2 instance type limits how many ENIs it can attach and how many IPv4 addresses each ENI can hold. As a result, pod density is limited by instance type:
| Instance Type | Max ENIs | IPs per ENI | Max Pods |
|---|---|---|---|
| t3.nano | 2 | 2 | 4 |
| t3.medium | 3 | 6 | 17 |
| m5.large | 3 | 10 | 29 |
| m5.xlarge | 4 | 15 | 58 |
| m5.24xlarge | 15 | 50 | 737 |
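This is a critical capacity planning consideration. The Max Pods column follows from ENIs × (IPs per ENI - 1) + 2. The primary IP of each ENI is not assigned to pods, and the extra 2 accounts for host-network pods such as aws-node and kube-proxy, which count toward the pod limit but consume no ENI address. If you run many small pods, you may hit the pod limit before you exhaust CPU or memory. AWS offers prefix delegation to increase pod density by assigning /28 prefixes (16 addresses each) instead of individual IPs.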
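The Max Pods column can be reproduced from the ENI limits alone. The sketch below recomputes the table and then applies the same arithmetic with prefix delegation, where each secondary-address slot holds a /28. The raw prefix-delegation figure far exceeds what a kubelet should run. AWS guidance caps it, recommending 110 pods for smaller instances and 250 for larger ones, so the cap is an explicit parameter you should set from AWS's max-pods calculator rather than from this sketch.

```python
def eni_max_pods(max_enis: int, ips_per_eni: int) -> int:
    # The primary IP of each ENI is not handed to pods; the +2 covers host-network
    # pods (aws-node, kube-proxy) that count toward max-pods but use no ENI address.
    return max_enis * (ips_per_eni - 1) + 2


def prefix_delegation_max_pods(max_enis: int, ips_per_eni: int, cap: int) -> int:
    # With prefix delegation, each secondary-address slot holds a /28 (16 addresses).
    # The raw figure is capped because the kubelet should not run that many pods.
    raw = max_enis * (ips_per_eni - 1) * 16 + 2
    return min(raw, cap)


TABLE = {  # instance type: (max ENIs, IPs per ENI), as in the table above
    "t3.nano": (2, 2),
    "t3.medium": (3, 6),
    "m5.large": (3, 10),
    "m5.xlarge": (4, 15),
    "m5.24xlarge": (15, 50),
}
for name, (enis, ips) in TABLE.items():
    print(f"{name:12} max pods: {eni_max_pods(enis, ips)}")

print("m5.large with prefix delegation, capped at 110:",
      prefix_delegation_max_pods(3, 10, cap=110))
```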
Node management. EKS offers three options: self-managed nodes (EC2 instances you configure), managed node groups (AWS manages the EC2 lifecycle), and Fargate. Fargate provides serverless pods, similar to GKE Autopilot, except that each pod runs in its own isolated compute environment. Karpenter is an open-source node autoscaler created by AWS that provisions right-sized nodes based on pending pod requirements. Because it launches instances directly instead of resizing predefined node groups, it is typically faster and more flexible than the Cluster Autoscaler.
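The core idea of right-sizing can be illustrated with a much-simplified sketch: total the pending pods' requests, then pick the smallest instance type that satisfies CPU, memory, and the VPC CNI pod limit. This is not Karpenter's actual algorithm. Karpenter weighs prices across many instance types, zones, and capacity types. Here, "smallest by vCPU" stands in for "cheapest", and the four-entry catalog is illustrative. The sketch also ignores DaemonSet pods and kubelet-reserved resources.

```python
from typing import NamedTuple, Optional


class InstanceType(NamedTuple):
    name: str
    cpu_m: int     # vCPU in millicores
    mem_mib: int   # memory in MiB
    max_pods: int  # VPC CNI limit without prefix delegation


# Illustrative catalog only; a real provisioner considers far more types and their prices.
CATALOG = [
    InstanceType("m5.large", 2000, 8192, 29),
    InstanceType("m5.xlarge", 4000, 16384, 58),
    InstanceType("m5.2xlarge", 8000, 32768, 58),
    InstanceType("m5.4xlarge", 16000, 65536, 234),
]


def pick_instance(pending: list, catalog: list = CATALOG) -> Optional[InstanceType]:
    """Smallest type that fits all pending (cpu_m, mem_mib) requests, or None.

    Ignores DaemonSet pods, kubelet-reserved resources, price, and zones.
    """
    cpu = sum(c for c, _ in pending)
    mem = sum(m for _, m in pending)
    fits = [t for t in catalog
            if t.cpu_m >= cpu and t.mem_mib >= mem and t.max_pods >= len(pending)]
    return min(fits, key=lambda t: (t.cpu_m, t.mem_mib), default=None)


print(pick_instance([(100, 256)] * 30).name)  # m5.xlarge: CPU is the binding constraint
print(pick_instance([(100, 256)] * 70).name)  # m5.4xlarge: the 58-pod limit rules out m5.2xlarge
```

The second call shows the earlier point about pod density in action: 70 tiny pods fit comfortably in an m5.2xlarge's CPU and memory but not in its pod limit.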
Upgrades. EKS upgrades are the most manual of the three providers. You upgrade the control plane first, one minor version at a time (one API call or console click per step), then upgrade each node group separately. The standard configuration has no automatic release channel for control plane upgrades, so you must actively track Kubernetes versions and initiate upgrades yourself. EKS Auto Mode (launched December 2024) reduces this work for clusters that opt in. It manages the nodes and upgrades them automatically after you upgrade the control plane.
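Because the control plane moves one minor version at a time, a multi-version jump becomes a sequence of steps. The sketch below plans that sequence. It assumes the upstream Kubernetes version skew policy, under which kubelets may lag the API server by up to three minor versions since Kubernetes 1.28 (two before that). It inserts a node-group upgrade whenever the next control-plane bump would break that limit. The function and its step strings are illustrative, not an AWS tool.

```python
def eks_upgrade_plan(cp_minor: int, node_minor: int, target_minor: int,
                     max_kubelet_skew: int = 3) -> list:
    """Ordered steps to move a 1.x cluster from its current minor version to target_minor."""
    steps = []
    for minor in range(cp_minor + 1, target_minor + 1):
        # Before each control-plane bump, make sure kubelets will stay within the skew.
        if minor - node_minor > max_kubelet_skew:
            steps.append(f"node groups -> 1.{cp_minor}")
            node_minor = cp_minor
        steps.append(f"control plane -> 1.{minor}")  # one minor version per step
        cp_minor = minor
    if node_minor < target_minor:
        steps.append(f"node groups -> 1.{target_minor}")
    return steps


for step in eks_upgrade_plan(cp_minor=27, node_minor=27, target_minor=30):
    print(step)
```

Passing `max_kubelet_skew=2` models the stricter pre-1.28 policy and forces an intermediate node-group upgrade.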
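Pricing. $0.10/hr per cluster ($72/month) while the cluster runs a Kubernetes version in standard support. Clusters on a version in extended support pay $0.60/hr. EKS on Fargate bills each pod for the vCPU and memory it requests.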
EKS Strengths
- Largest ecosystem — most third-party tools are tested on EKS first
- Deep AWS integration: IAM roles for service accounts, the AWS Load Balancer Controller (formerly the ALB Ingress Controller), and the EBS CSI driver
- Karpenter for intelligent, fast node autoscaling
- Most flexibility in configuration
- EKS add-ons, including third-party add-ons from AWS Marketplace
EKS Weaknesses
- Most manual upgrade process
- VPC CNI pod density limits require careful instance type selection
- More “assembly required” than GKE or AKS
AKS: Azure Kubernetes Service
AKS differentiates primarily on pricing: the control plane is free in the Free tier. You pay only for the worker node VMs.
Networking. AKS offers two main networking models. With kubenet, pods get IPs from an address space separate from the VNet (Azure's equivalent of a VPC), so pod IPs are not routable in the VNet. Nodes forward pod traffic using route tables, and traffic to resources outside the cluster is NATed to the node's IP. Azure CNI assigns pods real VNet IPs, similar to AWS VPC CNI and GKE Alias IPs. Azure CNI Overlay is a newer option that provides Azure CNI features without consuming a VNet IP for every pod.
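The IP cost of traditional Azure CNI is easy to underestimate, because a VNet address is reserved for every possible pod on every node. The sketch below follows the sizing formula in Azure's documentation: (nodes + 1) + (nodes + 1) × max pods, where the extra node covers upgrades. It adds the 5 addresses Azure reserves in every subnet and returns the smallest subnet prefix that fits. The `surge_nodes` parameter is an assumption to raise if you configure a larger upgrade surge. The overlay branch counts only node IPs, since overlay pod IPs come from a separate CIDR.

```python
AZURE_RESERVED_PER_SUBNET = 5  # Azure reserves 5 addresses in every subnet


def aks_subnet_size(nodes: int, max_pods: int, overlay: bool = False,
                    surge_nodes: int = 1) -> tuple:
    """Return (IPs the cluster consumes, smallest subnet prefix that fits them)."""
    node_ips = nodes + surge_nodes  # spare node(s) created during upgrades
    # Traditional Azure CNI reserves a VNet IP for every possible pod on every node;
    # with Azure CNI Overlay, pod IPs come from a separate private CIDR instead.
    pod_ips = 0 if overlay else node_ips * max_pods
    needed = node_ips + pod_ips
    host_bits = (needed + AZURE_RESERVED_PER_SUBNET - 1).bit_length()
    return needed, 32 - host_bits


print(aks_subnet_size(50, 30))                # traditional Azure CNI
print(aks_subnet_size(50, 30, overlay=True))  # Azure CNI Overlay
```

For a 50-node cluster with 30 pods per node, traditional Azure CNI needs a /21, while overlay fits in a /26. That difference is the main reason the overlay mode exists.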
Upgrades. AKS has a rapid security patching cadence. It supports automatic upgrades through channels (none, patch, stable, rapid, node-image). Node image upgrades can be applied independently from Kubernetes version upgrades.
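The version channels differ mainly in which minor version they track. `patch` stays on the current minor. `stable` follows the latest patch of minor N-1, where N is the newest supported minor. `rapid` follows the latest patch of N. The simplified sketch below picks a channel's target from a hypothetical list of available versions. It never downgrades, treats `node-image` as not changing the Kubernetes version, and ignores any intermediate hops AKS may take when a cluster is several versions behind.

```python
Version = tuple  # (major, minor, patch)


def _latest_patch(available: list, major_minor: tuple) -> Version:
    return max(v for v in available if v[:2] == major_minor)


def channel_target(current: Version, available: list, channel: str):
    """Kubernetes version an auto-upgrade channel would move the cluster to, or None."""
    if channel in ("none", "node-image"):  # node-image updates node OS images only
        return None
    minors = sorted({v[:2] for v in available})
    if channel == "patch":
        target = _latest_patch(available, current[:2])  # stay on the current minor
    elif channel == "stable":
        target = _latest_patch(available, minors[-2])   # minor N-1
    elif channel == "rapid":
        target = _latest_patch(available, minors[-1])   # newest minor N
    else:
        raise ValueError(f"unknown channel: {channel}")
    return target if target > current else None         # channels never downgrade


# Hypothetical set of versions offered in a region.
AVAILABLE = [(1, 29, 9), (1, 29, 11), (1, 30, 5), (1, 30, 7), (1, 31, 2)]
for ch in ("none", "patch", "stable", "rapid", "node-image"):
    print(ch, channel_target((1, 29, 9), AVAILABLE, ch))
```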
Pricing. Free tier: $0 for the control plane. Standard tier: $0.10/hr (adds a financially backed uptime SLA and support for larger clusters). Premium tier: $0.60/hr (adds long-term support versions).
AKS Strengths
- Free control plane in Free tier
- Rapid security patching
- Strong integration with Microsoft Entra ID (formerly Azure Active Directory) for RBAC
- Azure Arc extends AKS management to on-premises and other clouds
- AKS Automatic mode (similar to GKE Autopilot)
AKS Weaknesses
- Azure networking can be complex (VNet peering, NSG interactions)
- Historically slower Kubernetes version adoption than GKE
- Smaller Kubernetes-specific community than AWS
Comparison Table
| Feature | GKE | EKS | AKS |
|---|---|---|---|
| Control plane cost | $0.10/hr | $0.10/hr | Free (Free tier) |
| Serverless pods | Autopilot | Fargate | Virtual Nodes |
| Pod networking | Alias IPs (VPC-native) | VPC CNI (ENI-based) | Azure CNI or kubenet |
| Pod IP routable in VPC? | Yes | Yes | Yes (Azure CNI) |
| Node autoscaler | Cluster Autoscaler, node auto-provisioning | Karpenter or Cluster Autoscaler | Cluster Autoscaler, node autoprovisioning (Karpenter-based) |
| Upgrade automation | Release channels | Manual initiation (EKS Auto Mode, Dec 2024, automates node upgrades) | Upgrade channels |
| Version adoption speed | Fastest | Moderate | Moderate |
| Identity integration | Google IAM + Workload Identity | IAM Roles for Service Accounts | Microsoft Entra ID + Workload Identity |
| Service mesh | Cloud Service Mesh (formerly Anthos Service Mesh) | Istio / Linkerd (App Mesh deprecated Sept 2024) | Istio add-on (Open Service Mesh archived Sept 2023) |
| GPU support | Yes (multi-GPU, TPU) | Yes (GPU, Inferentia, Trainium) | Yes (GPU) |
| Max nodes per cluster | 15,000 | 5,000 (soft limit) | 5,000 |
When to Choose Each
See also Appendix C: Decision Trees for a quick decision flowchart.
Choose GKE when:
- You want the most automated, opinionated experience
- You are already on Google Cloud or are starting fresh
- You want Autopilot to eliminate node management
- You need fast access to the latest Kubernetes features
- You are running ML/AI workloads with TPU requirements
Choose EKS when:
- You are already on AWS (as many organizations are)
- You need maximum flexibility and control
- Your team has AWS expertise
- You need deep integration with the AWS ecosystem (Lambda, SQS, DynamoDB)
- You want Karpenter for intelligent autoscaling
Choose AKS when:
- You are already on Azure or have an Enterprise Agreement
- You want a free control plane for dev/test
- You use Microsoft Entra ID (formerly Azure Active Directory) for identity management
- You need hybrid cloud with Azure Arc
- You want the cheapest entry point for learning
Choose self-managed (kubeadm) when:
- You are on-premises with no cloud option
- You have strict regulatory requirements about where the control plane runs
- You are learning Kubernetes internals
- You need control over every component’s configuration
The Hidden Costs
The control plane fee is the smallest part of the bill. The real costs are:
- Worker node compute: The VMs or instances running your pods (typically 80-90% of the bill)
- Load balancers: Each Service of type LoadBalancer creates a cloud load balancer ($15-25/month each)
- NAT gateways: Required for private clusters to reach the internet ($30-45/month + data processing fees)
- Persistent storage: EBS volumes, Persistent Disks, Managed Disks ($0.08-0.10/GB/month for SSD)
- Data transfer: Cross-AZ traffic is charged on all three clouds ($0.01-0.02/GB)
- Monitoring and logging: CloudWatch, Cloud Monitoring, and Azure Monitor charge for ingestion and storage
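Cross-AZ charges surprise teams because they scale with ordinary service-to-service traffic. The sketch below estimates them under two assumptions: backends are spread evenly across zones, and each request picks a backend uniformly at random (no topology-aware routing). Under those assumptions, a fraction (zones - 1) / zones of requests leaves the caller's zone. The traffic volume is hypothetical, and `usd_per_gb` should be the rate your provider actually bills. On AWS, for instance, cross-AZ transfer is billed in each direction.

```python
def cross_zone_fraction(zones: int) -> float:
    """Expected share of requests that leave the caller's zone.

    Assumes endpoints are spread evenly across zones and each request picks a
    backend uniformly at random (no topology-aware routing).
    """
    return (zones - 1) / zones


def monthly_cross_az_cost(gb_per_month: float, zones: int, usd_per_gb: float) -> float:
    return gb_per_month * cross_zone_fraction(zones) * usd_per_gb


# Hypothetical: 10 TB/month of in-cluster traffic across 3 zones at $0.01/GB.
print(round(monthly_cross_az_cost(10_000, zones=3, usd_per_gb=0.01), 2))
```

With three zones, roughly two thirds of traffic crosses a zone boundary. Kubernetes topology-aware routing exists to shrink that fraction.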
An AKS cluster on the Free tier (no control plane fee) running three 2 vCPU / 8 GiB worker VMs (for example Standard_D2s_v5, comparable to an AWS m5.large) with a load balancer, NAT gateway, and 100 GB of persistent storage will cost approximately $300-400/month before data transfer.
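An estimate like this is simple addition over the line items listed above, and writing it down makes the fixed overhead visible. The sketch below uses the per-unit ranges from that list for load balancers, NAT gateways, and SSD storage. The node price is an input you supply for your own region and VM size, and no real VM price is assumed. The function returns a (low, high) pair and excludes data transfer and monitoring.

```python
LB_MONTHLY = (15.0, 25.0)        # per LoadBalancer Service
NAT_MONTHLY = (30.0, 45.0)       # per NAT gateway, before data-processing fees
SSD_PER_GB_MONTH = (0.08, 0.10)  # persistent SSD storage


def monthly_estimate(nodes: int, node_monthly: tuple, load_balancers: int = 1,
                     nat_gateways: int = 1, storage_gb: float = 0.0,
                     control_plane_monthly: float = 0.0) -> tuple:
    """(low, high) monthly cost in USD, excluding data transfer and monitoring."""
    def bound(i: int) -> float:  # i = 0 for the low estimate, 1 for the high one
        return (nodes * node_monthly[i]
                + load_balancers * LB_MONTHLY[i]
                + nat_gateways * NAT_MONTHLY[i]
                + storage_gb * SSD_PER_GB_MONTH[i]
                + control_plane_monthly)
    return bound(0), bound(1)


# Fixed overhead alone: 1 load balancer, 1 NAT gateway, 100 GB SSD, Free-tier control plane.
print(monthly_estimate(nodes=0, node_monthly=(0.0, 0.0), storage_gb=100))
```

The fixed items come to roughly $53-80/month before a single node is billed. Pass your VMs' monthly price range as `node_monthly` to complete the picture.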
Common Mistakes and Misconceptions
- “Managed Kubernetes means fully managed.” You still manage worker nodes (unless using Autopilot/Fargate), networking, storage, RBAC, monitoring, and your applications. “Managed” refers primarily to the control plane.
- “EKS/GKE/AKS clusters are identical to vanilla Kubernetes.” Each provider adds proprietary networking (VPC CNI, Alias IPs), identity (IRSA, Workload Identity), and storage integrations that don’t exist in upstream K8s.
- “The control plane fee is my main Kubernetes cost.” For most production clusters, the $72-74/month control plane fee is a small fraction of the total bill, often under 5%. It is a large share only for small dev/test clusters. Worker node compute, load balancers, NAT gateways, and data transfer dominate costs.
Further Reading
- GKE documentation — Comprehensive guides for Standard and Autopilot modes
- EKS documentation — Setup guides, best practices, and blueprints
- AKS documentation — Getting started, networking, and security guides
- EKS Best Practices Guide — AWS’s official best practices for EKS
- Karpenter documentation — Intelligent node autoscaling for Kubernetes
- GKE Autopilot overview — Understanding the fully managed mode
- KubeCon talks on YouTube — CNCF conference presentations on real-world managed K8s usage
- CNCF Slack #eks, #gke, #aks channels — Community support for each provider