Chapter 34: Multi-Cluster Strategies

A single Kubernetes cluster is a single failure domain. One misconfigured admission webhook can block all deployments. One etcd corruption event can lose all state. One cloud region outage can take everything offline. As organizations move from “we run some things on Kubernetes” to “Kubernetes is our platform,” the question shifts from “how do we run a cluster?” to “how do we run many clusters, and how do they relate to each other?”

Multi-cluster is not about redundancy alone. Teams adopt multiple clusters for blast radius reduction, regulatory compliance, geographic latency, team isolation, and environment separation. The challenge is not running multiple clusters — it is managing them as a coherent system without reintroducing the operational complexity Kubernetes was supposed to eliminate. For a visual overview of Part 7’s platform engineering concepts, see Appendix B: Mental Models.

Why Multi-Cluster

  • Blast radius — Multiple clusters contain failures so a bad upgrade in staging does not take down production.
  • Compliance and data sovereignty — Regulations like GDPR and HIPAA may require per-region clusters to keep data local.
  • Latency — Geographic distribution puts compute close to users.
  • Team isolation — Separate clusters provide hard isolation (API servers, RBAC, upgrade schedules) beyond what namespaces offer.
  • Upgrade cadence — Running version N in production and N+1 in staging lets teams validate upgrades before rollout.

Approach 1: Independent Clusters

The simplest multi-cluster strategy is no strategy at all. Each cluster is independently provisioned, independently configured, and independently managed. Teams own their clusters end-to-end.

This works for small organizations with 2–3 clusters and dedicated platform teams per cluster. It fails at scale because every cluster drifts: different versions, different policies, different monitoring configurations, different security postures.

Approach 2: GitOps-Driven Multi-Cluster

The most widely adopted approach uses a GitOps tool to manage multiple clusters from a single source of truth. ArgoCD ApplicationSets are purpose-built for this.

flowchart TB
    subgraph git["Git Repository"]
        base["/base/<br>deployment.yaml<br>networkpolicy.yaml<br>monitoring.yaml"]
        usVals["/clusters/us-east/<br>values.yaml"]
        euVals["/clusters/eu-west/<br>values.yaml"]
        apVals["/clusters/ap-south/<br>values.yaml"]
    end

    subgraph hub["ArgoCD Hub Cluster"]
        appset["ApplicationSet generator<br>For each cluster:<br>- Create Application<br>- Inject cluster-specific values<br>- Sync state to match Git"]
    end

    subgraph regional["Regional Clusters"]
        usEast["us-east cluster<br>base + region overrides"]
        euWest["eu-west cluster<br>base + region overrides"]
        apSouth["ap-south cluster<br>base + region overrides"]
    end

    git -- "ArgoCD watches repo" --> hub
    appset --> usEast
    appset --> euWest
    appset --> apSouth

    style git fill:#f0f0ff,stroke:#333
    style hub fill:#fff0e0,stroke:#333
    style regional fill:#e0ffe0,stroke:#333
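
A single ApplicationSet can drive this pattern. The cluster generator below creates one Application per registered cluster labeled env: production: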
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: platform-services
  namespace: argocd
spec:
  generators:
    - clusters:
        selector:
          matchLabels:
            env: production
  template:
    metadata:
      name: "platform-{{name}}"
    spec:
      project: default
      source:
        repoURL: https://github.com/org/platform-config
        targetRevision: main
        path: "clusters/{{metadata.labels.region}}"
      destination:
        server: "{{server}}"
        namespace: platform
      syncPolicy:
        automated:
          prune: true
          selfHeal: true

The ApplicationSet generator iterates over all clusters registered in ArgoCD that match the label selector, creates one Application per cluster, and injects cluster-specific values. A single Git commit can roll out a change to every production cluster worldwide.
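
The generator's inventory comes from ArgoCD itself: each managed cluster is registered as a Secret in the argocd namespace labeled argocd.argoproj.io/secret-type: cluster, and the cluster generator exposes that Secret's labels to the template. A minimal registration sketch (the cluster name, URL, and credentials below are placeholders):

# Cluster registration Secret read by the ApplicationSet cluster generator
apiVersion: v1
kind: Secret
metadata:
  name: us-east
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster  # marks this Secret as a registered cluster
    env: production                          # matched by the generator's label selector
    region: us-east                          # consumed as {{metadata.labels.region}}
type: Opaque
stringData:
  name: us-east
  server: https://us-east.k8s.example.com    # placeholder API server URL
  config: |
    {"bearerToken": "<redacted>", "tlsClientConfig": {"caData": "<redacted>"}}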

This approach does not, however, provide cross-cluster service discovery or traffic management.

Approach 3: Federation

Federation projects attempt to provide a single API that spans multiple clusters. You submit a workload to the federation control plane, and it distributes replicas across member clusters.

KubeFed (Kubernetes Federation v2) was the original approach but is no longer actively developed. Karmada is the current leading project in this space. Karmada provides:

  • A dedicated API server that accepts standard Kubernetes resources
  • PropagationPolicy resources that define which clusters receive which workloads
  • OverridePolicy resources for per-cluster customization
  • Replica scheduling across clusters (weighted, by resource availability, or by policy)
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: api-server-spread
spec:
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
      name: api-server
  placement:
    clusterAffinity:
      clusterNames:
        - us-east
        - eu-west
        - ap-south
    replicaScheduling:
      replicaDivisionPreference: Weighted
      replicaSchedulingType: Divided
      weightPreference:
        staticWeightList:
          - targetCluster:
              clusterNames: [us-east]
            weight: 2
          - targetCluster:
              clusterNames: [eu-west]
            weight: 1
          - targetCluster:
              clusterNames: [ap-south]
            weight: 1

Open Cluster Management (OCM), a CNCF sandbox project backed by Red Hat, takes a different approach to federation. Rather than a centralized control plane that pushes workloads to clusters, OCM uses a hub-and-spoke model where managed clusters pull their desired state from the hub via agents. This pull-based model can be easier to operate in environments with strict network policies or firewalls between clusters.
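
To make the pull model concrete: on the hub, the desired state for a managed cluster is written into a ManifestWork in that cluster's namespace, and the agent running on the managed cluster pulls and applies it. A minimal sketch (the cluster namespace and workload below are illustrative):

# ManifestWork on the hub; the us-east agent pulls and applies its contents
apiVersion: work.open-cluster-management.io/v1
kind: ManifestWork
metadata:
  name: platform-settings
  namespace: us-east            # hub namespace named after the managed cluster
spec:
  workload:
    manifests:
      - apiVersion: v1
        kind: ConfigMap
        metadata:
          name: platform-settings
          namespace: platform
        data:
          region: us-east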

Federation is powerful but complex. It introduces a new control plane that must itself be highly available, and debugging failures requires understanding the federation layer, the per-cluster state, and the reconciliation between them.

Approach 4: Service Mesh Multi-Cluster

Service meshes solve the cross-cluster networking problem: how do services in cluster A discover and call services in cluster B?

Istio multi-cluster supports several topologies. In the primary-remote model, one cluster runs a shared control plane that remote clusters attach to; in the multi-primary model, each cluster runs its own Istio control plane, and the control planes exchange service endpoint information so that a service in cluster A can route traffic to pods in cluster B as if they were local.
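
In a multi-primary mesh, each cluster is installed with a mesh-wide mesh ID plus its own cluster name and network, roughly like the IstioOperator values sketched below (the identifiers are illustrative); endpoint discovery is then enabled by exchanging remote secrets between clusters, for example with istioctl create-remote-secret.

# Multi-primary install values for one cluster (identifiers are illustrative)
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  values:
    global:
      meshID: mesh1               # shared across all clusters in the mesh
      multiCluster:
        clusterName: us-east      # unique per cluster
      network: network-us-east    # used for cross-network routing via east-west gateways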

Cilium ClusterMesh provides a similar capability at the CNI level. Each cluster exposes its state through a clustermesh-apiserver (backed by etcd), which Cilium agents in the other clusters connect to, directly or through the KVStoreMesh cache, to exchange pod identity and endpoint information. Services can be declared as “global,” making them accessible from any cluster in the mesh.

# Cilium global service annotation
apiVersion: v1
kind: Service
metadata:
  name: api-server
  annotations:
    service.cilium.io/global: "true"
    service.cilium.io/shared: "true"
spec:
  selector:
    app: api-server   # a Service with the same name and namespace must exist in each cluster
  ports:
    - port: 80

With these annotations, any pod in any cluster in the ClusterMesh can resolve api-server, and traffic is load-balanced across the backends of every cluster that shares the service. Cilium handles endpoint synchronization, identity-aware routing, and affinity (for example, preferring backends in the local cluster).

Approach 5: Cluster API for Lifecycle Management

All the above approaches assume clusters already exist. Cluster API (CAPI) addresses the lifecycle problem: how do you create, upgrade, and delete clusters declaratively?

Cluster API treats clusters as Kubernetes resources. You define a Cluster, MachineDeployment, and infrastructure-specific resources (AWS, Azure, GCP, vSphere), and Cluster API controllers reconcile them into running clusters. Upgrading a cluster’s Kubernetes version is a spec change; Cluster API handles the rolling update of control plane and worker nodes.
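
A minimal sketch of what this looks like, assuming the AWS provider (resource names are illustrative and API versions vary by provider release):

# The Cluster object stitches together control plane and infrastructure references
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: us-east
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["10.244.0.0/16"]
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: us-east-control-plane
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
    kind: AWSCluster
    name: us-east
---
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: us-east-control-plane
spec:
  replicas: 3
  version: v1.29.4              # bumping this rolls the control plane to the new version
  machineTemplate:
    infrastructureRef:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
      kind: AWSMachineTemplate
      name: us-east-control-plane
  kubeadmConfigSpec: {}         # kubeadm init/join configuration omitted for brevity

Worker pools are defined the same way with MachineDeployments, where bumping spec.template.spec.version rolls the worker nodes.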

Combining Cluster API with GitOps gives you a fully declarative multi-cluster lifecycle: Git commits create clusters, ArgoCD ApplicationSets configure them, and Cluster API manages their infrastructure.

Choosing an Approach

| Requirement | Recommended Approach |
| --- | --- |
| Consistent configuration across clusters | GitOps (ArgoCD ApplicationSets) |
| Cross-cluster service discovery | Service mesh (Istio, Cilium ClusterMesh) |
| Workload distribution across clusters | Federation (Karmada) |
| Declarative cluster lifecycle | Cluster API |
| Simple, low-overhead | Independent clusters + GitOps |

Most organizations start with GitOps-driven multi-cluster and add service mesh or federation only when they have a concrete cross-cluster routing or scheduling requirement. Cluster API is orthogonal — it manages infrastructure regardless of the workload management strategy.

Common Mistakes and Misconceptions

  • “One big cluster is always better than multiple small ones.” Large clusters have larger blast radius, harder upgrades, and more complex RBAC. Many organizations use multiple clusters for environment isolation, team autonomy, and regional locality.
  • “Service mesh is required for cross-cluster communication.” DNS-based service discovery, cloud load balancers, or simple ingress routing can connect services across clusters. A mesh adds mTLS and observability but isn’t always necessary.

Next: Building Internal Developer Platforms — Backstage, golden paths, and the platform engineering stack.