Cluster Pattern Blast Radius Comparison

Interactive failure domain analysis for architecture decision support

RVTMA Reference Patterns

Pattern A · Single Cluster + MachineSet Segregation

Low-medium ops · Largest blast radius · Recommended default

OpenShift Cluster

Control Plane

API server · etcd · scheduler

cp1 cp2 cp3

Infra Pool

Ingress + monitoring

i1 i2 i3

Virt-Prod Pool

Production VMs

w1 w2 w3 w4 w5 w6

Virt-NonProd Pool

Dev/test VMs

d1 d2 d3

ODF Storage Pool

Persistent volumes (Ceph)

s1 s2 s3

Pattern B · Hosted Control Planes (HyperShift)

High ops · Hidden cascade risk · 7 mandatory gates

Management Cluster

Mgmt Control Plane

Hosts both hosted CPs

m1 m2 m3

Hosted CP 1

Pod on mgmt cluster

pod

Hosted CP 2

Pod on mgmt cluster

pod

Hosted Cluster 1 · Workers

Workers + VMs (Data Plane)

Survives mgmt failure

w1 w2 w3 w4 w5 w6

Hosted Cluster 2 · Workers

Workers + VMs (Data Plane)

Survives mgmt failure

w1 w2 w3 w4 w5
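
The cascade in this pattern can be sketched as a small dependency graph. This is an illustrative model only: the component names and the `DEPENDS_ON` map are hypothetical labels taken from the diagram, and the model assumes, as the diagram states, that the worker data planes keep running when the management cluster fails.

```python
# Hypothetical model of the Pattern B diagram: each component lists what it
# needs in order to keep functioning. Names are illustrative, not an API.
DEPENDS_ON = {
    "mgmt-cluster": [],
    "hosted-cp-1": ["mgmt-cluster"],  # hosted CPs run as pods on the mgmt cluster
    "hosted-cp-2": ["mgmt-cluster"],
    "workers-1": [],                  # data plane survives mgmt failure
    "workers-2": [],
}


def blast_radius(failed: str) -> set[str]:
    """Return every component that stops functioning when `failed` goes down."""
    impacted = {failed}
    changed = True
    while changed:  # repeat until no new component is pulled in
        changed = False
        for component, deps in DEPENDS_ON.items():
            if component not in impacted and any(d in impacted for d in deps):
                impacted.add(component)
                changed = True
    return impacted


print(sorted(blast_radius("mgmt-cluster")))
# ['hosted-cp-1', 'hosted-cp-2', 'mgmt-cluster']
print(sorted(blast_radius("hosted-cp-1")))
# ['hosted-cp-1']
```

The workers stay outside the impacted set because their running workloads continue. In this simplified model that does not capture the loss of API access the hosted clusters experience while their control plane is down.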

7 Mandatory Readiness Gates for HyperShift Adoption

All gates must be cleared before deploying Hosted Control Planes in production. Skipping any gate significantly increases the blast radius of a management cluster failure.

B-1

Management Cluster Sizing

Management cluster must be sized to host all hosted control plane pods with N+1 redundancy. Each hosted CP consumes ~2-4 vCPU and 6-12 GB RAM. Undersizing leads to noisy-neighbor effects between hosted CPs.
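
A rough sizing calculation for this gate might look like the sketch below. It uses the upper bounds of the per-CP ranges quoted above and interprets N+1 as "the hosted CPs still fit after losing one node", which is an assumption. The node size in the example (16 vCPU, 64 GB RAM) is hypothetical, and the overhead of the management cluster's own components is not included.

```python
import math

# Upper bounds of the per-hosted-CP ranges quoted above.
VCPU_PER_HOSTED_CP = 4
RAM_GB_PER_HOSTED_CP = 12


def mgmt_nodes_needed(hosted_cps: int, node_vcpu: int, node_ram_gb: int) -> int:
    """Nodes required to host the control planes, plus one spare (N+1).

    Assumption: N+1 means the hosted CPs still fit after losing one node.
    Overhead of the management cluster's own components is not included.
    """
    by_cpu = math.ceil(hosted_cps * VCPU_PER_HOSTED_CP / node_vcpu)
    by_ram = math.ceil(hosted_cps * RAM_GB_PER_HOSTED_CP / node_ram_gb)
    return max(by_cpu, by_ram) + 1


# Hypothetical node size: 16 vCPU / 64 GB RAM.
print(mgmt_nodes_needed(hosted_cps=2, node_vcpu=16, node_ram_gb=64))   # 2
print(mgmt_nodes_needed(hosted_cps=10, node_vcpu=16, node_ram_gb=64))  # 4
```

Taking the maximum of the CPU-bound and RAM-bound node counts keeps the estimate conservative when one resource runs out before the other.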

B-2

Network Segmentation

Each hosted cluster requires isolated network segments for pod/service CIDRs and ingress. Management and hosted cluster traffic must be segregated. Overlapping CIDRs between hosted clusters cause routing failures.
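
The CIDR overlap condition can be checked before deployment with Python's standard `ipaddress` module. The sketch below is a minimal pre-flight check; the cluster names and address ranges are hypothetical examples, not values from any real deployment.

```python
import ipaddress
from itertools import combinations


def find_overlaps(cidrs: dict[str, str]) -> list[tuple[str, str]]:
    """Return pairs of names whose CIDR ranges overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in cidrs.items()}
    return [
        (a, b)
        for a, b in combinations(nets, 2)
        if nets[a].overlaps(nets[b])
    ]


# Hypothetical pod CIDRs for the management cluster and two hosted clusters.
pod_cidrs = {
    "mgmt": "10.128.0.0/14",
    "hosted-1": "10.132.0.0/14",
    "hosted-2": "10.133.0.0/16",  # falls inside hosted-1's range
}
print(find_overlaps(pod_cidrs))
# [('hosted-1', 'hosted-2')]
```

The same check would be run separately for service CIDRs and any other address ranges the clusters share a routing domain for.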

B-3

Storage Architecture

Hosted control plane etcd data is stored as PVCs on the management cluster. Storage must support the I/O profile of multiple concurrent etcd instances. Storage latency above 10 ms degrades API responsiveness for all hosted clusters.

B-4

Operational Team Readiness

Team must demonstrate proficiency in: HyperShift CLI, hosted cluster lifecycle management, troubleshooting hosted CP pod failures, and understanding the control/data plane separation model. Minimum training: Red Hat HyperShift workshop or equivalent.

B-5

Disaster Recovery Plan

Documented and tested DR plan for management cluster failure. Must include: OADP/Velero backup of management cluster etcd, hosted control plane state recovery procedure, tested RTO/RPO targets, and runbook for rebuilding management cluster from backup. Without this gate, management cluster loss may be unrecoverable.

B-6

Monitoring and Alerting

Dedicated monitoring for: management cluster health, hosted CP pod resource consumption, etcd latency per hosted cluster, cross-cluster network connectivity, and cascading failure detection. Alerts must distinguish between hosted CP failure (single cluster) and management cluster failure (cascade).
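
The distinction between the two failure classes can be expressed as a simple classification rule. The sketch below is illustrative: the function name, its inputs, and the message strings are hypothetical, and a real alerting stack would derive the health flags from its own probes.

```python
def classify(mgmt_healthy: bool, hosted_cp_healthy: dict[str, bool]) -> str:
    """Label an incident as a cascade or a single-cluster failure.

    Assumption: an unhealthy management cluster is always treated as a
    cascade, regardless of what the individual hosted CP probes report.
    """
    failed = sorted(name for name, ok in hosted_cp_healthy.items() if not ok)
    if not mgmt_healthy:
        return "CASCADE: management cluster failure, all hosted CPs at risk"
    if failed:
        return "SINGLE-CLUSTER: hosted CP failure in " + ", ".join(failed)
    return "OK"


print(classify(True, {"hosted-1": True, "hosted-2": False}))
# SINGLE-CLUSTER: hosted CP failure in hosted-2
print(classify(False, {"hosted-1": False, "hosted-2": False}))
# CASCADE: management cluster failure, all hosted CPs at risk
```

Checking the management cluster first means a cascade is never reported as a set of unrelated single-cluster alerts.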

B-7

Upgrade Strategy

Defined upgrade path for both the management cluster and the hosted clusters. Management cluster upgrades affect ALL hosted CPs simultaneously and must be planned within maintenance windows. Hosted cluster upgrades are independent but require management cluster API availability. The rolling upgrade sequence must be documented.

Pattern C · ACM-Managed Fleet

Medium ops · Smallest blast radius · Best isolation

ACM-Managed Fleet

ACM Hub (Fleet Management)

Policy + GitOps + observability

h1 h2 h3

Cluster 1 (Standalone)

Own CP + workers

cp w1 w2 w3 w4

Cluster 2 (PCI-Scoped)

Compliance-isolated

cp w1 w2 w3

Cluster 3 (General)

Independent CP

cp w1 w2

Impact analysis

Each standalone cluster, and the ACM Hub itself, is an independent failure domain. When one fails, the other clusters keep running unaffected.

Pattern D · Compact 3-Node

Lowest ops · 33% capacity loss on drain · Quorum-sensitive

Compact 3-Node Cluster

Node 1

CP + Worker + Storage

cp odf vm vm

Node 2

CP + Worker + Storage

cp odf vm vm

Node 3

CP + Worker + Storage

cp odf vm

Each node runs control plane, worker, and ODF storage roles. Draining one node removes 33% of capacity and degrades storage replication. A second node failure causes quorum loss, so the margin between "degraded" and "dead" is exactly one node.
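
The one-node margin follows from majority quorum arithmetic: a group of n members needs floor(n/2) + 1 of them alive. The sketch below works through the three-node case. It models only the control plane quorum and the raw node-count share of capacity, not ODF's own replication behavior.

```python
def quorum(members: int) -> int:
    """Smallest majority of a group with `members` voting members."""
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """How many members can fail before the majority is lost."""
    return members - quorum(members)


NODES = 3
for failed in range(NODES):
    alive = NODES - failed
    state = "quorum held" if alive >= quorum(NODES) else "quorum lost"
    print(f"{failed} node(s) down: {alive}/{NODES} alive, {state}, "
          f"{failed / NODES:.0%} capacity lost")
# 0 node(s) down: 3/3 alive, quorum held, 0% capacity lost
# 1 node(s) down: 2/3 alive, quorum held, 33% capacity lost
# 2 node(s) down: 1/3 alive, quorum lost, 67% capacity lost
```

With three members the quorum is two, so `fault_tolerance(3)` is one: the first failure degrades the cluster and the second takes it down.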

Pattern E · Hybrid Composition (A + C)

Varies by composition · Optimized blast containment · Most common in large enterprises

Hybrid Composition

ACM Hub

Unified management

h1 h2 h3

Main Cluster (Pattern A)

Prod + Dev + Infra pools

cp1 cp2 cp3 · w1 w2 w3 w4 · d1 d2 · i1 i2

PCI Cluster (Pattern C)

Compliance-scoped

cp w1 w2

Impact analysis

The two clusters and the ACM Hub are independent failure domains. The hybrid composition optimizes isolation where it matters most.

Blast Radius Comparison Matrix

Pattern	Control Plane Failure	Worker Node Failure	Storage Failure	Hub/Mgmt Failure	Ops Complexity	Recommended When
A · MachineSet	Full cluster	Pool-scoped	All persistent	N/A	Low-Med	Default. No hard isolation requirement.
B · HyperShift	CASCADE: all hosted CPs	Hosted-scoped	Per-cluster	ALL hosted CPs lost	High	Mature team + strong isolation need + training.
C · ACM Fleet	Single cluster	Single cluster	Single cluster	Fleet visibility lost	Medium	Regulatory isolation. Independent upgrades.
D · Compact 3-Node	Quorum loss = down	33% capacity	Replication degraded	N/A	Low	Small/remote sites. Limited infra.
E · Hybrid	Depends on composition	Contained per domain	Per-cluster	Fleet visibility lost	Varies	Large enterprises. Multiple workload domains.
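
For use in scripts or decision tooling, the matrix can be encoded as a lookup table. The sketch below copies the cell values from the rows above; the dictionary keys (`control_plane`, `worker`, `storage`, `hub`, `ops`) and the helper function names are illustrative choices, not part of the original page.

```python
# Rows copied from the comparison matrix; keys are illustrative names.
MATRIX = {
    "A": {"control_plane": "Full cluster", "worker": "Pool-scoped",
          "storage": "All persistent", "hub": "N/A", "ops": "Low-Med"},
    "B": {"control_plane": "CASCADE: all hosted CPs", "worker": "Hosted-scoped",
          "storage": "Per-cluster", "hub": "ALL hosted CPs lost", "ops": "High"},
    "C": {"control_plane": "Single cluster", "worker": "Single cluster",
          "storage": "Single cluster", "hub": "Fleet visibility lost",
          "ops": "Medium"},
    "D": {"control_plane": "Quorum loss = down", "worker": "33% capacity",
          "storage": "Replication degraded", "hub": "N/A", "ops": "Low"},
    "E": {"control_plane": "Depends on composition",
          "worker": "Contained per domain", "storage": "Per-cluster",
          "hub": "Fleet visibility lost", "ops": "Varies"},
}


def impact(pattern: str, failed_component: str) -> str:
    """Look up the blast radius for a failed component in a given pattern."""
    return MATRIX[pattern][failed_component]


def patterns_without_hub() -> list[str]:
    """Patterns that have no hub or management cluster to lose."""
    return [p for p, row in MATRIX.items() if row["hub"] == "N/A"]


print(impact("B", "hub"))        # ALL hosted CPs lost
print(patterns_without_hub())    # ['A', 'D']
```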
