Cluster Pattern Blast Radius Comparison

Failure domain analysis for architecture decision support
RVTMA Reference Patterns
Pattern A · Single Cluster + MachineSet Segregation
Low-medium ops · Largest blast radius · Recommended default
OpenShift Cluster
Control Plane (cp1, cp2, cp3): API server · etcd · scheduler
Infra Pool (i1, i2, i3): ingress + monitoring
Virt-Prod Pool (w1-w6): production VMs
Virt-NonProd Pool (d1, d2, d3): dev/test VMs
ODF Storage Pool (s1, s2, s3): persistent volumes (Ceph)
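To make the Pattern A blast radius concrete, here is a minimal Python sketch that encodes the pool layout (names and node counts taken from the diagram) and reports which pools a single component failure touches. The pool keys, `affected_pools`, and `node_share` are hypothetical names, and the rules are a simplification of the Blast Radius Comparison Matrix: control plane failure takes out the full cluster, a worker pool failure is pool-scoped, and a storage failure hits all persistent workloads (approximated here as every pool).

```python
from __future__ import annotations

# Hypothetical model of the Pattern A layout: one OpenShift cluster with
# MachineSet-segregated pools. Node counts mirror the diagram.
POOLS: dict[str, int] = {
    "infra": 3,         # ingress + monitoring
    "virt-prod": 6,     # production VMs
    "virt-nonprod": 3,  # dev/test VMs
    "odf-storage": 3,   # Ceph-backed persistent volumes
}


def affected_pools(failed: str) -> set[str]:
    """Return the pools impacted when the `failed` component breaks.

    Simplifying assumptions (not a complete failure model):
    - a control-plane failure takes out the full cluster;
    - an ODF storage failure is treated as hitting every pool, standing in
      for "all persistent workloads";
    - any other pool failure stays scoped to that pool.
    """
    if failed in ("control-plane", "odf-storage"):
        return set(POOLS)
    if failed in POOLS:
        return {failed}
    raise ValueError(f"unknown component: {failed}")


def node_share(pools: set[str]) -> float:
    """Fraction of the cluster's pool (non-control-plane) nodes inside `pools`."""
    return sum(POOLS[p] for p in pools) / sum(POOLS.values())


if __name__ == "__main__":
    for component in ("control-plane", "virt-prod", "virt-nonprod"):
        hit = affected_pools(component)
        print(f"{component}: {sorted(hit)} ({node_share(hit):.0%} of pool nodes)")
```

The point the sketch makes is the asymmetry: a worker pool failure is contained by MachineSet segregation, but the shared control plane and shared storage still put every pool in one failure domain.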
Pattern B · Hosted Control Planes (HyperShift)
High ops · Hidden cascade risk · 7 mandatory gates
Management Cluster
Mgmt Control Plane (m1, m2, m3): hosts both hosted CPs
Hosted CP 1: runs as a pod on the mgmt cluster
Hosted CP 2: runs as a pod on the mgmt cluster
Hosted Cluster 1 · Workers (w1-w6): workers + VMs (data plane); survives mgmt failure
Hosted Cluster 2 · Workers (w1-w5): workers + VMs (data plane); survives mgmt failure
7 Mandatory Readiness Gates for HyperShift Adoption
All gates must be cleared before deploying Hosted Control Planes in production. Skipping any gate significantly increases the blast radius of a management cluster failure.
B-1 · Management Cluster Sizing
Management cluster must be sized to host all hosted control plane pods with N+1 redundancy. Each hosted CP consumes ~2-4 vCPU and 6-12 GB RAM. Undersizing leads to noisy-neighbor effects between hosted CPs.
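The N+1 sizing rule reduces to simple arithmetic. The sketch below is a rough aggregate-capacity estimate, assuming each hosted CP needs the upper end of the ranges above (4 vCPU, 12 GB RAM); the node shape in the example (32 vCPU, 128 GB allocatable) is purely illustrative, and `mgmt_nodes_needed` is a hypothetical helper, not part of any HyperShift tooling.

```python
from __future__ import annotations

import math


def mgmt_nodes_needed(
    hosted_cps: int,
    node_vcpu: float,
    node_mem_gb: float,
    cp_vcpu: float = 4.0,     # upper end of the ~2-4 vCPU per hosted CP estimate
    cp_mem_gb: float = 12.0,  # upper end of the 6-12 GB per hosted CP estimate
) -> int:
    """Worker nodes the management cluster needs for `hosted_cps` control planes, with N+1.

    Aggregate-capacity estimate only: it ignores per-pod bin packing, system
    reservations, and the management cluster's own workloads.
    """
    if hosted_cps < 1 or node_vcpu <= 0 or node_mem_gb <= 0:
        raise ValueError("need at least one hosted CP and positive node sizes")
    by_cpu = math.ceil(hosted_cps * cp_vcpu / node_vcpu)
    by_mem = math.ceil(hosted_cps * cp_mem_gb / node_mem_gb)
    # +1 spare node so the loss of any single node still leaves enough capacity.
    return max(by_cpu, by_mem) + 1


if __name__ == "__main__":
    # Illustrative node shape: 32 vCPU / 128 GB allocatable (an assumption).
    print(mgmt_nodes_needed(10, node_vcpu=32, node_mem_gb=128))
```

With 10 hosted CPs on that node shape, CPU is the binding constraint (40 vCPU needs 2 nodes, while 120 GB fits in 1), so N+1 gives 3 nodes. Sizing against the upper bound of each range is a deliberate choice to avoid the noisy-neighbor effects described above.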
B-2 · Network Segmentation
Each hosted cluster requires isolated network segments for pod/service CIDRs and ingress. Management and hosted cluster traffic must be segregated. Overlapping CIDRs between hosted clusters cause routing failures.
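Overlap checks are easy to automate before any hosted cluster is created. This sketch uses Python's standard `ipaddress` module; `find_cidr_overlaps` and the CIDR plan are illustrative assumptions (the management values match OpenShift's default pod and service networks, the hosted values are made up for the example).

```python
from __future__ import annotations

import ipaddress
from itertools import combinations


def find_cidr_overlaps(cidrs: dict[str, list[str]]) -> list[tuple[str, str, str, str]]:
    """Return every pair of overlapping CIDR blocks as (owner, cidr, owner, cidr).

    Checks across clusters and within a cluster (e.g. pod vs service CIDR).
    """
    nets = [
        (owner, ipaddress.ip_network(block))
        for owner, blocks in cidrs.items()
        for block in blocks
    ]
    overlaps = []
    for (owner_a, net_a), (owner_b, net_b) in combinations(nets, 2):
        # Only compare networks of the same IP version.
        if net_a.version == net_b.version and net_a.overlaps(net_b):
            overlaps.append((owner_a, str(net_a), owner_b, str(net_b)))
    return overlaps


if __name__ == "__main__":
    # Illustrative plan: [pod CIDR, service CIDR] per cluster.
    plan = {
        "mgmt": ["10.128.0.0/14", "172.30.0.0/16"],
        "hosted-1": ["10.132.0.0/14", "172.31.0.0/16"],
        "hosted-2": ["10.134.0.0/16", "172.29.0.0/16"],  # pod CIDR sits inside hosted-1's
    }
    for clash in find_cidr_overlaps(plan):
        print("overlap:", clash)
```

Running a check like this in the provisioning pipeline catches the routing conflict at planning time instead of after the second hosted cluster comes up.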
B-3 · Storage Architecture
Hosted control plane etcd data is stored in PVCs on the management cluster. Storage must sustain the I/O profile of multiple concurrent etcd instances. Disk write latency above 10 ms degrades API responsiveness for every hosted cluster whose etcd shares that storage.
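One way to turn the 10 ms threshold into a pass/fail gate is to collect per-write latency samples (from a storage benchmark, or from etcd's `etcd_disk_wal_fsync_duration_seconds` histogram) and compare the 99th percentile against the limit. Using p99 rather than the mean is an assumption here, since tail latency is what stalls etcd; the helper names are hypothetical.

```python
from __future__ import annotations


def p99_nearest_rank(samples_ms: list[float]) -> float:
    """99th percentile using the nearest-rank method."""
    if not samples_ms:
        raise ValueError("no latency samples")
    ordered = sorted(samples_ms)
    rank = (99 * len(ordered) + 99) // 100  # ceil(0.99 * n) using integer arithmetic
    return ordered[rank - 1]


def etcd_storage_ok(samples_ms: list[float], threshold_ms: float = 10.0) -> bool:
    """True if the p99 write latency stays at or below the gate threshold."""
    return p99_nearest_rank(samples_ms) <= threshold_ms
```

Because several hosted etcd instances share the same backing storage, the samples should be collected while all of them are under load, not one at a time.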
B-4 · Operational Team Readiness
Team must demonstrate proficiency in: HyperShift CLI, hosted cluster lifecycle management, troubleshooting hosted CP pod failures, and understanding the control/data plane separation model. Minimum training: Red Hat HyperShift workshop or equivalent.
B-5 · Disaster Recovery Plan
Documented and tested DR plan for management cluster failure. Must include: an etcd backup of the management cluster, OADP/Velero backups of hosted control plane resources and volumes, a hosted control plane state recovery procedure, tested RTO/RPO targets, and a runbook for rebuilding the management cluster from backup. Without this gate, management cluster loss may be unrecoverable.
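"Tested RTO/RPO targets" implies comparing DR drill results against the targets. A minimal sketch, assuming each backup captures state as of its start time and all values are in minutes; `DrDrill` and `check_targets` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DrDrill:
    """Numbers from one DR rehearsal, in minutes. Field names are illustrative."""
    backup_interval: float  # time between scheduled backups
    backup_duration: float  # time for one backup to complete
    restore_time: float     # measured time to rebuild the mgmt cluster and recover hosted CPs


def check_targets(drill: DrDrill, rpo_target: float, rto_target: float) -> dict[str, bool]:
    # Worst case: failure strikes just before the next backup completes, so the
    # newest usable backup is one interval plus one backup duration old.
    worst_case_data_loss = drill.backup_interval + drill.backup_duration
    return {
        "rpo_met": worst_case_data_loss <= rpo_target,
        "rto_met": drill.restore_time <= rto_target,
    }
```

The RPO side is a property of the backup schedule, while the RTO side only means something if `restore_time` comes from an actual rebuild drill rather than an estimate.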
B-6 · Monitoring and Alerting
Dedicated monitoring for: management cluster health, hosted CP pod resource consumption, etcd latency per hosted cluster, cross-cluster network connectivity, and cascading failure detection. Alerts must distinguish between hosted CP failure (single cluster) and management cluster failure (cascade).
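The distinction between a single hosted CP failure and a management cluster cascade can be expressed as a small classification rule that an alert router might apply. This is a sketch under a stated heuristic, not a Red Hat or Prometheus feature: `classify_outage` and its inputs (management API reachability plus a per-hosted-cluster health flag) are hypothetical.

```python
from __future__ import annotations


def classify_outage(mgmt_api_up: bool, hosted_cp_up: dict[str, bool]) -> tuple[str, list[str]]:
    """Label an outage as 'cascade', 'scoped', or 'healthy' and list affected hosted clusters.

    Heuristic (an assumption): if the management API is down, or every hosted
    CP is down at once, treat it as a management-cluster cascade; otherwise the
    failure is scoped to the individual hosted clusters that are down.
    """
    down = sorted(name for name, up in hosted_cp_up.items() if not up)
    if not mgmt_api_up:
        # Management cluster loss affects every hosted CP it hosts.
        return "cascade", sorted(hosted_cp_up)
    if len(hosted_cp_up) > 1 and len(down) == len(hosted_cp_up):
        # All hosted CPs failing together points at a shared cause on the mgmt cluster.
        return "cascade", down
    if down:
        return "scoped", down
    return "healthy", []
```

Routing "cascade" and "scoped" to different severities and runbooks is what keeps a single noisy hosted CP from being mistaken for a fleet-wide incident, and vice versa.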
B-7 · Upgrade Strategy
Defined upgrade path for both the management cluster and the hosted clusters. Management cluster upgrades affect ALL hosted CPs simultaneously, so they must be planned within maintenance windows. Hosted cluster upgrades are independent but require management cluster API availability. The rolling upgrade sequence must be documented.
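A documented rolling sequence can be generated rather than hand-written. The sketch assumes the management cluster is upgraded first, inside its maintenance window, and that hosted clusters follow one at a time, each gated on management API availability; `upgrade_sequence` is a hypothetical helper, and the ordering should be checked against the supported version combinations for your HyperShift release.

```python
from __future__ import annotations


def upgrade_sequence(mgmt_window: str, hosted_clusters: list[str]) -> list[str]:
    """Build an ordered upgrade runbook, one step per entry.

    Ordering is an assumption: management cluster first (inside its window),
    then hosted clusters one at a time, each gated on management API health.
    """
    steps = [f"[{mgmt_window}] upgrade management cluster (impacts ALL hosted CPs)"]
    for name in hosted_clusters:
        # Hosted cluster upgrades need the management cluster API to be reachable.
        steps.append(f"confirm management cluster API is available before {name}")
        steps.append(f"upgrade hosted cluster {name}")
        steps.append(f"verify {name} is healthy before moving on")
    return steps


if __name__ == "__main__":
    for step in upgrade_sequence("Sat 02:00-06:00 UTC", ["hc1", "hc2"]):
        print(step)
```

Upgrading hosted clusters serially keeps each one's change contained to its own failure domain, which is the main benefit HyperShift offers once the management layer itself is stable.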
Pattern C · ACM-Managed Fleet
Medium ops · Smallest blast radius · Best isolation
ACM-Managed Fleet
ACM Hub (h1, h2, h3): fleet management (policy + GitOps + observability)
Cluster 1 (Standalone): own CP + workers (w1-w4)
Cluster 2 (PCI-Scoped): compliance-isolated; own CP + workers (w1-w3)
Cluster 3 (General): independent CP + workers (w1, w2)
Impact analysis: a failure in any standalone cluster stays within that cluster; the other clusters remain unaffected. Losing the ACM Hub costs fleet visibility, but the managed clusters keep running on their own control planes.
Pattern D · Compact 3-Node
Lowest ops · 33% capacity loss on drain · Quorum-sensitive
Compact 3-Node Cluster
Node 1: CP + worker + storage (cp, odf, 2 VMs)
Node 2: CP + worker + storage (cp, odf, 2 VMs)
Node 3: CP + worker + storage (cp, odf, 1 VM)
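Each node runs control plane + worker + ODF storage. Draining one node removes 33% of capacity and degrades storage replication. The margin between "degraded" and "dead" is exactly one node: with one node down, the two remaining etcd members still hold quorum (2 of 3); with two nodes down, quorum is lost and the cluster goes down.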
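The one-node margin follows from etcd's majority rule: a cluster of n members needs floor(n/2) + 1 of them to keep quorum. The sketch below assumes every node in the compact cluster is an etcd member and that capacity is split evenly across nodes (the diagram shows node 3 carrying one fewer VM, so this is an approximation); `compact_cluster_state` is a hypothetical name.

```python
from __future__ import annotations


def compact_cluster_state(nodes: int = 3, failed: int = 1) -> dict[str, object]:
    """Quorum and capacity picture after `failed` of `nodes` combined CP/worker/storage nodes fail."""
    if not 0 <= failed <= nodes:
        raise ValueError("failed must be between 0 and nodes")
    quorum = nodes // 2 + 1  # etcd needs a strict majority of members
    remaining = nodes - failed
    return {
        "quorum_needed": quorum,
        "etcd_members_up": remaining,
        "has_quorum": remaining >= quorum,
        "capacity_lost_pct": round(100 * failed / nodes, 1),
    }


if __name__ == "__main__":
    for f in (1, 2):
        print(f, compact_cluster_state(3, f))
```

For three nodes the quorum is 2, so one failure costs about a third of capacity while the cluster stays up, and a second failure drops it below quorum.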
Pattern E · Hybrid Composition (A + C)
Varies by composition · Optimized blast containment · Most common in large enterprises
Hybrid Composition
ACM Hub (h1, h2, h3): unified management
Main Cluster (Pattern A): prod + dev + infra pools (control plane cp1-cp3; workers w1-w4, d1-d2, i1-i2)
PCI Cluster (Pattern C): compliance-scoped; own CP + workers (w1, w2)
Impact analysis: the main cluster and the PCI cluster are independent failure domains, and the ACM Hub's failure domain is separate from both. The hybrid composition applies hard isolation where it matters most (the compliance-scoped workloads) and consolidates the rest into one Pattern A cluster.
Blast Radius Comparison Matrix
| Pattern | Control plane failure | Worker node failure | Storage failure | Hub/mgmt failure | Ops effort | Recommended when |
|---|---|---|---|---|---|---|
| A · MachineSet | Full cluster | Pool-scoped | All persistent workloads | N/A | Low-Med | Default. No hard isolation requirement. |
| B · HyperShift | CASCADE: all hosted CPs | Hosted-cluster-scoped | Per-cluster | ALL hosted CPs lost | High | Mature team + strong isolation need + training. |
| C · ACM Fleet | Single cluster | Single cluster | Single cluster | Fleet visibility lost | Medium | Regulatory isolation. Independent upgrades. |
| D · Compact 3-Node | Quorum loss = cluster down | 33% capacity | Replication degraded | N/A | Low | Small/remote sites. Limited infra. |
| E · Hybrid | Depends on composition | Contained per domain | Per-cluster | Fleet visibility lost | Varies | Large enterprises. Multiple workload domains. |