What Breaks When: An Interactive Cluster Failure Explorer

Let me show you something.

Architecture diagrams are static. They show components, boundaries, and arrows. What they do not show is what happens when one of those components fails.

This one does.

I laid out five OpenShift cluster patterns side by side. None of them is hypothetical; I pulled each from a real production environment: single cluster with multiple node pools, hosted control planes, ACM-federated fleets, air-gapped stacks, and isolated compliance zones. Click any component and watch what propagates. The red pulse is the blast radius: everything impacted, directly or indirectly, when that component fails (FN-0002).

The goal is not to pick a winner. It is to make the shape of failure visible here, on a page, before it shows up in your runbooks (FN-0015).

Click to break a cluster

How to read it

Node pills (cp1, w1, w2, s1…) are the real architectural units. Each card groups them into pools, clusters, or control planes.
Dashed connectors show hierarchical dependencies. Data and control flow along them.
Red pulse (blast) marks everything that has failed or been impacted, directly or indirectly, by the component you clicked.
Pattern tabs at the top switch between architectures. The same failure question (“what breaks when the control plane dies?”) produces different answers in each, because the layers that look independent on the diagram carry different coupling underneath (FN-0013).
Calculator at the bottom lets you plug in your own VM counts, node counts, and storage footprint. It recalculates the blast radius in real time.
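The calculator's actual formula is not spelled out here, so what follows is only one plausible way to turn an inventory into a number: the share of nodes, VMs, and storage that sits in the impacted pools. The `Pool` type, the `blast_share` helper, and every figure in the sample inventory are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """One pool on a card. All fields are illustrative, not the tool's schema."""
    name: str
    nodes: int
    vms: int
    storage_tib: float


def blast_share(pools, impacted_names):
    """Return the fraction of nodes, VMs, and storage in the impacted pools."""
    impacted = [p for p in pools if p.name in impacted_names]

    def share(attr):
        total = sum(getattr(p, attr) for p in pools)
        hit = sum(getattr(p, attr) for p in impacted)
        return hit / total if total else 0.0

    return {attr: share(attr) for attr in ("nodes", "vms", "storage_tib")}


# Made-up inventory: two worker pools and one storage pool.
pools = [
    Pool("w1", nodes=6, vms=120, storage_tib=40.0),
    Pool("w2", nodes=3, vms=60, storage_tib=20.0),
    Pool("s1", nodes=3, vms=20, storage_tib=140.0),
]

result = blast_share(pools, {"w1"})
for attr, value in result.items():
    print(f"{attr}: {value:.0%}")
```

With this made-up inventory, losing `w1` alone touches half the nodes and 60% of the VMs but only 20% of the storage. That kind of asymmetry is what a per-dimension number makes visible.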

Try this

Two quick experiments to get the feel of it.

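Open Pattern A (single cluster, multi-pool). Click the Control Plane card. Notice that every pool turns red, not just the workers. A single OpenShift cluster has a single control plane, and losing it hits every pool at once: pods that are already running may keep serving for a while, but nothing can be scheduled, scaled, or rescheduled after a node failure until the API comes back, regardless of how carefully you segmented the worker pools.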

Now switch to Pattern B (hosted control planes). Click the same Control Plane card on one of the hosted clusters. This time, only that cluster fails. The management cluster and other hosted clusters stay green. Same failure event, very different blast radius. That difference is the thing your availability math has to account for, and it is the part that diagrams alone cannot tell you. The economics of this specific trade-off live in Cost Optimization vs Risk Concentration in Hosted Control Planes.
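The two experiments can be read as a reachability question over a dependency graph: start at the failed component and follow every "depends on me" edge. The sketch below is a minimal illustration of that idea, not the explorer's own code. The component names, the `blast_radius` helper, and both toy topologies are assumptions.

```python
from collections import deque


def blast_radius(dependents, failed):
    """Return the failed component plus everything that transitively depends on it.

    `dependents` maps a component to the components that depend on it directly.
    """
    impacted = {failed}
    queue = deque([failed])
    while queue:
        node = queue.popleft()
        for child in dependents.get(node, ()):
            if child not in impacted:
                impacted.add(child)
                queue.append(child)
    return impacted


# Toy Pattern A: one control plane, every pool hangs off it.
pattern_a = {
    "control-plane": ["worker-pool", "infra-pool", "storage-pool"],
}

# Toy Pattern B: a management cluster hosts one control plane per hosted cluster.
pattern_b = {
    "mgmt-cluster": ["hcp-1", "hcp-2"],
    "hcp-1": ["workers-1"],
    "hcp-2": ["workers-2"],
}

radius_a = blast_radius(pattern_a, "control-plane")
radius_b = blast_radius(pattern_b, "hcp-1")

print(sorted(radius_a))  # every pool plus the control plane
print(sorted(radius_b))  # only hcp-1 and its own workers
```

In this toy graph, failing `mgmt-cluster` instead of `hcp-1` reaches every node. That is the risk-concentration side of the same trade-off.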

Where this comes from

The explorer above is the Blast Radius module from RVTMA, a private RVTools-driven analysis tool I build for enterprise VMware-to-OpenShift migrations. It ingests an RVTools export and renders the target OpenShift architecture along with its operational risks. The Blast Radius view is one of several visualizations it produces from real cluster inventories.

What you are interacting with here is a standalone cut of that module. No data upload, no backend, no tracking. Just the visual vocabulary and the interaction model, served as a courtesy so you can see the patterns without running the full tool.

A glimpse of what changes when failure is treated as something you can see, not just reason about.

The question I keep circling back to is this: what changes in the shape of failure when you change the shape of the architecture? (FN-0004) It is the thread that runs through Cloud-Native, Same Old Fragility, The Hidden Reliability Risks in Multi-Cluster Kubernetes, and into the upcoming essay on single points of failure in cloud-native architectures.

Part of Field Lab. Break things on purpose.
