
THE CONTROL ROOM

Where strategic experience meets the future of innovation.

How to Build Operational Resilience: Stress Testing Strategies from the Nuclear Navy

  • Writer: Tony Grayson
  • Nov 22, 2025
  • 9 min read

Updated: Dec 22, 2025

By Tony Grayson, Tech Executive (ex-SVP Oracle, AWS, Meta) & Former Nuclear Submarine Commander


Published: November 22, 2025 | Last Updated: December 22, 2025


Tony Grayson's USS Providence nuclear submarine docked at pier, representing operational resilience and high-reliability business continuity planning.

TL;DR — Key Takeaways

  • Business continuity ≠ operational resilience. A plan tells you what to do; resilience ensures you can execute under pressure when cortisol floods your brain.

  • Tabletop exercises create false confidence. Announced drills in calm conditions test checklists, not people. Real crises don't schedule themselves.

  • Stress transforms leaders. Cognitive tunneling degrades memory, motor control, and decision-making. You must train until muscle memory overrides panic.

  • Three steps to real resilience: Run unannounced drills (red teaming), degrade information to 60%, and conduct ruthless hot wash debriefs.


Six months of nuclear power school taught us the fundamentals: thermodynamics, reactor physics, and electrical systems. Then we went to the prototype, where we stood watch on an actual operating naval reactor.


We walked through the casualty procedures dozens of times. Loss of coolant. Steam line rupture. Electrical failures. We knew every step, every valve lineup, every communication protocol.


My watch section had it down cold. Or so we believed.


Then the simulator threw our first casualty at us without warning. The alarm sounded. And we fell apart.


The guy who calmly explained procedures in the walkthrough froze at his panel. Our most confident operator started calling out the wrong valve numbers. I watched communication—the thing we'd practiced most—completely disintegrate. People talked over each other. Critical information got lost. Someone made a decision we'd specifically drilled not to make.


This was a training simulator. The reactor wasn't real. The consequences weren't real.


But the stress response was absolutely real.


Here's what we learned that day: knowing what to do and executing under pressure are entirely different skills.


The Psychology of Crisis: Why Leaders Freeze


The gap between your plan and your performance is biological. When your body floods with cortisol, your brain goes into an acute stress response.


Vision narrows. Fine motor control degrades. Memory recall fails. The methodical thinker becomes scattered. The confident leader needs direction. Personalities don't just shift; they transform. As I wrote in Contextual Intelligence vs. Servant Leadership, the supportive style that works in peacetime can actually become a liability when the ship is taking on water.


The nuclear Navy figured this out decades ago: you don't drill to memorize procedures. You drill until the stress response becomes the trained response. You practice until muscle memory overrides panic.


Operational Resilience vs. Business Continuity


Most organizations confuse having a plan with being ready. To survive a real crisis, you must understand the difference:

  • Business Continuity (ISO 22301) is a document. It is a checklist reviewed in a calm conference room that assumes rational actors and predictable timelines.

  • Operational Resilience is a capability. It is the cultural and physical ability to absorb chaos, adapt to broken variables, and execute a failover when the checklist no longer applies.


Most companies run tabletop exercises where they announce the drill in advance. They review their checklists during business hours when everyone is fresh. They stop the exercise when it gets uncomfortable. Then they check the box and assume they're ready.


They’re not.


How to Stress Test Your Organization


To bridge the gap between "we have a plan" and "we can survive," you need to move beyond standard drills and embrace the principles of High Reliability Organizations (HROs).


Here is how to build operational resilience through realistic stress testing:


1. Run Drills Without Warning (Red Teaming): This is what Red Teaming actually looks like. Do not put the drill on the calendar. Introduce chaos when teams are tired or distracted.

2. Degrade the Information: In a real crisis, you never have the full picture. During your exercises, remove key people mid-scenario. Cut off communication channels. Feed the team conflicting data. Force them to make decisions with 60% of the information.

3. Debrief Ruthlessly: After the drill, hold a "hot wash." This isn't about politeness; it's about survival. Ask not "what should have happened," but "what actually happened."

  • Who performed?

  • Who didn't?

  • Where did communication fail?

  • What assumptions were wrong?
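For teams that want to script the first two steps, here is a minimal Python sketch. It is an illustration under stated assumptions, not a method from the nuclear Navy: the function names, the 30-day window, and the scenario facts are all hypothetical. It picks a drill start time at random (so nobody can put it on the calendar) and hands the team only about 60% of the available facts.

```python
import random
from datetime import datetime, timedelta


def pick_unannounced_start(window_start, window_days, rng):
    """Pick a random minute inside the window, nights and weekends included."""
    offset_minutes = rng.randrange(window_days * 24 * 60)
    return window_start + timedelta(minutes=offset_minutes)


def degrade_information(facts, keep_fraction, rng):
    """Return a random subset holding roughly keep_fraction of the facts."""
    keep_count = max(1, round(len(facts) * keep_fraction))
    return rng.sample(facts, keep_count)


rng = random.Random(719)  # fixed seed only so the example is repeatable
start = pick_unannounced_start(datetime(2026, 1, 5), window_days=30, rng=rng)

# Hypothetical scenario facts; replace with details from your own environment.
scenario_facts = [
    "Primary database is unreachable",
    "Failover region reports healthy",
    "Monitoring dashboard is stale by 20 minutes",
    "On-call lead is unavailable",
    "Customer support queue is spiking",
]
briefing = degrade_information(scenario_facts, keep_fraction=0.6, rng=rng)

print("Drill starts:", start)
for fact in briefing:
    print(" -", fact)
```

Only the person running the drill should see the start time and the withheld facts. The withheld facts then become natural prompts for the hot wash: which missing piece of information hurt the most?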


Conclusion: Discovering Who You Are

Your organization won't sink if your crisis plan fails. But without leadership under pressure, your reputation will suffer. Your customers will leave. Your leaders will be exposed. And you'll join the long list of companies that had a plan right up until the moment they needed it.


The question isn't whether you have procedures. It's whether you've developed the operational resilience to execute them when it's hard, when people are tired, when nothing goes according to plan.


As I’ve learned from the Six-Factor Formula, every variable matters when the system goes critical. That is when you will discover who you really are.


Video: Inside the High-Stress World of Nuclear Subs



For a look at the complexity and coordination required in these environments, this video from Smarter Every Day breaks down life inside a US Navy Nuclear Submarine.

Frequently Asked Questions about Operational Resilience

What is the main difference between Business Continuity and Operational Resilience?

Business Continuity (defined by ISO 22301) is typically a static document—a checklist reviewed in a calm conference room that assumes rational actors and predictable timelines. Operational Resilience is a dynamic, cultural capability that allows an organization to adapt, absorb, and recover from unexpected chaos when those static plans fail. Most organizations confuse having a plan with being ready. Business continuity tells you what to do; operational resilience ensures you can actually execute under pressure.


Why do traditional tabletop exercises fail?

Tabletop exercises often fail because they are conducted in low-stress environments with complete information. They test the procedure, but they do not test the people or their physiological response to stress (cortisol spikes), leading to a false sense of security. The drill is announced in advance, conducted during business hours when everyone is fresh, and stopped when it gets uncomfortable. This approach tests the checklist, not whether your team can actually execute when flooded with cortisol and missing half the information.


What is Red Teaming in a corporate strategy context?

Red Teaming involves stress-testing an organization's strategies by introducing active, unannounced adversaries or chaotic variables. The concept originated in Cold War military strategy, where opposing forces simulated enemy tactics to test defensive readiness. Unlike a standard drill, red teaming forces leaders to make decisions with incomplete information and real-time consequences. As IBM explains, it doesn't announce the exercise—it introduces chaos when teams are tired or distracted, simulating real-world attack conditions.


How does stress affect leadership decision-making?

High stress floods the body with cortisol, triggering the acute stress response. Research shows cortisol levels increase within minutes of stress onset and remain elevated for 40-60 minutes. This causes "cognitive tunneling": vision narrows, fine motor control degrades, memory recall fails, and creative problem-solving is limited. The methodical thinker becomes scattered. The confident leader needs direction. Research archived in PubMed Central (PMC) shows leaders often freeze or default to incorrect muscle memory. The prefrontal cortex, responsible for complex thinking, is particularly impaired under acute stress.


What is a High Reliability Organization (HRO)?

High Reliability Organizations (HROs) are organizations that operate in complex, high-hazard domains for extended periods without serious accidents or catastrophic failures. Researchers at UC Berkeley (Todd LaPorte, Gene Rochlin, and Karlene Roberts) studied nuclear-powered aircraft carriers, the FAA's Air Traffic Control system, and nuclear power operations to understand how they do it. Karl Weick and Kathleen Sutcliffe formalized five central principles in their book "Managing the Unexpected." Nuclear power and aviation are classic examples: industries where even a slight error can have catastrophic consequences, yet which achieve and sustain extraordinary safety levels.


What are the 5 principles of High Reliability Organizations?

The five HRO principles identified by Weick and Sutcliffe, adopted by the Agency for Healthcare Research and Quality (AHRQ), are: (1) Preoccupation with failure—viewing near misses as opportunities to improve rather than proof of success; (2) Reluctance to simplify—accepting that work is complex with potential to fail in new ways; (3) Sensitivity to operations—heightened awareness of relevant systems and processes; (4) Commitment to resilience—prioritizing emergency training for unlikely but possible failures; (5) Deference to expertise—valuing insights from staff with pertinent safety knowledge over those with seniority. These principles enable organizations to focus attention on emergent problems and deploy the right resources to address them.


What is Chaos Engineering and how does Netflix use it?

Chaos Engineering is the discipline of experimenting on distributed systems by intentionally introducing failures to build confidence in the system's capability to withstand turbulent conditions. Netflix pioneered the practice around 2010 with Chaos Monkey, a tool that randomly terminates production servers to test resilience. The name comes from imagining a monkey in your data center randomly ripping cables and destroying devices. Netflix expanded this into the Simian Army suite, including Chaos Gorilla (simulates entire availability zone outages) and Chaos Kong (simulates full region failures). Netflix runs these experiments against production traffic, with the aim of keeping downtime to minutes per year while serving millions of users.
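To make the idea concrete, here is a toy Python sketch of the core move: terminate one instance at random, then check whether the service still meets its minimum. It is an in-memory simulation only, and an assumption-laden one. It is not Netflix's Chaos Monkey and does not use its API; the instance names, function names, and health rule are hypothetical.

```python
import random


def terminate_random_instance(instances, rng):
    """Remove and return one randomly chosen instance name."""
    victim = rng.choice(sorted(instances))
    instances.remove(victim)
    return victim


def service_is_healthy(instances, minimum_required):
    """The toy service counts as healthy while enough instances remain."""
    return len(instances) >= minimum_required


fleet = {"web-1", "web-2", "web-3", "web-4"}  # hypothetical instance names
rng = random.Random()

victim = terminate_random_instance(fleet, rng)
survived = service_is_healthy(fleet, minimum_required=3)
print(f"terminated {victim}; healthy={survived}")
```

The point of the experiment is the check that follows the failure. If losing any single instance breaks the health rule, you have found the weakness during a drill rather than during an outage.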


What is a 'hot wash' debrief and why is it important?

A "hot wash" is an immediate post-drill debrief focused on ruthless honesty about what actually happened—not what should have happened. This isn't about politeness; it's about survival. Key questions include: Who performed? Who didn't? Where did communication fail? What assumptions were wrong? The hot wash is conducted while events are fresh, before rationalization sets in. HROs use this practice to continuously improve because they understand that near-misses are windows into system health. The goal is to identify and fix weaknesses before they manifest in real emergencies. This practice is central to military after-action reviews.


Why is knowing what to do different from executing under pressure?

In nuclear power school, watch sections walk through casualty procedures dozens of times—loss of coolant, steam line rupture, electrical failures. They know every step, valve lineup, and communication protocol. But when the simulator throws the first unannounced casualty, teams fall apart. Confident operators call wrong valve numbers. Communication disintegrates. People make decisions they specifically drilled not to make. The stress response was absolutely real even in a training simulator. The nuclear Navy learned: you don't drill to memorize procedures—you drill until the stress response becomes the trained response, until muscle memory overrides panic. As discussed in Contextual Intelligence vs. Servant Leadership, the supportive style that works in peacetime can become a liability when the ship is taking on water.


How should organizations stress test beyond standard drills?

To bridge the gap between "we have a plan" and "we can survive," organizations should embrace High Reliability Organization principles: (1) Run drills without warning (red teaming), don't put them on the calendar, introduce chaos when teams are tired or distracted; (2) Degrade the information—remove key people mid-scenario, cut communication channels, feed conflicting data, force decisions with 60% of information; (3) Debrief ruthlessly—conduct a hot wash asking what actually happened, not what should have happened. This approach mirrors chaos engineering practices used by Netflix and operational readiness testing on nuclear submarines and aircraft carriers.


What is the acute stress response and how does it affect performance?

The acute stress response (fight-or-flight) is a physiological cascade triggered by perceived threats. The hypothalamic-pituitary-adrenal (HPA) axis releases glucocorticoids like cortisol, while the sympathetic-adrenal-medullary (SAM) system releases catecholamines. Research shows this affects three brain regions critical for decision-making: the amygdala, hippocampus, and prefrontal cortex. Systematic reviews show that stressed individuals make riskier decisions, exhibit reduced risk perception, rely more on habitual responding than on deliberate analysis, and perform more poorly on tasks requiring complex thinking. Training under realistic stress conditions helps override these default responses.


How do you apply military operational resilience principles to business?

The nuclear Navy learned decades ago that you don't drill to memorize procedures—you drill until the stress response becomes the trained response. Business applications include: Run unannounced failure scenarios (red teaming); Test with incomplete information; Practice failover mechanisms before you need them; Conduct ruthless debriefs; Build cultural resilience, not just procedural compliance. At data center companies like Northstar Enterprise + Defense, this means applying the same discipline required to live in a steel tube—precise attention to systems, reliance on procedure, and ability to endure chaos—to deploying modular data centers in austere environments. As discussed in The Six-Factor Formula, every variable matters when the system goes critical.


Who is Tony Grayson?

Tony Grayson is President & General Manager of Northstar Enterprise + Defense, a former U.S. Navy nuclear submarine commander (USS Providence SSN-719), and recipient of the Vice Admiral James Bond Stockdale Award for Inspirational Leadership. He previously served as SVP of Physical Infrastructure at Oracle, and held executive roles at AWS and Meta. Tony is a Top 10 Data Center Influencer and Veterans Chair for Infrastructure Masons.


What qualifies Tony Grayson to write about operational resilience?

Tony Grayson commanded a nuclear submarine where operational resilience is life-or-death. He spent 21 years in the U.S. Navy with DOE/Naval Reactors nuclear certification, then led global infrastructure for Oracle ($1.3B budget, 1,000+ team), Meta (30+ data centers, 8M+ sq ft), and AWS. He applies military stress testing and High Reliability Organization principles to enterprise and defense infrastructure at Northstar.




____________________________________


Tony Grayson is a recognized Top 10 Data Center Influencer, a successful entrepreneur, and the President & General Manager of Northstar Enterprise + Defense.


A former U.S. Navy Submarine Commander and recipient of the prestigious VADM Stockdale Award, Tony is a leading authority on the convergence of nuclear energy, AI infrastructure, and national defense. His career is defined by building at scale: he led global infrastructure strategy as a Senior Vice President for AWS, Meta, and Oracle before founding and selling a top-10 modular data center company.


Today, he leads strategy and execution for critical defense programs and AI infrastructure, building AI factories and cloud regions that survive contact with reality.


Read more at: tonygraysonvet.com
