The title of this post is borrowed intentionally. Years ago, I read Hope Is Not a Method: What Business Leaders Can Learn from America’s Army by General Gordon R. Sullivan (with Michael V. Harper). I found the idea compelling. When colleagues—and especially direct reports—would start in with “I hope…”, I would hand out the book. It is much simpler to passively surmise and hope than it is to articulate an explicit plan of action.

It is a phrase I still use—mostly as a reminder to myself—that discipline begins where hope ends.

Examples of this pattern appear across industries and systems. One illustrative case—unfortunately common—helps frame the issue.

Reporting a system defect often feels like going to the doctor with chest pain.

“Hey doctor, I have a sharp pain when I move like this.”
“It’s probably indigestion. Let me know if it happens again.”

Absent investigation of accompanying symptoms or diagnostics, that borders on malpractice.

Now translate that to systems work.

“We’ve identified transaction anomalies. Here are three concrete examples.”
“Have you seen any more?”
“No.”
“Okay, we’ll close the ticket. Let us know if it happens again.”

This is short-term thinking elevated to a process. Hope substituted for investigation.

The unspoken assumptions are familiar:

  • It’s too costly to research
  • We have bigger priorities
  • We need to onboard a new client
  • We need to update the website
  • This might just be a small glitch

So we focus on visible issues—those affecting more customers or revenue—and defer the rest.

Even triage requires discipline. Scarce engineering time must be allocated. Not every anomaly deserves an immediate deep dive.

The failure happens when triage quietly becomes dismissal.

Because the decision being made is not:
“We understand the cause and accept the risk.”

It is:
“We don’t know the cause, but we assume it’s small.”

That is not prioritization.

That is gambling.

If an issue is truly bounded, there should be a hypothesis explaining why. If investigation is deferred, the residual risk should be explicitly acknowledged and owned. What too often happens instead is neither. The ticket is closed, the assumption goes undocumented, and the system gets the benefit of the doubt without earning it.


The analogy extends further.

An emergency department does not treat patients in the order they arrive. It evaluates each case immediately using defined triage criteria—vitals, symptoms, and known risk indicators—to assess both current and potential severity.

They do not assume.

They measure.

In systems work, users report symptoms, not diagnoses.

“Something looks off.”
“This transaction doesn’t reconcile.”
“I’ve never seen this before.”

These are presenting symptoms. They should be triaged the same way.

Well-run organizations recognize this structurally. That is the function of Level 1, 2, and 3 support models:

  • Level 1 captures and standardizes the symptom
  • Level 2 evaluates patterns, context, and reproducibility
  • Level 3 investigates root cause and system implications

Each level represents increasing diagnostic depth.

The failure mode occurs when this structure exists in name but not in practice.

Instead, issues collapse into a single queue and are worked based on convenience:

  • First come, first served
  • Easiest to resolve
  • Most visible to stakeholders
  • Most helpful to ticket metrics

This is not triage.

This is throughput optimization.

And throughput optimization, absent risk weighting, systematically deprioritizes the issues that matter most.

The result is predictable:

  • Low-complexity, low-risk tickets are resolved quickly
  • Metrics improve
  • Latent, high-impact issues remain unexamined

These are the system’s equivalent of undiagnosed chest pain.

A functioning triage model requires explicit criteria. In medicine, those are vitals. In systems, the equivalents are just as definable:

  • Does this violate an expected control or invariant?
  • Is the behavior unexplained or non-reproducible?
  • Does it affect financial accuracy, compliance, or data integrity?
  • Is the failure mode bounded and understood?
  • What is the worst-case impact if the assumption is wrong?

These are not expensive questions. They are the minimum diagnostic baseline.

Without them, prioritization is not occurring.

Assumptions are.

And assumptions—when undocumented and untested—are simply unacknowledged risk positions.
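The diagnostic questions above can be made explicit rather than left implicit. A minimal sketch of such a rubric in Python, where the field names, weights, and thresholds are illustrative assumptions, not a prescribed standard:

```python
from dataclasses import dataclass

@dataclass
class Symptom:
    """A reported anomaly, described by the triage questions above."""
    violates_invariant: bool   # breaks an expected control or invariant?
    unexplained: bool          # behavior unexplained or non-reproducible?
    affects_integrity: bool    # touches financial accuracy, compliance, data?
    failure_bounded: bool      # is the failure mode bounded and understood?
    worst_case_impact: int     # 1 (trivial) to 5 (severe) if assumptions are wrong

def triage(s: Symptom) -> str:
    """Route a symptom to a diagnostic depth instead of a convenience queue."""
    score = (
        3 * s.violates_invariant
        + 2 * s.unexplained
        + 3 * s.affects_integrity
        + 2 * (not s.failure_bounded)
        + s.worst_case_impact
    )
    if score >= 8:
        return "investigate now"    # root-cause analysis, Level 3 depth
    if score >= 4:
        return "defer with owner"   # deferred, but the residual risk is owned
    return "monitor"                # watch for recurrence

# An unexplained integrity violation with high worst-case impact:
print(triage(Symptom(True, True, True, False, 5)))  # investigate now
```

The point is not the particular weights. It is that once the rubric is written down, "we assume it's small" becomes a score someone had to assign and can be challenged on.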


There is a closely related failure that reinforces this pattern: documentation.

Even when an issue is investigated and resolved, the work often stops at the immediate fix.

A developer identifies the symptom, applies a correction, validates that the system is functioning again, and moves on. Under time pressure, this is understandable.

But what is left behind is incomplete:

  • The root cause is not clearly articulated
  • The reasoning behind the fix is not documented
  • The boundaries of the fix are not defined
  • The residual risk is not acknowledged

So while the system is operational again, the organization has not actually learned anything durable.

This is the equivalent of treating the symptom without recording the diagnosis.

If the issue reappears—and in complex systems, it often does—the next investigator starts from zero:

  • The same symptoms are rediscovered
  • The same assumptions are made
  • The same fixes are applied

And the cycle repeats.

This is how organizations end up “fixing” the same problem multiple times.

Not because the problem is unsolvable, but because the knowledge never compounds.

In medicine, this would be unthinkable. Patient history is foundational. Prior symptoms, diagnoses, treatments, and outcomes are recorded precisely so that future decisions improve.

Systems work requires the same discipline.

Each incident should leave behind more than a resolution. It should leave behind:

  • A working hypothesis of root cause
  • The fix that was applied
  • Evidence supporting why that fix was believed to work
  • Any uncertainty or alternative explanations
  • The conditions under which the issue might recur

This is not overhead.

This is how isolated events become understood failure modes.
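What such a record might look like can be sketched as a small data structure. The field names are illustrative assumptions, not a mandated schema:

```python
from dataclasses import dataclass, field

@dataclass
class IncidentRecord:
    """What each resolved incident should leave behind."""
    symptom: str                   # what was observed
    root_cause_hypothesis: str     # working hypothesis, not just the symptom
    fix_applied: str               # what was changed
    evidence: str                  # why the fix was believed to work
    open_questions: list[str] = field(default_factory=list)  # uncertainty
    recurrence_conditions: str = ""  # conditions under which it might recur

def seen_before(history: list[IncidentRecord], new_symptom: str) -> bool:
    """Crude pattern match: has this symptom appeared in prior incidents?"""
    return any(new_symptom.lower() in r.symptom.lower() for r in history)

history = [IncidentRecord(
    symptom="Intermittent unapplied payments via processor API",
    root_cause_hypothesis="Callback timeout not retried",
    fix_applied="Added retry with idempotency key",
    evidence="Replayed failed callbacks; all applied correctly",
    open_questions=["Does the processor ever drop callbacks silently?"],
    recurrence_conditions="Processor latency exceeds callback timeout",
)]

print(seen_before(history, "unapplied payments"))  # True: not a new issue
```

With even this crude history, the next investigator starts from a hypothesis instead of from zero.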

Without this, triage degrades again.

Because when there is no historical context, every issue looks isolated.

And isolated issues are easy to dismiss.

With documentation, patterns emerge:

  • “We’ve seen this before”
  • “This fix failed under these conditions”
  • “This is not a new issue—it is an unresolved one”

At that point, the organization is no longer relying on hope.

It is building a method.


Consider a more concrete example.

Those of a certain age will recall the Space Shuttle Challenger disaster. The failure traced back to O-ring performance in cold temperatures—an issue that had presented as a known but intermittent anomaly in prior launches.

The signal was there.

It was observed, discussed, and ultimately discounted because:

  • prior missions had largely succeeded
  • the failures appeared isolated
  • the system, in aggregate, seemed to function

The assumption was not that the issue was understood and acceptable.

The assumption was that it was not material.

Now translate that pattern into a financial system.

Assume a core dependency: an API connection to a payment processor—the main artery of transaction flow.

Control reports begin to show missing or unapplied payments:

  • The majority of transactions process correctly
  • Failures are intermittent
  • No immediate pattern is obvious

So the conclusion becomes:
“This is not material. We will monitor.”
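The control report described here is, at bottom, a reconciliation: compare the processor's record of confirmed payments against what the ledger actually applied. A minimal sketch, with hypothetical transaction identifiers:

```python
def unapplied_payments(processor_ids: set[str], ledger_ids: set[str]) -> set[str]:
    """Payments the processor confirms but the ledger never applied."""
    return processor_ids - ledger_ids

# Hypothetical daily control data:
processor = {"txn-1001", "txn-1002", "txn-1003", "txn-1004"}
ledger = {"txn-1001", "txn-1002", "txn-1004"}

missing = unapplied_payments(processor, ledger)
print(sorted(missing))  # ['txn-1003'] -> one in four, easy to call "not material"
```

Note what the report cannot tell you: whether txn-1003 is an isolated glitch or the visible edge of a systematic failure. That distinction requires investigation, not assumption.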

At the same time:

  • Website UX issues generate visible customer frustration
  • Complaints are explicit and frequent
  • These issues are prioritized because they are observable

So effort shifts accordingly.

The visible problems get solved.

The invisible ones get deferred.

Until they don’t stay invisible:

  • Clients begin to notice discrepancies
  • Regulators ask questions about control integrity
  • Lawsuits emerge over unapplied payments

At that point, the issue is no longer a defect.

It is a failure of assumption.

The system that “mostly worked” was, in fact, unreliable at its most critical point.

This is the same failure pattern:

  • A symptom appears in a high-consequence system
  • The symptom is treated as isolated because outcomes are mostly normal
  • No explicit risk assessment is documented
  • Investigation is deferred in favor of more visible priorities

In medicine, this would be indefensible.

You do not ignore chest pain because most patients survive the day.

You investigate because the downside is asymmetric.

The same standard applies to systems that handle money, controls, and trust.

When something that “should not be possible” occurs, the cost is not the event itself.

The cost is what it implies about the system’s assumptions, controls, and failure boundaries.


So the real question is not:
“Do we have time to investigate?”

It is:
“Are we knowingly accepting an unknown risk in a system that depends on correctness?”

Deferral backed by explicit triage and a documented risk position may be acceptable.

Silent assumption is not.

Hope is not a method.

If you have a perspective to add or a different way of seeing this, I’d welcome the discussion below. If you’d rather reach out directly, you can also connect through the Contact page.