Your team has Datadog, Grafana, New Relic, and a dozen custom dashboards. Alerts fire constantly. Engineers stare at metrics all day. And nobody actually understands why the system is slow.
Audit your observability stack: are dashboards actually viewed? Do alerts lead to action? Observability theater costs money without providing insight.
I understand why teams end up here: each of these tools was adopted to solve a real problem.
The observability market is projected to reach $172 billion by 2035. Companies spend 17% of their total infrastructure budget just on watching their infrastructure. Developers now average 18 different data sources to monitor their systems. We've built an entire industry around looking at systems without understanding them.
I've watched this pattern emerge over the past decade. The more sophisticated our monitoring becomes, the less engineers seem to grasp what their code actually does. We've replaced understanding with surveillance.
The Illusion of Control
Dashboards create a powerful psychological effect: the sense that you're in control. Graphs move. Numbers update. Colors shift from green to yellow to red. It feels like knowledge.
But watching metrics isn't the same as understanding systems. A dashboard can tell you that p99 latency spiked at 3:47 PM. It can't tell you why. It can show you that error rates increased after a deployment. It can't explain the interaction between the new code and the legacy service that caused the failure.
Metrics are symptoms, not diagnoses. Knowing your CPU hit 95% tells you something is consuming resources. It doesn't tell you which code path, why it's inefficient, or how to fix it. That requires reading the code. That requires understanding the system.
This is similar to how AI vendors oversell their accuracy claims - the numbers look impressive, but they don't represent the reality you'll face in production. A beautiful dashboard can hide profound ignorance.
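When the graph says CPU is pinned at 95%, the diagnosis still has to come from the code. A minimal sketch using Python's built-in cProfile, with a deliberately wasteful function standing in for the real hot path (the function and data here are invented for illustration):

```python
import cProfile
import pstats


def slow_report(records):
    # Stand-in for the real hot path: list.count() inside a loop,
    # the kind of thing a CPU graph can flag but never explain.
    duplicates = []
    for r in records:
        if records.count(r) > 1 and r not in duplicates:
            duplicates.append(r)
    return duplicates


if __name__ == "__main__":
    data = list(range(3000)) + list(range(500))
    profiler = cProfile.Profile()
    profiler.enable()
    slow_report(data)
    profiler.disable()
    # The profile names the exact function eating the CPU that the
    # dashboard could only report as "95%".
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(5)
```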
The $65 Million Bill Problem
In 2022, a financial services firm received a $65 million bill from Datadog for a single quarter. The story became industry legend, but it reveals something deeper than pricing complexity.
How does a company accumulate $65 million in observability costs without realizing it? Because nobody understood what they were monitoring or why. They were collecting everything because they didn't know what mattered. They were paying for visibility they never used.
Today, mid-sized companies routinely spend $50,000 to $150,000 per year on Datadog alone. Enterprise deployments exceed $1 million annually. And most of those metrics go unread. Most of those dashboards are opened once and forgotten.
The observability bill is often a tax on not understanding your system. When you know what matters, you monitor what matters. When you don't know, you monitor everything and hope the answer emerges from the noise.
The 99% Write-Only Tax
Here's the economics that makes observability theater so expensive:
99% of logs are written, indexed, stored, billed for—and never queried. Not once. The data flows in, gets compressed, ages out after 30 days, and nobody ever looks at it.
But you paid for every step of that journey. You paid to generate the log. You paid to ship it. You paid to index it. You paid to store it. You paid for the infrastructure to make it queryable. All for data that served exactly zero diagnostic purpose.
This is Write-Only Economics: the cost of potential insight scales with data volume, but actual insight doesn't. Doubling your log volume doesn't double your understanding. It doubles your bill while your understanding stays flat.
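A back-of-the-envelope calculation makes the asymmetry concrete. Every number below is an illustrative assumption, not actual vendor pricing:

```python
# Illustrative back-of-the-envelope; all figures are assumptions,
# not actual vendor pricing.
GB_PER_DAY = 500          # log volume shipped daily
COST_PER_GB = 2.50        # ingest + index + 30-day retention, per GB
QUERIED_FRACTION = 0.01   # share of stored data ever queried

monthly_cost = GB_PER_DAY * 30 * COST_PER_GB          # $37,500
queried_gb = GB_PER_DAY * 30 * QUERIED_FRACTION       # 150 GB
cost_per_queried_gb = monthly_cost / queried_gb       # $250 per useful GB

print(f"Monthly bill: ${monthly_cost:,.0f}")
print(f"Data ever queried: {queried_gb:,.0f} GB")
print(f"Effective cost per queried GB: ${cost_per_queried_gb:,.2f}")
# Doubling GB_PER_DAY doubles the bill; it does nothing to QUERIED_FRACTION.
```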
The vendors know this. Their business model depends on it. Usage-based pricing means they profit from your uncertainty. The less you understand about what matters, the more data you collect "just in case," and the more they earn.
Before Dashboards Existed
I started debugging systems when the primary tools were printf statements and log files. There was no APM. No distributed tracing. No real-time metrics streaming to beautiful visualizations.
Here's what we did instead: we read the code. We understood the algorithms. We knew the data structures. When something was slow, we could reason about why from first principles. We didn't need a trace to tell us that a nested loop was O(n²) - we could see it in the code.
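A contrived example of what that looks like in practice (not code from any particular system):

```python
# O(n*m): the inner list is rebuilt and scanned for every order.
def orders_with_unknown_customer(orders, customers):
    return [o for o in orders
            if o["customer_id"] not in [c["id"] for c in customers]]


# O(n + m): build the lookup once, then check membership in constant time.
def orders_with_unknown_customer_fast(orders, customers):
    known_ids = {c["id"] for c in customers}
    return [o for o in orders if o["customer_id"] not in known_ids]
```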
This isn't nostalgia for worse tools. Modern observability capabilities are genuinely useful. But something got lost in the transition. Engineers who learned to debug before modern tooling developed a different relationship with their code. They had to understand it because there was no alternative.
Today, I observe engineers who can navigate Datadog expertly but can't explain what their service actually does at the code level. They've learned to read dashboards instead of codebases.
Eight Tools, Zero Understanding
According to Grafana's 2025 Observability Survey, companies use an average of eight different observability technologies. Large enterprises average 24 data sources. Developers spend almost 40% of their time on toolchain maintenance and integration - more than double the rate from 2021.
Think about that: engineers are spending roughly 40% of their time maintaining the tools they use to watch their systems, rather than improving the systems themselves. This is the layer tax applied to observability - each tool adds overhead without necessarily adding understanding.
The proliferation of tools reflects a hope that the right combination of metrics, logs, and traces will somehow reveal the truth about a system. But tools can only show you what they're configured to show. They can't give you the mental model that makes the data meaningful.
More data sources often means less clarity. When you have 24 different places to look for answers, you spend your time context-switching between dashboards instead of thinking deeply about the problem.
Alert Fatigue Is a Symptom
Every team I've observed with sophisticated monitoring eventually complains about alert fatigue. Pages fire constantly. Engineers start ignoring notifications. Critical alerts get lost in the noise.
Alert fatigue isn't a configuration problem. It's a knowledge problem. When you understand your system deeply, you know what conditions are actually dangerous and what conditions are normal variation. You can set meaningful thresholds because you understand what the numbers mean.
When you don't understand the system, every anomaly looks potentially critical. You alert on everything because you can't distinguish signal from noise. The cure for alert fatigue isn't better alert rules - it's deeper understanding of what you're monitoring.
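Meaningful thresholds fall out of that understanding. A toy sketch, assuming you know roughly what one worker can handle; the capacity figures are made up:

```python
# Threshold derived from knowledge of the system, not from guesswork.
# All capacity figures are illustrative assumptions.
WORKERS = 12
JOBS_PER_WORKER_PER_MIN = 40   # measured, and explainable from the code
SAFETY_MARGIN = 0.8            # page before saturation, not after


def should_page(incoming_jobs_per_min: float) -> bool:
    capacity = WORKERS * JOBS_PER_WORKER_PER_MIN   # 480 jobs/min
    return incoming_jobs_per_min > capacity * SAFETY_MARGIN


# Pages above 384 jobs/min -- a number the on-call engineer can justify,
# instead of an arbitrary "queue depth > 1000".
```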
The Cognitive Load Crisis
Research consistently shows that developers experiencing high cognitive load make more errors and work less efficiently. When engineers have to hold too many details in working memory, mistakes increase and creativity drops.
Modern observability stacks contribute to cognitive load rather than reducing it. Eighteen data sources. Eight different tools. Constant context-switching between dashboards. The information is scattered across so many systems that synthesizing it into understanding becomes its own full-time job.
Platform engineering has emerged partly as a response to this crisis - the idea that organizations should systematically reduce cognitive load. But adding a platform team to manage your observability tools is treating symptoms, not causes. The underlying problem remains: we've optimized for data collection over comprehension.
What Actually Works
I've seen teams that use observability tools effectively. They share common patterns:
They limit their tools. One or two observability platforms, not eight. They accept some capability gaps in exchange for reduced cognitive overhead.
They require code understanding first. New engineers read the codebase before they learn the dashboards. They understand the architecture before they start monitoring it.
They measure what they understand. Every metric they track, they can explain. Every alert threshold, they can justify. They don't collect data hoping it might be useful someday.
They use dashboards to confirm hypotheses, not generate them. When something breaks, they start with a theory about what went wrong based on their knowledge of the system. Then they use observability data to verify or refute the theory. The tools support understanding; they don't replace it.
They budget observability like any other cost. That 17% of infrastructure spend isn't inevitable. Teams that understand their systems can often achieve better outcomes with simpler, cheaper monitoring.
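One way to enforce "measure what you understand" is to make every metric carry its own justification. A minimal sketch; the fields and names are assumptions, not any vendor's API:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Metric:
    name: str
    owner: str
    question_it_answers: str                  # why this metric exists
    alert_threshold: Optional[float] = None
    threshold_rationale: Optional[str] = None


REGISTRY: dict[str, Metric] = {}


def register(metric: Metric) -> None:
    # A metric nobody can explain does not get collected.
    if not metric.question_it_answers.strip():
        raise ValueError(f"{metric.name}: no stated purpose; refusing to collect it")
    # An alert nobody can justify does not get to page anyone.
    if metric.alert_threshold is not None and not (metric.threshold_rationale or "").strip():
        raise ValueError(f"{metric.name}: alert threshold without a rationale")
    REGISTRY[metric.name] = metric


register(Metric(
    name="checkout.payment_latency_p99_seconds",
    owner="payments-team",
    question_it_answers="Are we inside the 2s budget the payment provider allows?",
    alert_threshold=1.5,
    threshold_rationale="Provider times out at 2s; page at 1.5s to leave headroom.",
))
```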
When Heavy Instrumentation Makes Sense
I'm not saying sophisticated observability is always theater. It provides real value when:
- Your system genuinely requires it. Massive distributed systems with hundreds of microservices need the tooling. The complexity is proportional to the problem.
- The team understands the code first. Observability amplifies existing knowledge. Engineers who know their systems use dashboards to confirm hypotheses, not replace understanding.
- You're using it for specific questions. Targeted instrumentation around known problem areas beats blanket data collection. Measure what you need to answer questions you're actually asking (see the sketch after this list).
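Targeted can be as small as a timing wrapper on the one code path you already suspect, rather than auto-instrumenting everything. A sketch with hypothetical names:

```python
import functools
import logging
import time

log = logging.getLogger("perf")


def timed(fn):
    """Time one known-suspect code path; nothing else gets instrumented."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            log.info("%s took %.1f ms", fn.__name__, elapsed_ms)
    return wrapper


@timed
def rebuild_search_index(batch):
    # The one path we already suspect is slow; a hypothetical stand-in.
    ...
```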
But for most teams with modest systems, simpler tooling plus deeper code knowledge beats expensive observability stacks that nobody fully understands.
Observability Health Audit
Score your team's observability practices. Theater scores high; understanding scores low.
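One way to make the audit concrete is a handful of yes/no questions drawn from the points above. The questions and scoring below are illustrative assumptions, not a standard rubric:

```python
# Illustrative audit; the questions and weights are assumptions,
# not a standard rubric. Higher score = more theater.
QUESTIONS = [
    "Do you have dashboards nobody has opened in the last 90 days?",
    "Do alerts routinely fire without anyone taking action?",
    "Are there metrics on your dashboards that nobody can explain?",
    "Do you run more than two observability platforms?",
    "Could on-call engineers describe the service's architecture without a dashboard?",
]


def audit(answers: list[bool]) -> int:
    # The last question is inverted: understanding lowers the score.
    return sum(answers[:-1]) + (0 if answers[-1] else 1)


# Example: stale dashboards, ignored alerts, unexplained metrics,
# only two platforms, and engineers who do know the architecture -> 3/5.
print(audit([True, True, True, False, True]))
```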
The Bottom Line
Observability tools are useful. The ability to trace requests across distributed systems, to correlate events across services, to visualize system behavior over time - these capabilities have genuine value. The problem isn't the tools. It's the assumption that tools can substitute for understanding.
The most effective engineers I've worked with treat observability as a complement to code knowledge, not a replacement for it. They can explain what their service does by reading the code. They use dashboards to see the code's behavior at scale, not to learn what the code does in the first place.
The observability industry will keep growing. Vendors will keep adding features. Dashboards will keep getting more sophisticated. None of it will help if engineers don't understand the systems they're watching. You can't observe your way to understanding. At some point, you have to read the code.
"You can't observe your way to understanding. At some point, you have to read the code."
Sources
- OpenPR: Observability Market to Hit $172.1 Billion by 2035 — Market research showing observability platform growth trajectory and key vendor market share
- Grafana Labs: Observability Survey 2025 — Industry data showing companies average 8 observability tools, 17% of infrastructure spend on observability, and developers averaging 18 data sources
- The New Stack: Datadog's $65M Bill — Analysis of high-profile Datadog billing incident and broader implications for observability costs
Architecture Review
Is your observability stack adding clarity or complexity? Get perspective from someone who debugged systems before dashboards existed.