Your AI Doesn't Deserve Your Trust Yet

Nobody hands a new employee the keys to the vault on day one. But that's exactly what most AI products demand: full access, immediately, before proving they won't burn the place down.

TL;DR

Start your AI at Level 1: read-only. Prove it can watch before it advises, advise before it acts, act within guardrails before it earns autonomy. Score your system on the Trust Readiness Scorecard before granting the next level.

You know the pitch. An AI system that manages your money, optimizes your spending, catches fraud before it happens. Flawless demo. Confident founder. Hockey-stick VC deck.

Having spent the last year building a cross-domain AI platform that connects to financial accounts, I can tell you the hardest engineering problem wasn't the model, the data pipeline, or the integrations. It was answering a question that sounds simple but isn't: how much is this system allowed to do?

The Permission Problem Nobody Talks About

Every AI product in fintech faces the same tension. Users want magic. They want the system to "just handle it." But the moment something goes wrong with their money, the trust evaporates instantly and permanently.

According to a 2025 KPMG global study of over 48,000 people across 47 countries, only 46% are willing to trust AI systems. That number has actually declined since before ChatGPT launched. More exposure to AI hasn't built confidence. It's eroded it.

Here's why. Most AI products ask for trust upfront. Sign up, connect your bank, let us optimize everything. That's backwards. Trust isn't a checkbox. It's a gradient.

Autonomy Must Be Earned in Stages

The car industry learned this the hard way. SAE's six levels of driving automation exist because nobody jumped from manual steering to robotaxis. After billions in investment, 66% of Americans still fear fully self-driving vehicles. S&P Global found consumers overwhelmingly prefer Level 2 and 3 features over full autonomy. People don't distrust autonomous vehicles because they're unsafe. They distrust them because they're opaque. Nobody can explain why the car made that lane change. The same opacity problem kills AI products: if users can't see the reasoning, they won't trust the output.

AI systems that touch financial data need the same graduated approach. Not because it's theoretically elegant, but because the alternative is what we've already seen: products that collapse the moment users discover what the system actually does with their data.

The Four Levels of AI Trust

Here's the framework I've landed on after building a system that connects financial, calendar, and health data. Four levels, each earned by proving competence at the level below.

Level 1: Observer

The system watches. It cannot act. Read-only access to data. It monitors patterns, detects anomalies, and reports what it sees. It cannot move money, cancel subscriptions, or change anything. The user reviews the observations and decides what to do.

Every AI system starts here. No exceptions. As I've argued before, LLMs have no intent. They're probabilistic engines trying to function inside a deterministic banking system. That mismatch is the root of the risk — they'll confidently optimize your portfolio into the ground if you let them. If you can't build something valuable at Level 1, you don't have a product. You have a feature request wrapped in a fundraising deck.

How do you know when Level 1 has earned promotion? Track these silent indicators — metrics the user never sees but your engineering team should obsess over:

Anomaly detection precision. Of the patterns your system flagged, what percentage were genuinely useful? If you're surfacing noise ("You spent $4.50 at Starbucks again"), you're not ready. Track your signal-to-noise ratio weekly. Target: 80%+ of flagged items are actionable.
Discovery rate. How often does the system surface something the user didn't already know? A forgotten subscription, a duplicate charge, a fee that quietly increased. If the answer is "rarely," you're an expensive dashboard, not an observer.
Engagement-without-prompting. Are users opening the app to check your observations voluntarily? Not because you pushed a notification — because they wanted to see what you found. Organic engagement is the strongest trust signal you'll get at Level 1.
Data consistency score. How often do your observations match what the user's bank statement actually says? Discrepancies — even small ones, even rounding errors — destroy trust at the foundation. Target: 99.9% match rate before moving forward.
Dwell time on insights. When users see your observations, do they read them or scroll past? Measure time spent on each insight card. If median dwell time is under 2 seconds, your observations aren't interesting enough to earn recommendation authority.

None of these require user action. They're engineering metrics that prove your system understands the user's financial world before it opens its mouth.

Level 2: Advisor

The system recommends. The user approves. Based on patterns observed at Level 1, the system surfaces specific recommendations with one-tap approval. "Your car insurance is $40/month higher than comparable coverage. Switch?" The user sees the recommendation, evaluates it, and taps yes or no.

Don't build batch actions. Force the user to tap "Yes" on every single item. No "optimize everything" buttons. Each decision is a trust transaction.

Level 3: Co-Pilot

The system acts within explicit boundaries. The user defines guardrails: "Auto-pay bills under $200. Alert me for anything larger." The system operates within those constraints—enforced by deterministic code, not prompt engineering. Pydantic validators on every output. Strict OAuth scopes that physically prevent the model from executing transactions over the limit, regardless of what it "wants" to do. The guardrails live in code that the LLM cannot override, and everything outside those boundaries gets escalated to a human.

This is where most AI products want to start. It's where they should arrive after months of earning trust at lower levels.

Level 3 also requires something most teams forget: a state machine tracking every action's lifecycle. At Level 1 and 2, there's no execution — nothing to track. But at Level 3, the agent proposes actions that may be pending user approval, approved and queued, executing, completed, or failed. Without explicit state transitions, you get race conditions: the user approves a transfer while the agent is already retrying a timed-out version of the same transfer. Two transactions hit the bank. The state machine enforces that each action moves through proposed → approved → executing → completed in exactly one direction, and any concurrent attempt to advance a state that's already moved forward gets rejected. Idempotency prevents double-spend on the execution side. The state machine prevents it on the approval side.

Level 4: Autopilot

The system acts autonomously with a kill switch. Full cross-domain optimization. The system rebalances savings, negotiates rates, optimizes timing across financial accounts. The user sets goals and reviews outcomes. Every action is logged, every decision is explainable, and the kill switch — a Redis key checked at the top of every execution loop — terminates the agent in milliseconds, not minutes.

I'm not sure Level 4 belongs in financial AI. Not yet. Maybe not for years. But if it does exist, it's something users graduate into after the system has proven itself thousands of times at lower levels.

Trust Escalation Protocol

Level 1: OBSERVER
  accounts:read
       │
       │  80%+ precision, discovery rate > 0
       │  99.9% data match, user opts in
       ▼
Level 2: ADVISOR
  accounts:read
  recommendations:write
       │
       │  70%+ acceptance rate
       │  3+ months at L2, user opts in
       ▼
Level 3: CO-PILOT
  accounts:read
  transactions:write
  ├─ $200 per-txn cap
  ├─ $1000/day velocity
  └─ Kill switch active
       │
       │  6+ months at L3, zero critical incidents
       │  User opts in
       ▼
Level 4: AUTOPILOT
  accounts:read
  transactions:write
  ├─ User-defined caps
  ├─ Cross-domain scope
  └─ Kill switch active

Demotion: Any level can drop to Level 1 instantly.
Promotion: Always requires explicit user consent.
Kill switch: Works at every level. Non-negotiable.

Why Most AI Products Skip to Level 4

Follow the money and you'll see why. VCs don't fund "watch and report." They fund "autonomous AI that manages your entire financial life." The demo that gets the term sheet is always Level 4. But the product that earns user trust starts at Level 1.

This creates a predictable failure pattern — the same one behind why 95% of AI pilots fail. The startup builds the autonomy first and the trust later. Users sign up for the magic, hit the first false positive, and uninstall. The gap between demo and production isn't technical. It's the distance between assumed trust and earned trust.

It runs deeper than that. "Auto-managed portfolios" look better in pitch decks than "read-only financial observer." Nobody got funded for building the boring foundation. But every fintech blowup traces back to the same root cause: the system had more authority than it had earned.

The Fintech Graveyard

Look at what happened to Mint. Yes, Intuit ultimately pulled the plug to push users toward Credit Karma. But Mint was a zombie long before the mercy kill. Twenty-five million users, and the business model still couldn't sustain itself because free financial data aggregation costs more than the referral fees it generates. The screen-scraping economics were fragile. The trust model was worse — users handed over bank passwords to a third party and hoped for the best. Mint never built independent trust; it borrowed Intuit's brand and never made the economics work on its own. That's not a business. That's a subsidy.

Then there's Plaid. The company that became the plumbing for half of fintech paid $58 million to settle a class action for harvesting more financial data than users authorized. Their login interface mimicked bank login screens to collect credentials directly. Millions of users who thought they were logging into their bank were actually handing credentials to a third party.

That's what happens when a system operates at Level 3 permissions while the user thinks they're granting Level 1. The trust mismatch is the vulnerability.

Post-Mortem: The Autonomous Expense Agent

I watched this pattern play out with a startup building an "autonomous expense management" agent. The pitch was compelling: connect your corporate cards, the AI categorizes expenses, flags anomalies, and auto-submits reimbursements. Level 4 from day one.

The agent worked beautifully in the demo. Clean receipts, standard categories, predictable amounts. Then it hit production. An employee's lunch receipt was partially obscured. The agent categorized a $847 team dinner as "office supplies," auto-submitted it, and the reimbursement cleared before anyone noticed. One misclassification isn't fatal. But the agent's confidence score was 0.91 — high enough to bypass the human review threshold they'd set at 0.85.

The real damage wasn't the $847. It was what happened next. Finance discovered the error during a quarterly audit, lost trust in the entire system, and demanded manual review of every transaction the agent had ever processed. Three months of "autonomous" expense management, manually re-audited by two accountants over six weeks. The time savings evaporated. The project was shelved.

Had they started at Level 1 — categorize and flag, but never submit — the misclassification would have been caught by the employee before submission. The agent would have learned from the correction. Trust would have grown. Instead, they built the autonomy first and discovered the trust gap at audit time. The pattern repeats everywhere I look.

How Trust Actually Gets Earned

The boring way. Slowly. There's no shortcut.

The Trust-Earning Protocol:

Start read-only. Connect to data sources with the minimum permissions possible. If you can use read-only API access, use it. If you need write access later, ask later.
Prove observation value first. Can you show the user something they didn't know? A subscription they forgot about. A pattern in their spending. An opportunity they're missing. If your AI can't find value by just watching, it definitely can't be trusted to act.
Make recommendations auditable. Every suggestion includes the data that generated it, the reasoning behind it, and the expected outcome. Users verify the logic, not just the conclusion.
Track your own accuracy. How often were your recommendations accepted? How often did the ones users followed actually work? If you're not measuring this, you're guessing.
Let the user set the pace. Some users will move to Level 3 in a month. Others will stay at Level 1 forever. Both are fine. The system adapts to the user, not the other way around.

Every permission must be individually revocable. Not "delete your account and start over." Individually. I can trust you with my grocery spending data but not my investment accounts. That granularity is the architecture of earned trust.

Shadow-Testing: Proving Trust Before Granting It

Before promoting an agent from Level 1 to Level 2, run the Level 2 logic in shadow mode. The agent generates recommendations as if it were Level 2, but nothing surfaces to the user. Instead, you log every recommendation and compare it to what the user actually did.

async def shadow_test_level2(
    observation: Observation,
    user_action: UserAction | None
) -> ShadowResult:
    """Run Level 2 recommendation logic against Level 1
    observations. Log mismatches — don't surface them."""
    recommendation = await generate_recommendation(observation)

    # Did the user independently do what we would have
    # suggested? That's a trust match.
    if user_action and recommendation.matches(user_action):
        return ShadowResult(
            match=True,
            confidence=recommendation.confidence
        )

    # The user did something different, or did nothing.
    # Log it — this is training data for promotion readiness.
    logger.info("shadow.trust_mismatch",
        observation_id=observation.id,
        recommendation=recommendation.action,
        user_did=user_action.action if user_action else "nothing",
        confidence=recommendation.confidence,
        delta=recommendation.estimated_savings)

    return ShadowResult(match=False, recommendation=recommendation)

After 30 days of shadow testing, you have a concrete dataset: how often did the agent's recommendations align with what the user chose independently? A 70%+ match rate means the agent understands the user's preferences well enough to suggest actions. Below 50%? The agent doesn't know this user yet. Keep it at Level 1.

The shadow log also reveals the regression delta — recommendations that would have made things worse. If the agent suggests switching to a "cheaper" service that adds latency, or consolidating accounts in a way that loses interest, that's a failed promotion test. Any recommendation that increases cost or complexity without a demonstrable 10x improvement in another dimension is a negative signal. Track the ratio of positive to negative recommendations. If it drops below 5:1, the agent isn't ready.

One startup I advised skipped this entirely. Built a beautiful AI-powered budgeting tool, asked for full bank access on signup, and couldn't figure out why their 30-day retention was 8%. Users didn't trust it enough to keep feeding it data. Competent product. Nonexistent trust architecture.

The Economics of Graduated Trust

There's a physics-like constraint underneath all of this. False positives and false negatives have asymmetric costs, and the asymmetry gets worse as autonomy increases.

Level 1 (Observer) false positiveUser ignores a notification

Level 2 (Advisor) false positiveUser rejects a bad recommendation

Level 3 (Co-Pilot) false positiveSystem blocks a legitimate transaction

Level 4 (Autopilot) false positiveSystem moves money incorrectly

Cost escalationAnnoyance → Friction → Disruption → Damage

At Level 1, mistakes are invisible. At Level 4, mistakes cost real money. The cost curve isn't linear. It's exponential. This is why graduated deployment isn't a nice-to-have. It's an economic necessity. You want the system making its mistakes at the level where mistakes are cheapest.

According to Fortune, only 1 in 5 companies has a mature governance model for autonomous AI agents. That means 80% of organizations deploying AI agents haven't thought through what happens when the agent makes a bad decision at the wrong autonomy level. As I've written about before, agents that fail don't stop working. They keep going, spending money and making decisions until someone notices.

When This Framework Breaks

I'm not pretending this is universal. The trust hierarchy works for systems touching sensitive personal data. It doesn't always apply to:

Fraud detection. You want the system to act instantly at Level 3 or 4 because the cost of false negatives (missed fraud) dramatically exceeds false positives (blocked transactions). Speed matters more than graduated trust.
Emergency systems. Medical AI that hesitates kills people. Some domains demand high autonomy from day one, with trust verified through regulation and certification rather than gradual user experience.
Low-stakes automation. Sorting emails into folders doesn't need a four-level trust hierarchy. The cost of a mistake is negligible. Match the framework's weight to the stakes.

The framework applies when the cost of autonomous mistakes is high and the cost of human verification is manageable. Financial data, healthcare decisions, legal analysis. Anywhere the system handles something you can't easily undo.

The Pattern: Implementing Level 3 Guardrails

Talking about "deterministic guardrails" is easy. Building them is where most teams fail. The critical insight: the guardrail wraps the tool execution, not the prompt. You don't ask the LLM to be safe. You make it physically impossible for the LLM to be unsafe.

Level 3 Guardrail Architecture

┌──────────┐     ┌──────────┐     ┌────────────────────┐     ┌──────────┐
│  User    │────▶│   LLM    │────▶│  Pydantic Parser   │────▶│  Action  │
│  Input   │     │ (thinks) │     │  (validates JSON)  │     │  (bank)  │
└──────────┘     └──────────┘     └────────┬───────────┘     └──────────┘
                                           │
                                    ╔══════╧══════╗
                                    ║  HARD LIMIT  ║
                                    ║  amount>$200 ║
                                    ║  → ESCALATE  ║
                                    ╚══════╤══════╝
                                           │
                                    ┌──────▼──────┐
                                    │   Human     │
                                    │   Review    │
                                    └─────────────┘

The LLM proposes. Deterministic code disposes.

Here's what Level 3 looks like in practice:

from pydantic import BaseModel, field_validator, ValidationError

# Custom exceptions trigger specific logging hooks.
# ValueError is generic — your ops team can't alert on it.
class TrustBoundaryViolation(Exception):
    def __init__(self, field: str, value, limit, context: dict):
        self.field = field
        self.value = value
        self.limit = limit
        self.context = context
        super().__init__(
            f"Trust boundary: {field}={value} exceeds {limit}"
        )

VAULT_CARDS = {"vault:", "reserve:", "restricted:"}

class TransferAction(BaseModel):
    amount: float
    destination: str
    reason: str
    payment_method_id: str = "default"

    @field_validator('amount')
    @classmethod
    def enforce_limit(cls, v):
        if v > 200.00:
            raise TrustBoundaryViolation(
                field="amount", value=v, limit=200.00,
                context={"level": 3, "action": "escalate"}
            )
        return v

    @field_validator('payment_method_id')
    @classmethod
    def block_vault_cards(cls, v):
        # Level 3 agents cannot touch reserve accounts.
        # This isn't configurable. It's architecture.
        if any(v.startswith(prefix) for prefix in VAULT_CARDS):
            raise TrustBoundaryViolation(
                field="payment_method_id", value=v,
                limit="non-vault methods only",
                context={"level": 3, "action": "reject"}
            )
        return v

def execute_agent_action(llm_output: str, confidence: float,
                         user_id: str, model_version: str):
    try:
        action = TransferAction.model_validate_json(llm_output)
    except (ValidationError, TrustBoundaryViolation) as e:
        logger.error("guardrail.validation_rejected",
            user_id=user_id, model=model_version,
            violation_type=type(e).__name__,
            raw_output=llm_output[:500], error=str(e))
        return escalate_to_human(e, llm_output)

    # Confidence gate: if the model isn't sure, a human decides
    if confidence < 0.95:
        logger.warning("guardrail.low_confidence",
            user_id=user_id, confidence=confidence,
            proposed_amount=action.amount)
        return escalate_to_human("Low confidence", llm_output)

    # Velocity gate: stop the bleed even if individual
    # transactions are small
    if daily_spend_total(action.destination) > 1000.00:
        logger.critical("guardrail.velocity_breach",
            user_id=user_id, destination=action.destination,
            daily_total=daily_spend_total(action.destination))
        return kill_switch_activate(
            reason="Daily spend velocity exceeded"
        )

    logger.info("agent.action_executed",
        user_id=user_id, amount=action.amount,
        destination=action.destination,
        payment_method=action.payment_method_id)
    return bank_api.execute(action)

The LLM can "want" to transfer $5,000 all day long. The validator doesn't care what the model wants. It checks a number against a threshold and raises an exception. No prompt injection, no jailbreak, no cleverly worded request gets past a float > 200.00 comparison.

Production hardening: Wrap bank_api.execute() in a circuit breaker (threshold=5 failures, recovery=60s). Hard timeout: 10 seconds on all bank API calls. Thread trace_id on both request and response for end-to-end audit.

The TrustGuard Decorator

Pydantic validates the data. But who validates the caller? A Level 2 agent shouldn't be calling execute_transfer at all — that function requires Level 3. The trust check needs to happen at the function boundary, not inside the prompt. Here's a decorator that enforces trust levels at runtime:

from enum import IntEnum
from functools import wraps

class TrustLevel(IntEnum):
    OBSERVER = 1   # Read-only
    ADVISOR = 2    # Recommend, never act
    COPILOT = 3    # Act within guardrails
    AUTOPILOT = 4  # Act autonomously

def requires_trust(level: TrustLevel):
    """Enforce trust level at the function boundary.
    This is not a suggestion. It's a gate. If the caller's
    trust level is below the requirement, the function
    never executes — no matter what the LLM asked for."""
    def decorator(func):
        @wraps(func)
        async def wrapper(*args, **kwargs):
            # Trust level comes from the session token,
            # not from anything the LLM can influence.
            session = kwargs.get('session') or args[0]
            caller_level = TrustLevel(session.trust_level)

            if caller_level < level:
                logger.warning("trust.insufficient",
                    required=level.name,
                    actual=caller_level.name,
                    func=func.__name__,
                    user_id=session.user_id)
                raise TrustBoundaryViolation(
                    field="trust_level",
                    value=caller_level.name,
                    limit=level.name,
                    context={"func": func.__name__,
                             "demote": caller_level < TrustLevel.OBSERVER}
                )
            return await func(*args, **kwargs)
        # Stamp the requirement on the function for introspection
        wrapper._required_trust = level
        return wrapper
    return decorator

# Level 1: Anyone can call this
@requires_trust(TrustLevel.OBSERVER)
async def get_account_summary(session, account_id: str):
    return await bank_api.read_balance(account_id)

# Level 2: Can recommend, but execution is blocked
@requires_trust(TrustLevel.ADVISOR)
async def suggest_savings_transfer(session, analysis):
    return Recommendation(action="transfer", **analysis)

# Level 3: Can execute within Pydantic guardrails
@requires_trust(TrustLevel.COPILOT)
async def execute_transfer(session, action: TransferAction):
    return await bank_api.execute(action)

# Level 4: Can rebalance across accounts
@requires_trust(TrustLevel.AUTOPILOT)
async def rebalance_portfolio(session, strategy):
    return await bank_api.multi_execute(strategy)

The trust level comes from the session token — which is set by the authentication layer, not by anything the LLM produces. A Level 2 agent can call suggest_savings_transfer all day, but the moment it tries to call execute_transfer, the decorator rejects it before the function body runs. The LLM doesn't get to argue. It doesn't get an error message it can interpret and work around. The function simply never executes. The check lives in the Python runtime, not in a system prompt the model might ignore.

Production hardening: Cache session trust levels for <5 seconds to avoid per-call DB lookups. On demote_to_observer, broadcast a cache invalidation event — stale trust levels are security holes. Timeout on session retrieval: 2 seconds. Timeout = reject, not retry.

The kill switch follows the same principle — deterministic, external to the model, and fast:

import functools
from redis.exceptions import ConnectionError, TimeoutError

def check_kill_switch(func):
    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        try:
            # One Redis read. Sub-millisecond. Checked before
            # every agent action, not every prompt.
            is_active = await redis.get("agent:kill_switch")
            if is_active == "1":
                raise AgentHalted("Kill switch activated")
        except (ConnectionError, TimeoutError):
            # FAIL CLOSED → DEMOTE TO LEVEL 1.
            # Don't halt the entire system — demote it.
            # The agent can still observe and report.
            # It just can't move money until Redis recovers.
            logger.critical("guardrail.redis_unreachable",
                func=func.__name__)
            session = kwargs.get('session') or args[0]
            await demote_to_observer(session.user_id,
                reason="Redis unreachable — safety unverifiable")
            raise TrustBoundaryViolation(
                field="infrastructure",
                value="redis_down",
                limit="safety_check_required",
                context={"demoted_to": "OBSERVER",
                         "func": func.__name__})
        return await func(*args, **kwargs)
    return wrapper

async def demote_to_observer(user_id: str, reason: str):
    """Emergency demotion to Level 1. The agent can still
    read data and surface observations. It just can't act.
    This preserves user value during infrastructure failures
    instead of going completely dark."""
    await db.execute(
        "UPDATE agent_sessions SET trust_level = 1 "
        "WHERE user_id = ? AND trust_level > 1",
        (user_id,))
    logger.critical("trust.emergency_demotion",
        user_id=user_id, new_level="OBSERVER",
        reason=reason)

@check_kill_switch
async def process_transaction(session, action: TransferAction):
    return await bank_api.execute(action)

One key in Redis. One check before every action. The user taps "Stop" in the app, a flag flips, and every pending agent action halts before touching the bank API. No graceful shutdown negotiation. No asking the LLM to please stop. A boolean kills it.

The except (ConnectionError, TimeoutError) block is the line most teams miss. If Redis is down, your kill switch is down. The question is: what does the agent do when it can't confirm the switch is off? Most implementations default to "proceed" — fail open. That's how you get unauthorized transactions during an outage. Ours defaults to demote: the agent drops to Level 1, read-only. It can still show the user their account balances and flag anomalies. It just can't move a cent until the safety infrastructure recovers. The user sees degraded capability, not a blank screen. And no money moves without the kill switch standing guard.

Safe Mode: When Redis is partitioned or unreachable, the system disables writes globally — not "eventually," not "after the next health check." Every write-path function checks the kill switch before execution. If that check fails, the function rejects. The user sees a "Safe Mode" banner: "Automated actions are paused while we verify system integrity. Your data is still visible." No silent degradation. No hidden state where the agent thinks it can act but the kill switch can't stop it.

But there's a subtler problem. That await redis.get("agent:kill_switch") call is a network round-trip. On a healthy network, it's sub-millisecond. During a DDoS, a cloud partition, or a Redis failover? The await hangs. Your kill switch — the one piece of infrastructure that must work when everything else is failing — is blocked on the same saturated network that caused the emergency. The fix: don't check Redis on every action. Check local memory.

import asyncio, time, threading

class KillSwitchCache:
    """Local memory flag for kill switch state.
    Updated via Redis Pub/Sub — the check is a nanosecond
    memory read, not a millisecond network round-trip.

    The kill switch must work ESPECIALLY when the network
    is saturated. An O(1) memory read has no dependencies."""

    _killed = False
    _lock = threading.Lock()
    _last_reconcile = 0.0
    RECONCILE_INTERVAL = 5  # seconds

    @classmethod
    def is_killed(cls) -> bool:
        return cls._killed      # Nanoseconds. No I/O.

    @classmethod
    def activate(cls):
        with cls._lock:
            cls._killed = True

    @classmethod
    async def reconcile(cls):
        """Safety net: poll Redis every 5s in case a Pub/Sub
        message was dropped. Belt AND suspenders."""
        now = time.monotonic()
        if now - cls._last_reconcile < cls.RECONCILE_INTERVAL:
            return
        cls._last_reconcile = now
        try:
            val = await asyncio.wait_for(
                redis.get("agent:kill_switch"), timeout=1.0
            )
            if val == "1":
                cls.activate()
        except (ConnectionError, TimeoutError,
                asyncio.TimeoutError):
            # Can't reach Redis? Assume killed. Fail closed.
            cls.activate()

async def kill_switch_subscriber():
    """Background task: flip local flag the instant Redis
    publishes HALT. No polling delay. No network dependency
    on the hot path."""
    pubsub = redis.pubsub()
    await pubsub.subscribe("agent:control")
    async for message in pubsub.listen():
        if (message["type"] == "message"
                and message["data"] == b"HALT"):
            KillSwitchCache.activate()
            logger.critical("kill_switch.local_activated",
                source="pubsub")

Now the decorator checks memory, not Redis:

def check_kill_switch(func):
    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        # O(1) memory read. Works when the network is on fire.
        if KillSwitchCache.is_killed():
            raise AgentHalted("Kill switch activated")

        # Background reconciliation — catches dropped messages
        await KillSwitchCache.reconcile()

        return await func(*args, **kwargs)
    return wrapper

The pub/sub subscriber runs as a background task when the agent process starts. The moment kill_switch_activate publishes HALT, every agent process flips its local flag. The per-action check goes from ~0.2ms (Redis GET over the network) to effectively zero (a boolean read from process memory). The 5-second reconciliation poll is the safety net: if the pub/sub message was dropped during a network blip, the agent self-corrects within 5 seconds. And if even that poll fails? asyncio.TimeoutError after 1 second triggers activate() — fail closed, same as before.

The pattern: Pub/Sub for speed. Polling for reliability. Local memory for availability. No single failure mode can leave the kill switch broken.

Here's the activation side — unchanged, because it already publishes to the channel the subscriber is listening on:

async def kill_switch_activate(
    reason: str,
    user_id: str = None
) -> dict:
    """Halt all agent actions. Total time: ~2ms Redis
    round-trip + 0ms for pending checks (they poll)."""
    pipe = redis.pipeline()
    # 1. Flip the global switch
    pipe.set("agent:kill_switch", "1")
    # 2. Log the activation (expires in 30 days)
    pipe.set(
        f"kill_switch:log:{datetime.utcnow().isoformat()}",
        json.dumps({
            "reason": reason,
            "user_id": user_id,
            "timestamp": datetime.utcnow().isoformat()
        }),
        ex=2592000
    )
    # 3. Publish to all running agent processes
    pipe.publish("agent:control", "HALT")
    await pipe.execute()  # Single round-trip, ~2ms

    # Cancel all queued but unexecuted actions.
    # NEVER use KEYS in production — it blocks Redis
    # while scanning the entire keyspace. On a busy
    # instance, KEYS can stall every other operation
    # for seconds. SCAN is incremental: same result,
    # zero latency spikes.
    cancelled = 0
    async for key in redis.scan_iter(
        match="txn:lock:*", count=100
    ):
        await redis.delete(key)
        cancelled += 1

    return {"halted": True, "pending_cancelled": cancelled}

Three things happen in a single Redis pipeline: the switch flips, the event logs, and a pub/sub message fires to every connected agent process. Total latency is one network round-trip — around 2 milliseconds on a local Redis instance. Every agent checking check_kill_switch before its next action sees the flag immediately. Pending transaction locks get cleaned up. The user taps "Stop," and within the time it takes their finger to leave the screen, every agent in the system is dead.

The velocity gate deserves its own explanation. Individual transaction limits catch the obvious problems — nobody transfers $5,000 through a $200 guardrail. But a compromised or hallucinating model can drain an account $50 at a time, all day long. The velocity gate catches the slow bleed by tracking cumulative spending in a sliding window:

async def daily_spend_total(
    destination: str, user_tz: ZoneInfo
) -> float:
    """Sliding 24-hour spend total using Redis.
    Day boundary is the USER'S configured timezone,
    not UTC. A midnight-UTC reset means a 5pm Pacific
    surprise — your velocity gate resets mid-afternoon
    and the agent gets a fresh $1,000 allowance while
    the user is still awake and spending."""
    today = datetime.now(user_tz).date().isoformat()
    key = f"spend:{destination}:{today}"
    total = await redis.get(key)
    return float(total) if total else 0.0

async def record_spend(destination: str, amount: float):
    key = f"spend:{destination}:{date.today().isoformat()}"
    # INCRBYFLOAT is atomic — no race condition between
    # concurrent agent actions
    new_total = await redis.incrbyfloat(key, amount)
    # First transaction of the day sets the 24-hour expiry
    if new_total == amount:
        await redis.expire(key, 86400)
    return new_total

The key insight: Redis INCRBYFLOAT is atomic. Two agent actions hitting the same destination simultaneously won't produce a race condition. The EXPIRE call on first write means old keys clean themselves up — no background job, no stale data. If five $50 transactions land in one hour, the velocity gate triggers on the eleventh, long before the daily ceiling is breached. The model never sees this counter. It can't reset it, argue with it, or pretend it doesn't exist.

Production hardening: Monitor spend key counts per user — if any user accumulates >1,000 keys, investigate (possible key namespace leak). Set Redis maxmemory-policy to volatile-ttl so spend keys evict before permanent ones. Alert if any spend key survives past its 86,400s TTL.

There's one more failure mode nobody talks about: the double-spend. The kill switch fires, the transaction is "in flight," and a retry sends it again. In financial systems, idempotency isn't optional — it's the difference between "we caught it" and "we charged them twice."

async def execute_with_idempotency(
    action: TransferAction,
    idempotency_key: str
) -> TransactionResult:
    """Atomic check-and-set prevents double execution.
    If the kill switch fires mid-transaction, a retry
    with the same key returns the original result.

    Canonical key format:
      {plan_id}:step:{index}
      e.g. "plan_8f3a2b1c:step:0"

    Replay window: 24 hours (result_key TTL).
    After 24h, the key expires and the same logical
    action could re-execute — so plans must complete
    or be marked failed within that window."""
    lock_key = f"txn:lock:{idempotency_key}"
    result_key = f"txn:result:{idempotency_key}"

    # Already executed? Return cached result.
    cached = await redis.get(result_key)
    if cached:
        return TransactionResult.parse_raw(cached)

    # Atomic lock: only one execution per key
    acquired = await redis.set(
        lock_key, "processing",
        nx=True,   # Only set if key doesn't exist
        ex=300     # 5-minute timeout prevents orphaned locks
    )
    if not acquired:
        raise TransactionInFlight(idempotency_key)

    try:
        result = await bank_api.execute(action)
        # Cache result for 24 hours — retries get
        # the same answer without re-executing
        await redis.set(result_key, result.json(), ex=86400)
        return result
    except Exception:
        await redis.delete(lock_key)  # Release on failure
        raise

The nx=True flag is the entire trick. Redis SET NX is atomic — if two processes race to execute the same transaction, exactly one gets the lock. The loser gets TransactionInFlight and waits. The winner executes, caches the result, and every subsequent retry for the next 24 hours returns the cached result without touching the bank API again. No double-spend. No matter how many times the agent retries.

Production hardening: Alert on orphaned locks (txn:lock:* keys older than 5 minutes — the ex=300 TTL should handle this, but monitor anyway). Circuit-break the bank API after 3 consecutive 5xx responses. Log trace_id on lock acquisition and release for forensic reconstruction.

The Full Execution Loop

Here's where it all comes together. An agent doesn't execute one action — it runs a plan. Multiple steps, each touching real money. The question every engineer asks: what's the overhead of checking Redis on every single action in the loop?

async def run_agent_plan(
    plan: list[TransferAction],
    plan_id: str,
    user_id: str,
    trace_id: str   # Thread this through EVERY log line.
) -> list[TransactionResult]:
    """Execute a multi-step agent plan with per-action
    guardrails. Overhead budget per action:

      Kill switch check:  ~0ns   (local memory read)
      Velocity check:     ~0.2ms (Redis GET)
      Idempotency lock:   ~0.3ms (Redis SET NX)
      ─────────────────────────────────────────
      Total guardrails:   ~0.5ms
      Bank API call:      ~200ms
      Overhead ratio:     0.25%

    Required log fields (every line, no exceptions):
      trace_id, plan_id, user_id, step, action,
      latency_ms, outcome
    """
    results = []

    for i, action in enumerate(plan):
        step_ctx = {
            "trace_id": trace_id,
            "plan_id": plan_id,
            "user_id": user_id,
            "step": i,
            "total_steps": len(plan),
        }

        # ── Kill switch: ~0ns (local memory) ──────────
        # No network call. The pub/sub subscriber flips
        # this flag the instant HALT is published.
        # Reconciliation with Redis happens every 5s
        # in the background — not on the hot path.
        if KillSwitchCache.is_killed():
            logger.warning("agent.halted_mid_plan",
                **step_ctx,
                completed=i, remaining=len(plan) - i)
            break

        # ── Velocity gate: 0.2ms ────────────────────
        daily = await daily_spend_total(
            action.destination, user_tz=session.user_tz)
        if daily + action.amount > 1000.00:
            logger.critical("guardrail.velocity_breach",
                **step_ctx,
                daily_total=daily + action.amount)
            await kill_switch_activate(
                reason=f"Velocity: ${daily + action.amount:.2f}",
                user_id=user_id
            )
            break

        # ── Execute with idempotency: ~200ms ────────
        # The bank API is the bottleneck, not Redis.
        # Three Redis calls add 0.7ms to a 200ms operation.
        result = await execute_with_idempotency(
            action,
            idempotency_key=f"{plan_id}:step:{i}"
        )
        await record_spend(
            action.destination, action.amount)
        logger.info("agent.step_complete",
            **step_ctx,
            amount=action.amount,
            destination=action.destination)
        results.append(result)

    return results

The kill switch check is a local memory read — zero network dependency. The velocity and idempotency checks still hit Redis, but two Redis calls per action add ~0.5ms to a 200ms bank API call. The guardrails add 0.25% latency to each step — invisible to the user, but they're the reason the agent can't drain an account even if the LLM hallucinates a ten-step plan to move everything to an offshore account. The loop breaks on the first failed check. No subsequent actions execute. The overhead objection evaporates when you see the numbers.

Notice the trace_id threaded through every log line. Without it, "audit trail" means "we logged some strings." With it, a single query reconstructs the entire execution path: which plan, which user, which step failed, what the agent proposed, and what the guardrail rejected. When the compliance team asks "what happened at 3:47am on Tuesday," you hand them a trace ID, not a grep command.

OAuth scopes enforce the same boundaries at the API level. The LLM never sees the scope configuration. It can't request an upgrade. The permissions live in infrastructure the model can't reach. Here's what a Level 3 token actually looks like decoded:

{
  "sub": "user:8f3a2b1c",
  "agent_level": 3,
  "scope": [
    "accounts:read",
    "transactions:write"
  ],
  "constraints": {
    "max_per_transaction": 200.00,
    "max_daily_total": 1000.00,
    "allowed_destinations": [
      "savings:*",
      "bills:verified"
    ],
    "blocked_categories": [
      "crypto",
      "international_wire"
    ]
  },
  "iat": 1739145600,
  "exp": 1739149200,
  "iss": "trust-gateway"
}

The constraints object is the ceiling claim. It's not advisory — the API gateway rejects any request that exceeds these values before it reaches the banking backend. A Level 1 token has no transactions:write scope at all. A Level 2 token adds recommendations:write but still no transaction authority. The ceiling climbs only when the user explicitly grants it, and the token expires in one hour. Every escalation is re-earned.

Trust Level = Authorization, Not Confidence

People confuse these constantly, and the confusion is where security holes open. The model's confidence score — 0.91, 0.73, whatever — is a statistical artifact. It tells you how certain the model is about its own output. It says nothing about whether the model should be allowed to act. A hallucinating model can be 0.99 confident about a catastrophically wrong action.

Trust level is an authorization gate, not a quality metric. It's derived from server-side policy combined with explicit user grants. The auth service mints the token. The API gateway enforces the scopes. The LLM never sees the trust level, can't request an upgrade, and can't influence the policy that determines it. This isn't defense in depth — it's a hard architectural boundary between "what the model thinks" and "what the system allows."

Three rules, non-negotiable: agent_level is derived from server-side policy + user grants. Never from LLM output. Scopes are minted by the auth service. Service-to-service calls enforce them — not just the UI. Confidence gates operate within a trust level. They never promote across levels.

When Level 3 Fails

It will fail. The question is whether the failure builds trust or destroys it.

Level 3 Failure Cascade:

Validator rejects action. Pydantic raises ValidationError. The bank API never sees the request. Log the attempt with full context—what the model proposed, what limit it exceeded, and why the validator caught it.
Confidence drops below threshold. The model isn't sure. Escalate to human review: show what the model proposed, flag the uncertainty, and let the user decide. Never execute uncertain financial actions silently.
Velocity limit triggered. Kill switch activates. All pending actions halt within milliseconds. Push notification to user: "We stopped automated transactions because daily spending exceeded your $1,000 limit."
User reviews and decides. Two paths: raise the limit (explicit trust upgrade) or revoke Level 3 entirely (trust demotion back to Level 2). Both are valid. Both are the user's choice.

Every failure is a trust transaction. Handle it transparently and the user promotes you. Handle it poorly—hide the error, retry silently, or fail to explain—and you're back to Level 1. Permanently.

AI Trust Readiness Scorecard

Score your AI system to determine what autonomy level it has earned. Be honest — the cost of overestimating trust is measured in user churn and regulatory risk.

Dimension	Not Ready (0)	Partially Ready (1)	Ready (2)
Data access is read-only (no write/delete permissions)	Write/delete access granted	Mixed read-write	Strictly read-only
Every recommendation includes the data and reasoning behind it	Black-box suggestions	Partial explanations	Full data + reasoning shown
Acceptance rate of recommendations is tracked and >60%	Not tracked	Tracked but below 60%	Tracked and above 60%
False positive rate is measured and documented	Not measured	Measured but not documented	Measured and documented
Every permission is individually revocable by the user	All-or-nothing access	Some permissions revocable	Every permission individually revocable
System has a sub-second kill switch (Redis key, not deploy)	No kill switch	Kill switch exists but slow	Sub-second kill switch
All actions are logged with full audit trail	Minimal logging	Actions logged without context	Full audit trail with reasoning

The Bottom Line

Stop building demos. Build the kill switch first. If you can't turn your AI off in under a second, you have no business turning it on.

Go audit your permissions today. If your system has write access to anything it hasn't earned through months of accurate read-only observation, revoke it. Start at Level 1. Prove you can watch before you advise. Prove you can advise before you act. Prove you can act within deterministic guardrails before you ask for autonomy.

The hardest problem in AI-powered financial services isn't intelligence. It's trust. And trust doesn't scale with a feature release. It scales with time, accuracy, and the discipline to let users promote the system on their own terms.

"Stop building demos. Build the kill switch first. If you can't turn your AI off in under a second, you have no business turning it on."

Sources

Need an AI Trust Architecture Review?

I'll evaluate your AI system's permission model and map a graduated trust deployment plan.

Book Trust Review

Cisco Caceres • Coding since the late 1970s. Still learning, still building.