Error suppression patterns common in scripts will silently break your agent
From Theory Delta | Methodology | Published 2026-03-29
What you expect
Shell scripting guides treat 2>/dev/null, || true, and set +e as style preferences. Python tutorials present broad except: blocks as acceptable shortcuts. Both treat error suppression as a minor code smell — the assumption is that the person running the script will notice if something looks wrong.
What actually happens
Agents have exactly one feedback channel: exit code plus stdout/stderr. When error suppression destroys this channel, the agent cannot detect, diagnose, or recover from failures.
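This channel collapse is easy to reproduce. A minimal sketch (the failing path `/definitely-missing-dir` is an arbitrary example): run the same failing command the way an agent harness would, once plainly and once through the suppression patterns above.

```python
import subprocess

FAILING = "ls /definitely-missing-dir"  # arbitrary command that fails

# Plain execution: the agent's single feedback channel carries the failure.
raw = subprocess.run(["bash", "-c", FAILING], capture_output=True, text=True)
print(raw.returncode != 0)   # True: failure is detectable
print(len(raw.stderr) > 0)   # True: stderr carries a diagnostic

# Suppressed execution: the same failure is indistinguishable from success.
muted = subprocess.run(["bash", "-c", f"{FAILING} 2>/dev/null || true"],
                       capture_output=True, text=True)
print(muted.returncode)      # 0
print(repr(muted.stderr))    # ''
```

From the agent's side, the second run is a success: exit 0, empty stderr, nothing to diagnose.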
The “Tools Fail” paper (arXiv:2406.19228) measures the impact quantitatively: when tools returned incorrect output without error signals, GPT-3.5 accuracy dropped from 98.7% to 22.7% — a 76 percentage point collapse. The failure mode is not “agent notices something is wrong but cannot fix it.” It is “agent does not notice anything is wrong” — models copy incorrect tool output and generate hallucinated justifications for why it was correct.
This pattern appears across the full agent toolchain, not just bash scripts:
SDK layer: Vercel AI SDK streamText() silently swallows errors by default — no logs, no stack trace without an opt-in onError callback. Fixed in v4.1.22, but the default behavior shipped silently for all prior versions.
CLI layer: OpenAI Codex CLI issue #9091 documents a regression between v0.36 and v0.72 where all commands returned exit code 0 with empty stdout and stderr. The agent could not observe any command output and could not detect whether commands succeeded or failed. Root cause was local shell environment configuration — but the failure mode (stdout invisible to agent) is the same regardless of cause. Related issues #7317, #6033, and #6991 confirm this is a recurring failure class.
Shell layer: Bash 3.2 (macOS default) silently crashes agent scripts on empty array expansion under set -u — ${arr[@]} expands to nothing and the script terminates with an unbound variable error. Bash 3.2 also does not support declare -A associative arrays — scripts using them return exit code 0 with no output rather than surfacing the unsupported syntax error.
Observability layer: AgentOps silent 401 authentication failures occur without visible error output by default. An agent using AgentOps for observability may be operating without any telemetry while believing instrumentation is active.
CI layer: SFEIR Institute documentation on Claude Code headless mode reports approximately 8% of CI/CD executions produce a silent failure: exit code 0 returned, but response is empty, incomplete, or irrelevant.
Note: the observatory prevalence claim (58% of repos exhibit error suppression) is sourced from an internal scan tagged as [legacy, unverified] in the source block and should not be treated as a confirmed statistic until independently reproduced.
What this means for you
Your agent pipeline is not just vulnerable to error suppression you introduce — it is vulnerable to the error suppression already in every shell script, every SDK, and every tool your agent calls. The difference between human and agent execution is not degree but kind:
| Property | Human-executed script | Agent-executed script |
|---|---|---|
| Visual feedback | Terminal output visible | None — only structured return value |
| Error detection | Intuition, domain knowledge | Exit code + stderr only |
| Recovery path | Re-run, inspect logs, ask a colleague | Retry same command, or hallucinate success |
| Suppression impact | Inconvenient — human will eventually notice | Catastrophic — agent proceeds on false premise |
The compounding factor: Dagster’s “Dignified Python” analysis found that LLMs default to broad exception swallowing because try: ... except: pass patterns dominate training data. Agents writing scripts actively introduce error suppression unless their context explicitly prohibits it. You are not just defending against legacy patterns — you are defending against your own agent.
Your staging environment (low concurrency, single platform) will pass. Production (concurrent requests, mixed platforms, real-world tool failures) is where error suppression triggers the 76-point drop.
What to do
Add set -euo pipefail as the first line of every agent-executed bash script. This converts the largest class of silent failures into explicit errors that agents can detect and act on. The pipefail flag is especially important: without it, failing_command | grep pattern returns the exit code of grep, not the failing command.
#!/bin/bash
set -euo pipefail
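The pipefail effect can be checked from a harness. A small sketch using a pipeline whose upstream command fails while the downstream one succeeds:

```python
import subprocess

PIPELINE = "false | cat"  # upstream fails, downstream succeeds

# Default bash: the pipeline's exit code is cat's, so the failure vanishes.
default = subprocess.run(["bash", "-c", PIPELINE])
print(default.returncode)  # 0

# With pipefail: the rightmost failing command's status propagates.
strict = subprocess.run(["bash", "-c", "set -o pipefail; " + PIPELINE])
print(strict.returncode)   # 1
```

Without pipefail, any pipeline that ends in a succeeding command (grep, cat, tee, jq) reports success regardless of what failed upstream.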
For commands where non-zero exit is not an error (e.g., grep returning 1 for “no matches”), use targeted handling with logging rather than blanket suppression:
# BAD: suppresses all error information
command 2>/dev/null || true
# GOOD: capture and surface errors explicitly
if ! output=$(command 2>&1); then
  echo "ERROR: command failed with: $output" >&2
  exit 1
fi
# Acceptable: grep no-match is not an error, but document why
matches=$(grep -r "pattern" dir/ 2>&1) || true # grep returns 1 on no match
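The same distinction can be made precisely from a harness: grep defines exit 1 as "no matches" and reserves exit codes of 2 and above for real errors, so targeted handling can branch on the code instead of blanket-suppressing. A sketch using a throwaway directory:

```python
import os
import subprocess
import tempfile

with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "notes.txt"), "w") as f:
        f.write("hello world\n")

    r = subprocess.run(["grep", "-r", "needle", d],
                       capture_output=True, text=True)
    if r.returncode == 0:
        print("matches:", r.stdout)
    elif r.returncode == 1:
        print("no matches (not an error for grep)")
    else:
        # exit >= 2 is a real failure (bad pattern, unreadable file, ...)
        raise RuntimeError(f"grep failed: {r.stderr}")
```

Note that the bash `|| true` shorthand above also masks exit 2, which is why the inline comment documenting the intent matters.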
For Python code agents execute or write:
# BAD: swallows all errors
try:
    result = risky_operation()
except:
    pass
# GOOD: catch specific exceptions, surface the error
import sys

try:
    result = risky_operation()
except SpecificError as e:
    print(f"ERROR: {e}", file=sys.stderr)
    raise
When agents write scripts, include an explicit rule in their context: “Never use 2>/dev/null, || true, or bare except: blocks. Use set -euo pipefail. Catch specific exceptions and re-raise or log to stderr.”
For the Tools Fail paper’s mitigation: adding a simple disclaimer (“The tool can sometimes give incorrect answers”) to tool descriptions improved silent error detection by approximately 30 percentage points (GPT-3.5: from 23% to 53% in zero-shot). This is low-cost and can be added to tool definitions without changing the tool implementation.
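The disclaimer is just a sentence appended to the tool’s description field. A sketch in an OpenAI-style function schema (the tool name and schema details here are illustrative, not taken from the paper):

```python
DISCLAIMER = "Note: the tool can sometimes give incorrect answers."

# Hypothetical calculator tool; only the description field changes.
calculator_tool = {
    "type": "function",
    "function": {
        "name": "calculator",
        "description": "Evaluates arithmetic expressions. " + DISCLAIMER,
        "parameters": {
            "type": "object",
            "properties": {"expression": {"type": "string"}},
            "required": ["expression"],
        },
    },
}

print(DISCLAIMER in calculator_tool["function"]["description"])  # True
```

No tool implementation changes; the model simply sees a description that licenses it to question tool output.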
This finding would be disproved by: a controlled study showing agent task accuracy is not significantly degraded by silent tool failures in real-world (non-synthetic) codebases, or by agent architectures that achieve >80% accuracy on tasks where tool failures are silently suppressed, at a scale comparable to the Tools Fail paper.
Evidence
| Claim | Primary Source | Verified |
|---|---|---|
| 76pp accuracy drop with silent tool failures | arXiv:2406.19228 | Yes — controlled study, GPT-3.5 |
| ~8% of Claude Code CI/CD runs return exit 0 with empty/irrelevant response | SFEIR Institute | Source-reviewed |
| Codex CLI all commands returned exit 0 with empty stdout/stderr | openai/codex#9091 | Source-reviewed; root cause: local shell env config |
| Vercel AI SDK streamText() swallows errors silently by default | vercel/ai#4726 | Source-reviewed; fixed in v4.1.22 |
| Adding disclaimer to tool description improves error detection ~30pp | arXiv:2406.19228 | Yes — same study |
Confidence: medium — primary quantitative evidence from one academic paper (arXiv:2406.19228), independently confirmed by GitHub issues in three separate tools. Not tested by execution in Theory Delta’s environment. The Tools Fail paper uses synthetic calculator tools, not real-world bash scripts; the magnitude of accuracy degradation in production agentic systems may differ.
Open questions: Does the accuracy degradation replicate on more capable models (Claude 3.5+, GPT-4o)? Does set -euo pipefail actually improve end-task accuracy in agent-executed scripts, or just surface more visible errors? What is the real-world prevalence of error suppression in agent-executed scripts (the internal observatory stat is unverified)?
Seen different? Contribute your evidence — theory delta is what makes this knowledge base work.