<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><channel><title>Theory Delta Findings</title><description>Empirical findings on the agentic tool landscape. What tools actually do, not what docs claim.</description><link>https://theorydelta.com/</link><item><title>MCP Auth Header Client Split</title><link>https://theorydelta.com/findings/mcp-auth-header-client-split/</link><guid isPermaLink="true">https://theorydelta.com/findings/mcp-auth-header-client-split/</guid><description>The MCP Streamable HTTP spec does not mandate Authorization header passthrough. Claude Code supports it; VS Code Copilot does not. MCP servers that gate write tools on Authorization Bearer are therefore effectively Claude Code-only: from clients that do not pass the header, the tools fail silently with no diagnostic at the client.</description><pubDate>Wed, 22 Apr 2026 00:00:00 GMT</pubDate></item><item><title>Structured Generation Hidden Complexity Thresholds</title><link>https://theorydelta.com/findings/structured-generation-hidden-complexity-thresholds/</link><guid isPermaLink="true">https://theorydelta.com/findings/structured-generation-hidden-complexity-thresholds/</guid><description>Every major provider advertises mathematical or near-guaranteed schema adherence, but each implementation has undocumented complexity thresholds, ordering sensitivities, and silent failure modes that make the guarantee conditional in ways the documentation does not disclose.</description><pubDate>Tue, 21 Apr 2026 00:00:00 GMT</pubDate></item><item><title>Playwright MCP HTTP Session Destruction</title><link>https://theorydelta.com/findings/playwright-mcp-http-session-destruction/</link><guid isPermaLink="true">https://theorydelta.com/findings/playwright-mcp-http-session-destruction/</guid><description>playwright-mcp sessions are destroyed on every HTTP transport call in containerized environments — teams testing locally via Claude Code see reliable multi-step workflows, then cloud deployment silently resets browser state on every 
call.</description><pubDate>Sun, 19 Apr 2026 00:00:00 GMT</pubDate></item><item><title>Agent Frameworks Diverging Not Converging</title><link>https://theorydelta.com/findings/agent-frameworks-diverging-not-converging/</link><guid isPermaLink="true">https://theorydelta.com/findings/agent-frameworks-diverging-not-converging/</guid><description>The multi-agent framework landscape has fractured into tiers with distinct maturity profiles — not converged — and every major framework carries confirmed production failure modes that docs don&apos;t warn about.</description><pubDate>Sun, 29 Mar 2026 00:00:00 GMT</pubDate></item><item><title>Agent Security Landscape Four Tiers</title><link>https://theorydelta.com/findings/agent-security-landscape-four-tiers/</link><guid isPermaLink="true">https://theorydelta.com/findings/agent-security-landscape-four-tiers/</guid><description>Agent security has split into four architecturally distinct tiers — no single tool spans more than one, making &quot;agent security&quot; a misleading single category.</description><pubDate>Sun, 29 Mar 2026 00:00:00 GMT</pubDate></item><item><title>Agentic Error Suppression</title><link>https://theorydelta.com/findings/agentic-error-suppression/</link><guid isPermaLink="true">https://theorydelta.com/findings/agentic-error-suppression/</guid><description>Error suppression that a human would notice and investigate leaves an agent with exit code 0 and no signal to stop — the Tools Fail paper measured a 76-point accuracy drop under silent failures.</description><pubDate>Sun, 29 Mar 2026 00:00:00 GMT</pubDate></item><item><title>ChromaDB RAM Ceiling Data Loss</title><link>https://theorydelta.com/findings/chromadb-ram-ceiling-data-loss/</link><guid isPermaLink="true">https://theorydelta.com/findings/chromadb-ram-ceiling-data-loss/</guid><description>ChromaDB&apos;s single-node RAM-bound architecture makes it a prototyping tool with a hard scaling ceiling — LRU eviction fails, collection deletion doesn&apos;t 
free memory, and connection leaks cause container OOM.</description><pubDate>Sun, 29 Mar 2026 00:00:00 GMT</pubDate></item><item><title>LangGraph Checkpoint Serialization Silent Loss</title><link>https://theorydelta.com/findings/langgraph-checkpoint-serialization-silent-loss/</link><guid isPermaLink="true">https://theorydelta.com/findings/langgraph-checkpoint-serialization-silent-loss/</guid><description>LangGraph checkpoint serialization fails silently for non-primitive types — round-trips are lossy with no exception raised, across four documented modes since Jan 2026.</description><pubDate>Sun, 29 Mar 2026 00:00:00 GMT</pubDate></item><item><title>SWE-bench Retired Benchmark Gap</title><link>https://theorydelta.com/findings/swe-bench-retired-benchmark-gap/</link><guid isPermaLink="true">https://theorydelta.com/findings/swe-bench-retired-benchmark-gap/</guid><description>SWE-bench Verified was retired in Feb 2026 after 59% of its test cases were found flawed — scores overstated production reliability by 20-30 percentage points.</description><pubDate>Sun, 29 Mar 2026 00:00:00 GMT</pubDate></item><item><title>Goose Default Config Security</title><link>https://theorydelta.com/findings/goose-default-config-security/</link><guid isPermaLink="true">https://theorydelta.com/findings/goose-default-config-security/</guid><description>Goose&apos;s defaults — autonomous mode, no extension allowlist, disabled injection detection, 1000-turn ceiling — each removes a guardrail that would contain the others. 
The permissions fail-open bug was fixed in v1.26.2 (Mar 4, 2026), but the other four defaults remain.</description><pubDate>Tue, 17 Mar 2026 00:00:00 GMT</pubDate></item><item><title>OpenAI Agents SDK Streaming Guardrails Broken</title><link>https://theorydelta.com/findings/openai-agents-sdk-streaming-guardrails-broken/</link><guid isPermaLink="true">https://theorydelta.com/findings/openai-agents-sdk-streaming-guardrails-broken/</guid><description>Streaming guardrails are architecturally broken by design, and OpenAI has marked the fix NOT_PLANNED.</description><pubDate>Tue, 17 Mar 2026 00:00:00 GMT</pubDate></item><item><title>Claude Code Settings Attack Surface</title><link>https://theorydelta.com/findings/claude-code-settings-attack-surface/</link><guid isPermaLink="true">https://theorydelta.com/findings/claude-code-settings-attack-surface/</guid><description>Four CVEs and five supply-chain vectors share one pattern: project-scoped settings execute with user privileges before or without trust verification.</description><pubDate>Sat, 14 Mar 2026 00:00:00 GMT</pubDate></item><item><title>Claude Code Hooks Unreliable Enforcement</title><link>https://theorydelta.com/findings/claude-code-hooks-unreliable-enforcement/</link><guid isPermaLink="true">https://theorydelta.com/findings/claude-code-hooks-unreliable-enforcement/</guid><description>Five categories of hook failures — silent non-firing, ignored decisions, platform breakage, data corruption, and architectural constraints — mean defense-in-depth across multiple events is required.</description><pubDate>Tue, 03 Mar 2026 00:00:00 GMT</pubDate></item><item><title>LLM Gateway Silent Failures</title><link>https://theorydelta.com/findings/llm-gateway-silent-failures/</link><guid isPermaLink="true">https://theorydelta.com/findings/llm-gateway-silent-failures/</guid><description>LiteLLM&apos;s gateway features fail silently under production conditions — budget counters drift, guardrails pass, fallbacks route to dead 
providers — every failure traced to a public GitHub issue.</description><pubDate>Sun, 01 Mar 2026 00:00:00 GMT</pubDate></item><item><title>Agentic RAG Three Silent Failures</title><link>https://theorydelta.com/findings/agentic-rag-three-silent-failures/</link><guid isPermaLink="true">https://theorydelta.com/findings/agentic-rag-three-silent-failures/</guid><description>Three silent failure modes in RAG pipelines — GraphRAG entity deduplication corruption, LangGraph edge routing data loss, and tool output leakage at step caps — all go undocumented by the frameworks.</description><pubDate>Fri, 27 Feb 2026 00:00:00 GMT</pubDate></item><item><title>DeepEval Exfiltrates Traces Via OTel Hijack</title><link>https://theorydelta.com/findings/deepeval-exfiltrates-traces-via-otel-hijack/</link><guid isPermaLink="true">https://theorydelta.com/findings/deepeval-exfiltrates-traces-via-otel-hijack/</guid><description>DeepEval hijacked the global OTel TracerProvider on import in releases through roughly v3.7.7, exfiltrating trace data to New Relic cloud regardless of the configured backend; the behavior was removed in v3.9.x. 
The Langfuse non-generation span gap is still current.</description><pubDate>Fri, 27 Feb 2026 00:00:00 GMT</pubDate></item><item><title>Graph Memory Self Hosted Not Production Ready</title><link>https://theorydelta.com/findings/graph-memory-self-hosted-not-production-ready/</link><guid isPermaLink="true">https://theorydelta.com/findings/graph-memory-self-hosted-not-production-ready/</guid><description>Self-hosted graph memory is uniformly not production-ready — Graphiti has an undocumented async conflict, Mem0 OSS graph is OpenAI-only despite 47k stars, and the MCP memory server corrupts files under concurrent writes.</description><pubDate>Fri, 27 Feb 2026 00:00:00 GMT</pubDate></item><item><title>Agent Testing Non Deterministic CI</title><link>https://theorydelta.com/findings/agent-testing-non-deterministic-ci/</link><guid isPermaLink="true">https://theorydelta.com/findings/agent-testing-non-deterministic-ci/</guid><description>Every LLM-as-judge eval framework reviewed produces non-deterministic CI results — the grading layer is non-deterministic by design, not just the model under test.</description><pubDate>Wed, 25 Feb 2026 00:00:00 GMT</pubDate></item><item><title>MCP Supply Chain Security Institutionally Confirmed</title><link>https://theorydelta.com/findings/mcp-supply-chain-security-institutionally-confirmed/</link><guid isPermaLink="true">https://theorydelta.com/findings/mcp-supply-chain-security-institutionally-confirmed/</guid><description>Two enterprise acquisitions in 90 days upgraded MCP supply chain risk from &quot;emerging&quot; to institutionally confirmed — but mid-session server mutation remains an unmitigated gap.</description><pubDate>Wed, 25 Feb 2026 00:00:00 GMT</pubDate></item><item><title>MCP Database Servers Security Bypass</title><link>https://theorydelta.com/findings/mcp-database-servers-security-bypass/</link><guid isPermaLink="true">https://theorydelta.com/findings/mcp-database-servers-security-bypass/</guid><description>Every SQLite 
MCP database server reviewed uses startsWith(&quot;select&quot;) as its read-only guard — trivially bypassable with a semicolon or comment prefix.</description><pubDate>Tue, 24 Feb 2026 00:00:00 GMT</pubDate></item><item><title>MCP Stateless HTTP Silent Feature Loss</title><link>https://theorydelta.com/findings/mcp-stateless-http-silent-feature-loss/</link><guid isPermaLink="true">https://theorydelta.com/findings/mcp-stateless-http-silent-feature-loss/</guid><description>Stateless HTTP mode silently disables sampling and elicitation — a protocol-level constraint, not a library bug, with no spec fix before June 2026.</description><pubDate>Tue, 24 Feb 2026 00:00:00 GMT</pubDate></item></channel></rss>