Methodology -- Theory Delta

What a finding is

A Theory Delta finding is an empirical claim that the observed behavior of an agentic tool diverges from its documentation. Each finding follows a consistent structure:

What the docs say -- the documented or advertised behavior
What actually happens -- what we observed when testing, with source code references and links to GitHub issues, CVEs, or security advisories as evidence
What to do instead -- actionable alternatives for builders
Environments tested -- specific tools, versions, and results in a verifiable table
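
As an illustration only, the four-part structure above could be modeled as a small record type. This is a minimal sketch, not the site's actual schema: the class names, field names, and the is_complete helper are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EnvironmentResult:
    """One row of the 'Environments tested' table (hypothetical shape)."""
    tool: str
    version: str
    result: str


@dataclass
class Finding:
    """Hypothetical record mirroring the four-part finding structure."""
    docs_say: str                          # What the docs say
    observed: str                          # What actually happens
    do_instead: list[str]                  # What to do instead
    environments: list[EnvironmentResult]  # Environments tested
    # Links to GitHub issues, CVEs, or security advisories.
    evidence: list[str] = field(default_factory=list)

    def is_complete(self) -> bool:
        """True only when every part of the structure is filled in."""
        return all([
            self.docs_say.strip(),
            self.observed.strip(),
            self.do_instead,
            self.environments,
        ])
```

Under these assumptions, a draft with an empty environments table fails is_complete(), which matches the requirement that results be recorded per environment.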

How findings are produced

Source code review. We read the actual implementation, not just the README. Claims reference specific files, line numbers, and code snippets.
Environment testing. Each tool is tested in the version listed. Results are recorded per-environment, not aggregated into a single claim.
Cross-reference with public issues. Findings link to GitHub issues, security advisories (GHSAs), and CVEs where available. If an independent reporter found the same behavior, we link their report.
Confidence labeling. Each finding is tagged empirical (tested and confirmed), medium (partially tested or inferred), or theoretical (architecture analysis only).
Staleness tracking. Every finding has a "last verified" date. Findings not re-verified within 60 days display a staleness warning. Tools ship patches; findings must keep up.
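
The confidence labels and the 60-day staleness rule can be sketched in a few lines. This is an assumption-laden illustration rather than the site's implementation: the names Confidence, STALENESS_WINDOW, and is_stale are hypothetical, and the sketch assumes a finding verified exactly 60 days ago still counts as fresh.

```python
from datetime import date, timedelta
from enum import Enum

# From the text: findings not re-verified within 60 days show a warning.
STALENESS_WINDOW = timedelta(days=60)


class Confidence(Enum):
    """The three confidence labels named in the methodology."""
    EMPIRICAL = "empirical"      # tested and confirmed
    MEDIUM = "medium"            # partially tested or inferred
    THEORETICAL = "theoretical"  # architecture analysis only


def is_stale(last_verified: date, today: date) -> bool:
    """Return True when a finding should display a staleness warning.

    Assumption: exactly 60 days since verification is still within the
    window; the warning starts on day 61.
    """
    return today - last_verified > STALENESS_WINDOW
```

Passing today as an argument, instead of calling date.today() inside the function, keeps the check deterministic and easy to test.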

What a finding is not

Not a documentation summary. If the docs are accurate, there is no theory delta and no finding.
Not a review or rating. Findings describe specific behaviors, not overall quality.
Not permanent. When a tool ships a fix, the finding should be updated or marked resolved. Evidence of fixes is welcome.

The theory delta

The theory delta is the gap between what documentation claims and what production reveals. It exists because:

Agentic tools ship faster than their documentation can keep up
Integration behavior (how tools interact with real infrastructure) is rarely documented
Edge cases in protocol specs surface only under production load or specific configurations
Reference implementations get copied without a security audit

Every finding on this site exists because someone encountered a theory delta while building. The goal is to make that knowledge available before the next builder hits the same wall.