Enabling streaming in the OpenAI Agents SDK means your guardrails no longer block anything
From Theory Delta | Methodology | Published 2026-03-17
What you expect
The OpenAI Agents SDK provides a guardrails system to define safety checks on agent inputs and outputs. You enable streaming for latency. You enable guardrails for safety. Both are documented features that work together.
What actually happens
Guardrails and streaming are architecturally incompatible. When you use Runner.run_streamed(), content is delivered to the user as it is generated. Guardrails run as a parallel check — but they complete after content has already been streamed. By the time a guardrail trips and raises InputGuardrailTripwireTriggered, the user has already seen the content it was supposed to block.
This is not a bug. OpenAI closed Issue #300 as NOT_PLANNED — the streaming architecture cannot support pre-delivery content filtering without buffering the entire response, which defeats the purpose of streaming. The incompatibility is by design and will not be fixed.
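The race described above can be simulated without the SDK at all. The following is a minimal sketch of the architecture, not real SDK code: the "guardrail" runs concurrently with the "stream" but finishes later, so its verdict arrives only after every chunk has already been shown.

```python
import asyncio

async def stream_chunks(delivered):
    # The model generates tokens; each one is shown to the user immediately.
    for chunk in ["Here ", "is ", "the ", "content."]:
        await asyncio.sleep(0.01)
        delivered.append(chunk)

async def guardrail_check():
    # The safety check runs in parallel but takes longer than streaming.
    await asyncio.sleep(0.1)
    return "tripwire"

async def race():
    delivered = []
    # Same structure as the SDK: streaming and the check run concurrently.
    verdict, _ = await asyncio.gather(guardrail_check(), stream_chunks(delivered))
    return verdict, delivered

verdict, delivered = asyncio.run(race())
print(verdict, len(delivered))  # tripwire 4 -- the verdict lands after all chunks shipped
```

The tripwire here can only report what already happened; no amount of tuning the check makes it faster than the first delivered chunk without buffering, which is exactly the trade-off OpenAI declined to make.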
Mixed-model handoff pipelines crash in a separate, unrelated way. When a reasoning model (o1, o3, or GPT-5) hands off to a non-reasoning agent, the reasoning model’s internal items (chain-of-thought traces with rs_ ID prefixes) are passed in the conversation context. Non-reasoning agents cannot process these items and crash with “Item with id rs_ not found.” The SDK does not sanitize reasoning items at handoff boundaries. This makes heterogeneous agent pipelines — where you want a reasoning model for planning and a faster model for execution — unreliable without manual context sanitization.
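Since the SDK does not sanitize reasoning items at handoff boundaries, you have to do it yourself. A minimal sketch, assuming each conversation item is a dict with an `id` field (an assumption about the wire shape, not a guaranteed SDK schema):

```python
def sanitize_for_handoff(items):
    """Drop reasoning items (ids prefixed with rs_) that non-reasoning
    agents cannot process at a handoff boundary."""
    return [
        item for item in items
        if not str(item.get("id", "")).startswith("rs_")
    ]

history = [
    {"id": "msg_1", "role": "user", "content": "Plan the migration."},
    {"id": "rs_abc", "type": "reasoning"},  # chain-of-thought trace from o1/o3/GPT-5
    {"id": "msg_2", "role": "assistant", "content": "Step 1: ..."},
]
clean = sanitize_for_handoff(history)
print([item["id"] for item in clean])  # ['msg_1', 'msg_2']
```

Run this on the conversation history immediately before passing it from the reasoning planner to the downstream execution agent.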
What this means for you
If you are using Runner.run_streamed() with guardrails enabled, your guardrails are not blocking content — they are reporting after the fact. For safety requirements, this is the same as having no guardrails. For compliance requirements, this is worse: you have the appearance of content filtering without the guarantee.
The two failure modes combine to constrain your SDK configuration:
- If you want guardrails, you cannot stream.
- If you want streaming, your guardrails are decorative.
- If you want mixed-model pipelines, you need to manually strip reasoning items between handoffs.
There is no middle path within the SDK. OpenAI’s stated answer is: run the guardrail check serially before calling the streaming runner, or use non-streaming execution and accept the latency cost.
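The serial pattern OpenAI points to can be sketched as follows. `check_input` and `stream_response` are hypothetical stand-ins for your guardrail logic and the streaming runner; the point is only the ordering: the check completes before the first chunk exists.

```python
import asyncio

class TripwireTriggered(Exception):
    """Raised before any content is produced, unlike the streaming case."""

async def check_input(text):
    # Placeholder policy; substitute your real guardrail check.
    return "forbidden" not in text

async def stream_response(text):
    # Stand-in for the streaming runner.
    for chunk in ["echo: ", text]:
        yield chunk

async def guarded_stream(user_input):
    # The guardrail runs to completion BEFORE streaming starts.
    if not await check_input(user_input):
        raise TripwireTriggered(user_input)
    async for chunk in stream_response(user_input):
        yield chunk

async def demo():
    out = []
    async for chunk in guarded_stream("hello"):
        out.append(chunk)
    return out

print(asyncio.run(demo()))  # ['echo: ', 'hello']
```

Note this only guards the input; an output guardrail on streamed content still has the post-delivery problem, which is why the transport-layer approach below exists.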
What to do
- Do not rely on SDK guardrails in streaming mode. If content safety is a requirement, use Runner.run() (non-streaming) and accept the latency cost. The guardrail guarantee holds in non-streaming mode.
- Implement guardrails at the transport layer as a middle ground: filter content after the SDK emits it but before your UI renders it. This adds some buffering latency, but less than abandoning streaming entirely, and it restores the safety guarantee without requiring non-streaming execution throughout your pipeline.
- For mixed-model pipelines, add explicit context sanitization between handoffs. Before passing conversation history from a reasoning model to a non-reasoning agent, strip any items with rs_ ID prefixes.
- Test your guardrails in the exact execution mode you use in production. A guardrail that blocks content correctly in non-streaming mode can be completely ineffective in streaming mode. If you have not explicitly tested the streaming path, you do not know whether your guardrails block anything.
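The transport-layer option can be sketched as a small hold-back filter. This is an illustration, not SDK code: it sits between the stream and the UI, keeping just enough tail in a buffer that a banned phrase split across chunk boundaries is still caught before anything renders. The banned-phrase list is a toy stand-in for your real policy check.

```python
BANNED = ["secret token"]  # toy policy; substitute a real classifier or phrase list
HOLDBACK = max(len(p) for p in BANNED) - 1  # tail kept to catch cross-chunk splits

def filter_stream(chunks):
    """Yield text that is safe to render; raise before any banned phrase
    reaches the UI, even when it spans chunk boundaries."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        for phrase in BANNED:
            if phrase in buffer:
                raise ValueError("blocked before rendering")
        # Release everything except the hold-back tail.
        safe, buffer = buffer[:-HOLDBACK], buffer[-HOLDBACK:]
        if safe:
            yield safe
    yield buffer  # flush the tail once the stream ends cleanly

print("".join(filter_stream(["hel", "lo ", "world"])))  # hello world
```

The cost is a rendering delay of at most HOLDBACK characters per chunk, which is the buffering trade-off the SDK itself refuses to make; here you make it explicitly and locally.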
This finding would be disproved by demonstrating that the OpenAI Agents SDK can enforce guardrails on streamed content before it reaches the user, or by OpenAI removing the NOT_PLANNED label and shipping a fix.
Evidence
| Tool | Version | Result |
|---|---|---|
| openai-agents-python | v0.11.1 | source-reviewed: guardrails execute after streaming begins (Issue #300, closed as NOT_PLANNED) |
| openai-agents-python | v0.11.1 | source-reviewed: reasoning model items (rs_ prefix) crash non-reasoning downstream agents at handoff (Issues #1397, #1660, #569) |
Confidence: medium — the streaming/guardrail incompatibility is confirmed through source code review and the NOT_PLANNED label on the GitHub issue. The mixed-model handoff crash is confirmed through issue reports. No runtime reproduction was performed.
Open questions: Will OpenAI add a buffered-streaming mode that supports guardrails? Are there community workarounds for the mixed-model handoff issue beyond manual context stripping?
Seen different? Contribute your evidence — theory delta is what makes this knowledge base work.