ChromaDB will run out of RAM sooner than you think, and v0.5.x silently orphaned your embeddings

Published: 2026-03-29 | Last verified: 2026-03-29 | Confidence: medium | 7 claims, 0 tested by execution


From Theory Delta | Methodology | Published 2026-03-29

Note: ChromaDB is under active development (current version v1.5.6, April 2026). Historical claims about v0.5.x series behavior are documented as such. Re-verify on major releases, especially v1.x Rust rewrite changes.

What you expect

ChromaDB is the most-starred open-source vector database, commonly recommended as the default starting point for AI application development. The documentation presents automatic memory management via LRU cache and does not prominently surface single-node constraints or scaling limits.

What actually happens

The RAM ceiling is architectural, not configurable. ChromaDB is single-node and embedded-first. Database size is fundamentally limited by available system RAM; there is no horizontal scaling path. The capacity formula (derived from the ChromaDB cookbook resource requirements) is: payload_bytes = vectors × dimensions × 4 bytes. At 1024 dimensions, 1 GB supports approximately 244,000 vectors, and a 32 GB machine holds roughly 8 million. GitHub issue #1323 tracks this limitation and remains open with no scaling roadmap.
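The capacity formula is simple enough to sanity-check in a few lines. A minimal sketch, assuming the cookbook's 4 bytes per float and decimal gigabytes (`max_vectors` is an illustrative helper, not a Chroma API):

```python
def max_vectors(ram_gb: float, dims: int, bytes_per_float: int = 4) -> int:
    """Rough vector capacity for a given RAM budget, per the cookbook
    formula payload_bytes = vectors * dimensions * 4. Ignores HNSW index
    and metadata overhead, so treat the result as an upper bound."""
    return int(ram_gb * 1e9) // (dims * bytes_per_float)

print(max_vectors(1, 1024))   # ~244,000 vectors per decimal GB at 1024 dims
print(max_vectors(32, 1024))  # ~7.8M vectors on a 32 GB machine
```

In practice the real ceiling is lower, since the index structures and metadata store consume RAM on top of the raw vector payload.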

The LRU cache described in the documentation did not prevent OOM in Docker deployments. Issue #1908 documents a connection leak that caused monotonic memory growth until container OOM. Users required Docker restarts every 2–3 days. The fix landed in approximately v0.5.6 via PR #2014. Teams on v0.5.6+ are not affected.

Data was silently lost in multi-collection deployments on v0.5.7–v0.5.12. Issue #2922 documents a correctness bug in the log purge mechanism. During log purge in multi-collection scenarios, embeddings were deleted while metadata survived — producing collections with orphaned documents. Queries returned metadata but could not perform similarity search on affected entries. No error was raised during the purge. The fix landed in v0.5.13 via PR #2923. LangChain and LlamaIndex deprecated affected Chroma versions as a result.

Embedding dimension is locked on first insert with no migration path. Once vectors are inserted into a collection, the embedding dimension is immutable. Attempting to insert embeddings with a different dimension raises an HTTP 500 error with no automatic migration path (issue #945). To change embedding models, teams must recreate entire collections. There is no public API for inspecting or migrating dimensions.
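Since there is no supported way to query a collection's dimension, a client-side guard can at least turn the opaque HTTP 500 into a clear local error. A sketch under that assumption: `check_dims` is my own helper, not a Chroma API, and `expected_dim` must be tracked by your application (e.g. alongside the embedding model name):

```python
def check_dims(embeddings: list[list[float]], expected_dim: int) -> None:
    """Fail fast, client-side, before Chroma's unhelpful HTTP 500.
    expected_dim is whatever dimension the collection was created with;
    Chroma exposes no public API to inspect it, so record it yourself."""
    for i, emb in enumerate(embeddings):
        if len(emb) != expected_dim:
            raise ValueError(
                f"embedding {i} has dim {len(emb)}; "
                f"collection expects {expected_dim}"
            )
```

Call it immediately before every `collection.add()`; a `ValueError` here means you are about to hit the dimension lock and need a new collection, not a retry.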

Batch size is capped at 41,666 embeddings by an SQLite constraint. ChromaDB enforces a maximum of 41,666 embeddings per add() call (issue #1049, closed). Large ingestion pipelines must batch manually. Use the max_batch_size attribute to confirm the limit in your environment.
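A minimal batching wrapper might look like this. `collection.max_batch_size` is the documented client attribute and 41,666 the documented fallback; `batched_add` itself is an illustrative helper, not part of the Chroma API:

```python
def batched_add(collection, ids, embeddings, documents=None, metadatas=None):
    """Split one logical add() into chunks that respect the SQLite-imposed
    cap, reading the limit from the collection when it is exposed."""
    step = getattr(collection, "max_batch_size", 41_666)
    for start in range(0, len(ids), step):
        end = start + step
        collection.add(
            ids=ids[start:end],
            embeddings=embeddings[start:end],
            documents=documents[start:end] if documents else None,
            metadatas=metadatas[start:end] if metadatas else None,
        )
```

Keeping the chunking in one place also makes it easy to add retry or progress logging later, which large ingestion runs usually end up needing.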

Metadata filtering degrades to minutes at moderate scale. ChromaDB has no indices on metadata fields — every filtered query performs a full table scan. At 40,000 documents, a single-filter metadata search was reported at approximately 5 minutes response time (issue #1394).

Pydantic v1 compatibility was removed in v1.5.3+. Code using pydantic.v1 fallback import patterns is incompatible with Chroma >= v1.5.3. Projects must either pin Chroma below v1.5.3 or complete migration to native Pydantic v2 APIs before upgrading.

What this means for you

Your prototype will work. You scale it. Then you hit one of these walls:

  • You add a second or third collection and the container starts OOMing — if you are on a pre-v0.5.6 version.
  • You upgrade through the v0.5.7–v0.5.12 range and your multi-collection deployments silently lose vector data while metadata survives. Your queries return results. Those results have no embeddings backing them. You won’t notice until you investigate why similarity scores look wrong.
  • You hit the RAM ceiling at a fraction of your expected production scale and discover there is no horizontal scaling path.
  • You try to switch embedding models and discover your entire collection must be recreated.

The metadata filtering cliff is especially dangerous for agents: an agentic loop doing filtered retrieval over a growing knowledge base is heading toward 5-minute query times with no warning — just gradually worsening latency until the agent times out.

What to do

For new projects: Evaluate ChromaDB for prototyping only. If your production knowledge base will exceed 5–10M vectors, or if you need metadata filtering at scale, evaluate Qdrant (horizontal scaling, payload indices) or Pinecone from the start. Migrating after the fact is painful — the embedding dimension lock means recreating collections.

If you are on v0.5.7–v0.5.12: Upgrade to v0.5.13+ immediately. Audit existing collections for data integrity: query all collection IDs and cross-reference metadata count vs vector count. Discrepancies indicate the data loss bug affected your deployment.
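One way to sketch that audit, assuming the dict shape returned by `collection.get(include=["embeddings", "metadatas"])` with list-shaped embeddings (newer clients may return arrays, which would need converting first); `find_orphans` is an illustrative helper, not a Chroma API:

```python
def find_orphans(get_result: dict) -> list[str]:
    """Return ids whose metadata survived but whose embedding is missing
    or empty: the signature of the #2922 purge bug. Expects the dict
    returned by collection.get(include=["embeddings", "metadatas"])."""
    embeddings = get_result.get("embeddings") or []
    orphans = []
    for i, id_ in enumerate(get_result["ids"]):
        emb = embeddings[i] if i < len(embeddings) else None
        if emb is None or len(emb) == 0:
            orphans.append(id_)
    return orphans
```

Run it per collection; any non-empty result means the purge bug touched your deployment and those documents must be re-embedded from source.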

For ingestion: Batch your add() calls at or below 41,666 embeddings. Use collection.max_batch_size to confirm the limit in your environment.

For metadata filtering: At scale, avoid ChromaDB metadata filters for latency-sensitive queries. Consider pre-filtering document IDs in a separate metadata store (PostgreSQL, SQLite) and passing them as ids to ChromaDB get() or query() calls.
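A sketch of that split, with stdlib SQLite holding the indexed metadata; the `docs` table schema and the `prefilter_ids` helper are illustrative assumptions, not Chroma APIs:

```python
import sqlite3

def prefilter_ids(db: sqlite3.Connection, category: str, limit: int = 100) -> list[str]:
    """Resolve a metadata filter against an indexed SQLite table and return
    document ids, sidestepping Chroma's full-table-scan metadata filters.
    Assumes: CREATE TABLE docs (doc_id TEXT PRIMARY KEY, category TEXT)
    with an index on the filtered column."""
    rows = db.execute(
        "SELECT doc_id FROM docs WHERE category = ? ORDER BY doc_id LIMIT ?",
        (category, limit),
    )
    return [r[0] for r in rows]
```

The returned ids then go to `collection.get(ids=...)` or as the ids argument to a `query()` call, so ChromaDB only performs similarity search over the pre-filtered set.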

For memory management: In Docker, set explicit memory limits and monitor container memory. Do not assume LRU eviction will prevent OOM. If running v0.5.x, upgrade to v0.5.6+ to get the connection leak fix.

This finding would be disproved by: benchmarks showing ChromaDB v1.5.x (the Rust rewrite) operating stably above 100M vectors on a single node, or by horizontal scaling capabilities added to the v1.x architecture. It would also be disproved for the memory claim specifically by evidence that the LRU cache reliably evicts before OOM in v1.x Docker deployments.

Evidence

Tool | Version | Result
ChromaDB | v0.5.7–v0.5.12 | source-reviewed: embeddings silently lost during multi-collection log purge while metadata survives (#2922, fixed v0.5.13)
ChromaDB | v0.4.x (Docker) | source-reviewed: connection leak causing monotonic memory growth to OOM; Docker restart every 2–3 days (#1908, fixed ~v0.5.6)
ChromaDB | v1.5.6 (current) | source-reviewed: single-node, RAM-bound architecture, no horizontal scaling (#1323, open); Pydantic v1 support dropped in v1.5.3+
ChromaDB | v0.4.x+ | source-reviewed: 41,666-embedding batch limit (SQLite constraint) (#1049, closed)
ChromaDB | Unspecified | source-reviewed: metadata filtering ~5 min at 40K docs (#1394)

Confidence: medium — all claims are source-reviewed from GitHub issues, not tested by execution in Theory Delta’s environment. The data loss bug (#2922) and memory leak (#1908) are independently confirmed by third-party issue reporters, LangChain/LlamaIndex deprecation announcements, and fix PRs. The RAM ceiling claim is architectural (confirmed via cookbook resource docs and the open issue #1323).

Strongest case against: ChromaDB’s v1.x Rust rewrite may have addressed the memory management issues at the architectural level, not just patched the connection leak. The data loss bug is fixed. The current version (v1.5.6) is a substantially different codebase from the v0.5.x series where most failures were observed.

Open questions: Does the Rust rewrite (v1.x) fix the LRU cache reliability issue, or only improve query speed? What is the first-query cold-start latency penalty for HNSW index loading in v1.x? Does v0.5.13+ fully close the data loss path in multi-collection deployments, or are edge cases remaining?

Seen different? Contribute your evidence — Theory Delta is what makes this knowledge base work.