Architecture Overview
codexfi is an OpenCode plugin that runs entirely in-process. There are no servers to run, no Docker containers, no infrastructure to manage — just two API keys (Voyage AI for embedding, one LLM provider for extraction).
System diagram
Plugin hooks
codexfi registers five hooks with the OpenCode plugin system:
| Hook | Purpose |
|---|---|
| `experimental.chat.messages.transform` | Caches recent messages for extraction |
| `experimental.chat.system.transform` | Injects the `[MEMORY]` block into every system prompt |
| `chat.message` | Fetches memories on turn 1, semantic refresh on turn 2+ |
| `tool.memory` | Registers the `memory` tool for explicit agent use |
| `event` | Handles auto-save after assistant turns, compaction, session cleanup |
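In code, this registration amounts to returning a map from hook name to handler. A minimal sketch, assuming a plain-object plugin shape — only the hook names come from the table above; every handler signature below is an illustrative assumption, not OpenCode's actual API:

```typescript
// Hypothetical shape of codexfi's hook table. Signatures are assumptions
// for illustration; only the five hook names are taken from the docs.
const plugin = {
  "experimental.chat.messages.transform": (msgs: unknown[]) => msgs.slice(-8), // cache recent messages
  "experimental.chat.system.transform": (sys: string) => `${sys}\n[MEMORY] ...`, // inject memory block
  "chat.message": async () => { /* turn-1 fetch or turn-2+ semantic refresh */ },
  "tool.memory": { description: "explicit memory reads/writes for the agent" },
  event: async (_e: unknown) => { /* auto-save, compaction, session cleanup */ },
};
```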
Data flow: retrieval
Turn 1 (session start)
On the first user message, four parallel fetches populate the memory cache:
- Profile — the user's cross-project preferences
- User semantic search — user-scoped memories matching the query
- Project list — all structured project memories (brief, architecture, tech context, etc.)
- Project semantic search — project-scoped memories semantically similar to the message
If zero project memories exist and the directory contains code, auto-initialization kicks in silently. codexfi reads 28 common project files (README, package.json, Dockerfile, tsconfig, monorepo configs, agent instructions, and more) plus the 20 most recent git commits, then extracts initial memories using the init extraction mode. Memories are re-fetched immediately so they are visible in the Turn 1 [MEMORY] block.
If zero project memories exist and the directory is empty, a [MEMORY - NEW PROJECT] hint is injected into the system prompt to guide the agent.
After the Turn 1 response is delivered, background enrichment fires as a separate pass. It generates a directory tree and extracts additional context from entry points and CI configs — enriching the memory store without delaying the first response.
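The four Turn-1 fetches above can be sketched as a single `Promise.all` fan-out, so the first turn pays only for the slowest fetch. The fetcher names and the `Memory` shape are illustrative stand-ins, not codexfi's real API:

```typescript
// Stub fetchers standing in for the real profile/search/list calls.
type Memory = { text: string; scope: "user" | "project" };

async function fetchProfile(): Promise<Memory[]> {
  return [{ text: "prefers strict TypeScript", scope: "user" }];
}
async function searchUser(_query: string): Promise<Memory[]> { return []; }
async function listProject(): Promise<Memory[]> {
  return [{ text: "project brief", scope: "project" }];
}
async function searchProject(_query: string): Promise<Memory[]> { return []; }

async function populateCache(query: string) {
  // Four independent fetches run in parallel on the first user message.
  const [profile, userHits, projectAll, projectHits] = await Promise.all([
    fetchProfile(),
    searchUser(query),
    listProject(),
    searchProject(query),
  ]);
  if (projectAll.length === 0) {
    // auto-init would read the 28 project files + git log here
  }
  return { profile, userHits, projectAll, projectHits };
}
```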
Turn 2+ (semantic refresh)
Each subsequent user message triggers a single semantic search (~300ms) that refreshes the "Relevant to Current Task" section. This means topic switches cause different memories to surface automatically.
Every LLM call (system.transform)
The [MEMORY] block is rebuilt from the cache and injected into the system prompt. This happens on every LLM call, not just the first. The block contains structured sections (Project Brief, Architecture, etc.) plus semantically matched memories. Because injection happens through the system prompt — not a tool call or conversation turn — it consumes no conversation context tokens.
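Rebuilding the block from cached sections is cheap string assembly. A sketch — section names follow the docs, but the exact layout is an assumption:

```typescript
// Assemble the [MEMORY] block from cached sections; empty sections are skipped.
function buildMemoryBlock(sections: Record<string, string[]>): string {
  const lines = ["[MEMORY]"];
  for (const [name, items] of Object.entries(sections)) {
    if (items.length === 0) continue;
    lines.push(`## ${name}`);
    for (const item of items) lines.push(`- ${item}`);
  }
  return lines.join("\n");
}

const block = buildMemoryBlock({
  "Project Brief": ["CLI tool, Bun runtime"],
  "Relevant to Current Task": ["auth uses JWT middleware"],
});
// `block` is appended to the system prompt, so it consumes no
// conversation-context tokens.
```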
Data flow: storage
After every assistant turn
- The last 8 messages are snapshotted
- The configured extraction LLM analyzes the conversation and returns a JSON array of typed facts
- Each fact is embedded into a 1024-dimension vector using `voyage-code-3`
- Deduplication — cosine similarity check against existing memories (threshold: 0.12 general, 0.25 structural types). Duplicates trigger an update instead of an insert.
- Aging rules — `progress` type: only the latest survives. `session-summary`: capped at 3; oldest condensed into `learned-pattern`.
- Contradiction detection — nearby memories are checked by the LLM for semantic contradictions. Stale memories are marked as superseded.
A 15-second cooldown prevents duplicate processing from OpenCode's double-fire of the completion event.
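The dedup step can be sketched with a plain cosine check. We read the thresholds as cosine *distance* cutoffs (an assumption; the 0.12 general / 0.25 structural values come from the docs), and the helper names are ours:

```typescript
// Cosine distance between two embedding vectors (1 - cosine similarity).
function cosineDistance(a: Float32Array, b: Float32Array): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// A candidate counts as a duplicate when any existing memory is within
// the type-dependent distance threshold; duplicates become updates.
function isDuplicate(candidate: Float32Array, existing: Float32Array[], structural: boolean): boolean {
  const threshold = structural ? 0.25 : 0.12;
  return existing.some((v) => cosineDistance(candidate, v) <= threshold);
}
```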
Session summaries
Every N turns (default: 5), a session-summary memory is generated in addition to the per-turn facts. Session summaries capture the high-level arc of what was worked on.
When the summary count exceeds 3, the oldest is condensed by the LLM into a compact learned-pattern memory, then deleted. This prevents unbounded growth while preserving distilled knowledge.
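The cap-at-3 rule can be sketched as follows, with `condense` standing in for the LLM condensation call (the record shape is illustrative):

```typescript
type Mem = { type: "session-summary" | "learned-pattern"; text: string; ts: number };

// When a 4th session summary arrives, condense the oldest into a
// learned-pattern and drop the original, keeping growth bounded.
function ageSummaries(mems: Mem[], condense: (m: Mem) => Mem): Mem[] {
  const summaries = mems
    .filter((m) => m.type === "session-summary")
    .sort((a, b) => a.ts - b.ts);
  if (summaries.length <= 3) return mems;
  const oldest = summaries[0];
  return mems.filter((m) => m !== oldest).concat(condense(oldest));
}
```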
Compaction survival
When OpenCode truncates the conversation history (compaction), the [MEMORY] block is unaffected because it lives in the system prompt, not message history.
Additionally, codexfi:
- Intercepts the compaction event
- Injects current memories into the compaction context for richer summaries
- Sets a flag to trigger a full memory cache refresh on the next turn
This means memory is never lost to compaction.
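The interception itself is small. A sketch with illustrative names — the real event payloads are OpenCode's, not shown in the docs:

```typescript
// Module-level flag checked by the next chat.message handler.
let needsRefresh = false;

// On the compaction event: enrich the compaction input with current
// memories and schedule a full cache refresh for the next turn.
function onCompaction(compactionContext: string, memoryBlock: string): string {
  needsRefresh = true;
  return `${memoryBlock}\n${compactionContext}`;
}
```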
Technology stack
| Component | Technology |
|---|---|
| Storage | SQLite with WAL mode (bun:sqlite) — zero npm dependencies, multi-agent safe, ACID transactions |
| Search | Exact nearest neighbor cosine similarity over Float32Array BLOBs |
| Embeddings | Voyage AI voyage-code-3 — 1024 dimensions |
| Extraction | Multi-provider LLM (Anthropic/xAI/Google) with automatic fallback |
| Runtime | Bun |
| Validation | Zod schemas for memory records |
Why SQLite?
codexfi originally used LanceDB for vector storage. LanceDB is excellent software, but its native NAPI bindings broke on OpenCode Desktop auto-updates — users got cryptic module loading errors with no workaround. We replaced it with bun:sqlite, which is built into the Bun runtime (like node:fs is built into Node) — zero npm packages, zero native binaries, zero breakage on updates.
SQLite with WAL (Write-Ahead Logging) also solves multi-agent concurrency. Many developers run multiple OpenCode agents simultaneously. WAL mode allows 100+ concurrent readers without blocking, with writers safely queued via busy_timeout. This was validated with stress tests spawning 20+ real OS processes hitting the same database file — zero data loss, zero corruption.
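The corresponding pragma setup is tiny. A sketch with `bun:sqlite` (Bun-only; the 5-second timeout and the `synchronous` setting are our assumptions of a typical WAL pairing, not values the docs state):

```typescript
import { Database } from "bun:sqlite";

const db = new Database("memories.db");
db.exec("PRAGMA journal_mode = WAL;");   // readers never block on the writer
db.exec("PRAGMA busy_timeout = 5000;");  // writers queue (here: up to 5s) instead of erroring
db.exec("PRAGMA synchronous = NORMAL;"); // assumption: common WAL pairing, not confirmed by the docs
```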
| | LanceDB | SQLite WAL | JSONL (interim) |
|---|---|---|---|
| Concurrent reads | Yes | 100+ readers, zero blocking | Stale data |
| Concurrent writes | Untested with multi-process | Single writer, others queue safely | Data loss |
| Native dependencies | NAPI binary (broke on updates) | None (built into Bun) | None |
| Storage size (3.5k records) | ~35MB | ~35MB | ~93MB |
| Search at 50k records | ~2ms (indexed) | ~20ms (exact NN) | ~20ms (exact NN) |
| Crash recovery | Lance format | Full ACID | Atomic rename only |
File structure
```
plugin/src/
├── index.ts — plugin hooks (system.transform, chat.message, tool.memory, event)
├── config.ts — centralized constants, pricing, thresholds
├── types.ts — Zod schemas for memory records
├── prompts.ts — all LLM prompt templates
├── db.ts — thin adapter (imports from store/)
├── store.ts — business logic: dedup, aging, contradiction, search with recency blending
├── store/ — SQLite persistence layer
│   ├── index.ts — public API (add, search, scan, update, delete, init, reload)
│   ├── sqlite.ts — connection management, WAL config, pragmas
│   ├── schema.ts — CREATE TABLE, indexes
│   ├── crud.ts — add, update, deleteById, getById, scan, countRows
│   ├── search.ts — vector search with cosine scoring + filters
│   ├── cosine.ts — Float32Array cosine distance (pure math)
│   └── types.ts — MemoryRecord, SearchResult, FilterOptions
├── extractor.ts — multi-provider extraction with fallback and retry
├── embedder.ts — Voyage AI embedding via fetch
├── retry.ts — exponential backoff with jitter
├── telemetry.ts — cost tracking (CostLedger + ActivityLog)
├── plugin-config.ts — user config from ~/.codexfi/codexfi.jsonc
└── services/
    ├── auto-init-config.ts — init file list (28 files) and total char cap
    ├── auto-save.ts — background extraction after assistant turns
    ├── compaction.ts — context window compaction handling
    ├── context.ts — [MEMORY] block formatting
    ├── directory-tree.ts — project tree generator for background enrichment
    ├── disabled-warning.ts — warning hint when plugin is misconfigured
    ├── fresh-project-hint.ts — [MEMORY - NEW PROJECT] hint for empty directories
    ├── privacy.ts — <private> tag stripping
    └── tags.ts — project/user tag computation
```