# Architecture Overview
codexfi is an OpenCode plugin that runs entirely in-process. There are no external services, no Docker containers, no separate servers. Everything — LLM extraction, embedding, vector storage, and retrieval — happens inside the plugin.
## System diagram

```
User message → chat.message hook
├── Turn 1: 4 parallel fetches
│   ├── User profile (cross-project preferences)
│   ├── User semantic search
│   ├── Project memory list (structured sections)
│   └── Project semantic search
│       └── If zero project memories → silent auto-init from project files
└── Turns 2+: single semantic search refreshes "Relevant to Current Task"
    → system.transform rebuilds [MEMORY] block into system prompt (every LLM call)

Assistant completes turn → event hook
└── auto-save: extract facts from last 8 messages
    ├── LLM extracts JSON array of typed facts
    ├── Each fact embedded with voyage-code-3
    ├── Cosine dedup prevents duplicates
    ├── Contradiction detection supersedes stale facts
    ├── Aging rules enforce rolling windows
    └── Every N turns: also generate session-summary
```

## Plugin hooks
codexfi registers five hooks with the OpenCode plugin system:
| Hook | Purpose |
|---|---|
| `experimental.chat.messages.transform` | Caches recent messages for extraction |
| `experimental.chat.system.transform` | Injects the `[MEMORY]` block into every system prompt |
| `chat.message` | Fetches memories on turn 1, runs a semantic refresh on turns 2+ |
| `tool.memory` | Registers the memory tool for explicit agent use |
| `event` | Handles auto-save after assistant turns, compaction, and session cleanup |
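The five hooks can be pictured as a single object wiring handlers to hook names. This is a minimal sketch: the hook names come from the table above, but the handler signatures, return values, and registration shape are assumptions, not the real OpenCode plugin API.

```typescript
// Sketch only: handler signatures and bodies are placeholders, not codexfi's
// actual implementations.
type HookHandler = (payload?: unknown) => unknown;

function createCodexfiPlugin(): Record<string, HookHandler> {
  let messageCache: unknown[] = [];

  return {
    // Cache recent messages so the extractor can snapshot them later
    "experimental.chat.messages.transform": (msgs) => {
      if (Array.isArray(msgs)) messageCache = [...msgs];
      return msgs;
    },
    // Prepend the [MEMORY] block to every system prompt
    "experimental.chat.system.transform": (prompt) =>
      `[MEMORY]\n(cached sections go here)\n[/MEMORY]\n\n${String(prompt)}`,
    // Turn 1: full fetch; turns 2+: semantic refresh
    "chat.message": () => { /* see "Data flow: retrieval" */ },
    // Explicit memory tool for the agent
    "tool.memory": () => { /* tool registration */ },
    // Auto-save, compaction handling, session cleanup
    "event": () => { /* see "Data flow: storage" */ },
  };
}
```

The important structural point is that all five handlers close over shared in-process state (here, `messageCache`), which is what lets retrieval, injection, and extraction cooperate without any external service.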
## Data flow: retrieval

### Turn 1 (session start)
On the first user message, four parallel fetches populate the memory cache:
- Profile — the user's cross-project preferences
- User semantic search — user-scoped memories matching the query
- Project list — all structured project memories (brief, architecture, tech context, etc.)
- Project semantic search — project-scoped memories semantically similar to the message
If zero project memories exist and the directory has code, auto-initialization kicks in silently. codexfi reads README.md, package.json, docker-compose.yml, and other common project files, then extracts initial memories.
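The turn-1 fan-out above can be sketched with `Promise.all`. The four fetcher names below are stand-ins for the real store and search calls, which this document does not spell out:

```typescript
// Stub fetchers standing in for codexfi's real store/search functions
// (hypothetical names; each returns memory texts for its scope).
const fetchProfile = async (): Promise<string[]> => [];
const searchUserMemories = async (_q: string): Promise<string[]> => [];
const listProjectMemories = async (): Promise<string[]> => [];
const searchProjectMemories = async (_q: string): Promise<string[]> => [];

async function fetchTurnOneMemories(query: string) {
  // All four fetches run concurrently; total latency is the slowest one
  const [profile, userMatches, projectList, projectMatches] = await Promise.all([
    fetchProfile(),               // cross-project preferences
    searchUserMemories(query),    // user-scoped semantic search
    listProjectMemories(),        // structured project memories
    searchProjectMemories(query), // project-scoped semantic search
  ]);
  // Zero project memories triggers the silent auto-init path
  const needsAutoInit = projectList.length === 0;
  return { profile, userMatches, projectList, projectMatches, needsAutoInit };
}
```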
### Turn 2+ (semantic refresh)
Each subsequent user message triggers a single semantic search (~300ms) that refreshes the "Relevant to Current Task" section. This means topic switches cause different memories to surface automatically.
### Every LLM call (system.transform)
The [MEMORY] block is rebuilt from the cache and injected into the system prompt. This happens on every LLM call, not just the first. The block contains structured sections (Project Brief, Architecture, etc.) plus semantically matched memories.
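Rebuilding the block from the cache might look like the sketch below. The section names follow the document, but the exact layout produced by `context.ts` is not specified here, so this formatting is an assumption:

```typescript
// Rebuild the [MEMORY] block from cached sections; empty sections are
// dropped so the prompt only carries populated memory.
function buildMemoryBlock(sections: Record<string, string[]>): string {
  const lines: string[] = ["[MEMORY]"];
  for (const [title, items] of Object.entries(sections)) {
    if (items.length === 0) continue; // skip empty sections
    lines.push(`## ${title}`);
    for (const item of items) lines.push(`- ${item}`);
  }
  lines.push("[/MEMORY]");
  return lines.join("\n");
}
```

Because this runs on every LLM call, the semantic refresh from the previous turn is always reflected in the next prompt without mutating message history.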
## Data flow: storage

### After every assistant turn
- The last 8 messages are snapshotted
- The configured extraction LLM analyzes the conversation and returns a JSON array of typed facts
- Each fact is embedded into a 1024-dimension vector using `voyage-code-3`
- Deduplication — a cosine-similarity check against existing memories (threshold: 0.12 general, 0.25 for structural types); duplicates trigger an update instead of an insert
- Aging rules — `progress` type: only the latest survives; `session-summary`: capped at 3, with the oldest condensed into a `learned-pattern`
- Contradiction detection — nearby memories are checked by the LLM for semantic contradictions, and stale memories are marked as superseded
A 15-second cooldown prevents duplicate processing from OpenCode's double-fire of the completion event.
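The dedup step can be illustrated as follows. Note one assumption: the quoted thresholds (0.12 general, 0.25 structural) are treated here as cosine *distance* (1 − similarity) cutoffs, since a similarity cutoff of 0.12 would match nearly everything:

```typescript
// Cosine distance between two equal-length vectors: 0 = identical direction.
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// A candidate is a duplicate if any existing memory sits within the
// type-dependent distance threshold (illustrative interpretation).
function isDuplicate(
  candidate: number[],
  existing: number[][],
  structural = false
): boolean {
  const threshold = structural ? 0.25 : 0.12;
  return existing.some((v) => cosineDistance(candidate, v) < threshold);
}
```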
### Session summaries
Every N turns (default: 5), a `session-summary` memory is generated in addition to the per-turn facts. Session summaries capture the high-level arc of what was worked on.
When the summary count exceeds 3, the oldest is condensed by the LLM into a compact `learned-pattern` memory, then deleted. This prevents unbounded growth while preserving distilled knowledge.
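The rolling-window rule amounts to a cap-and-condense pass. In this sketch, `condense` stands in for the LLM call that distills the oldest summary:

```typescript
interface Memory {
  type: string;      // e.g. "session-summary", "learned-pattern"
  text: string;
  createdAt: number; // epoch ms, used to find the oldest
}

// Keep at most 3 session summaries: when a 4th exists, replace the oldest
// with its condensed learned-pattern form (condense = stand-in for the LLM).
function ageSessionSummaries(
  memories: Memory[],
  condense: (m: Memory) => Memory
): Memory[] {
  const summaries = memories
    .filter((m) => m.type === "session-summary")
    .sort((a, b) => a.createdAt - b.createdAt);
  if (summaries.length <= 3) return memories;
  const oldest = summaries[0];
  const rest = memories.filter((m) => m !== oldest);
  return [...rest, condense(oldest)];
}
```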
### Compaction survival
When OpenCode truncates the conversation history (compaction), the [MEMORY] block is unaffected because it lives in the system prompt, not message history.
Additionally, codexfi:
- Intercepts the compaction event
- Injects current memories into the compaction context for richer summaries
- Sets a flag to trigger a full memory cache refresh on the next turn
This means memory is never lost to compaction.
## Technology stack
| Component | Technology |
|---|---|
| Storage | LanceDB — embedded vector database, NAPI bindings |
| Embeddings | Voyage AI voyage-code-3 — 1024 dimensions |
| Extraction | Multi-provider LLM (Anthropic/xAI/Google) with automatic fallback |
| Runtime | Bun |
| Validation | Zod schemas for memory records |
## File structure
```
plugin/src/
├── index.ts — plugin hooks (system.transform, chat.message, tool.memory, event)
├── config.ts — centralized constants, pricing, thresholds
├── types.ts — Zod schemas for memory records
├── prompts.ts — all LLM prompt templates
├── db.ts — LanceDB init/connect/refresh
├── store.ts — CRUD, dedup, aging, contradiction, search with recency blending
├── extractor.ts — multi-provider extraction with fallback and retry
├── embedder.ts — Voyage AI embedding via fetch
├── retry.ts — exponential backoff with jitter
├── telemetry.ts — cost tracking (CostLedger + ActivityLog)
├── plugin-config.ts — user config from ~/.config/opencode/codexfi.jsonc
└── services/
    ├── auto-save.ts — background extraction after assistant turns
    ├── compaction.ts — context window compaction handling
    ├── context.ts — [MEMORY] block formatting
    ├── privacy.ts — <private> tag stripping
    └── tags.ts — project/user tag computation
```
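The file tree describes `retry.ts` as exponential backoff with jitter. A generic sketch of that pattern, with illustrative attempt counts and delays (not codexfi's actual tuning):

```typescript
// Retry an async operation with full-jitter exponential backoff:
// each failed attempt sleeps a random duration up to baseDelayMs * 2^attempt.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 250
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt + 1 >= maxAttempts) throw err; // out of attempts
      const cap = baseDelayMs * 2 ** attempt;
      const delay = Math.random() * cap; // full jitter spreads out retries
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

Full jitter (random delay up to the exponential cap) is a common choice for LLM and embedding APIs because it desynchronizes concurrent retries against rate limits.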