codexfi
How it Works

Architecture Overview

codexfi is an OpenCode plugin that runs entirely in-process. There are no servers to run, no Docker containers, no infrastructure to manage — just two API keys (Voyage AI for embedding, one LLM provider for extraction).

System diagram

Session flow diagram: a user message passes through the `chat.message` hook (Turn 1: full fetch / Turn 2+: refresh) and `system.transform` (injects `[MEMORY]`); after each assistant turn, an event hook auto-saves (extract · embed · dedup · age) to `store.db`.

Plugin hooks

codexfi registers five hooks with the OpenCode plugin system:

| Hook | Purpose |
| --- | --- |
| `experimental.chat.messages.transform` | Caches recent messages for extraction |
| `experimental.chat.system.transform` | Injects the `[MEMORY]` block into every system prompt |
| `chat.message` | Fetches memories on turn 1, semantic refresh on turn 2+ |
| `tool.memory` | Registers the memory tool for explicit agent use |
| `event` | Handles auto-save after assistant turns, compaction, session cleanup |
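A minimal sketch of what these registrations might look like. The hook names come from the table above, but the OpenCode plugin API types and the `plugin` object shape here are assumptions for illustration only:

```typescript
// Hypothetical shape of the codexfi plugin entry point.
// The hook signatures below are illustrative, not the real OpenCode API.
type Message = { role: "user" | "assistant"; content: string };

interface PluginHooks {
  "experimental.chat.messages.transform"?: (msgs: Message[]) => Message[];
  "experimental.chat.system.transform"?: (system: string) => string;
  "chat.message"?: (msg: Message) => Promise<void>;
  event?: (name: string, payload: unknown) => Promise<void>;
}

// Module-level cache of the most recent messages, for later extraction.
const recentMessages: Message[] = [];

export const plugin: PluginHooks = {
  // Cache recent messages so the extractor can see them after the turn.
  "experimental.chat.messages.transform": (msgs) => {
    recentMessages.splice(0, recentMessages.length, ...msgs.slice(-8));
    return msgs;
  },
  // Inject the [MEMORY] block into every system prompt.
  "experimental.chat.system.transform": (system) =>
    `${system}\n\n[MEMORY]\n(memories rebuilt from cache)`,
};
```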

Data flow: retrieval

Turn 1 (session start)

On the first user message, four parallel fetches populate the memory cache:

Parallel fetch diagram: the user message fans out to four simultaneous fetches (User Profile, User Semantic, Project List, Project Semantic) that together populate the memory cache.
  1. Profile — the user's cross-project preferences
  2. User semantic search — user-scoped memories matching the query
  3. Project list — all structured project memories (brief, architecture, tech context, etc.)
  4. Project semantic search — project-scoped memories semantically similar to the message
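The four fetches above can be sketched as a single `Promise.all`, so total latency is the slowest fetch rather than the sum of all four. The fetcher names and stub data below are hypothetical stand-ins, not the plugin's real API:

```typescript
// Hypothetical Turn-1 cache population: four parallel fetches.
type Memory = { type: string; text: string };

// Stub fetchers standing in for the real profile/semantic/list queries.
async function fetchProfile(): Promise<Memory[]> {
  return [{ type: "profile", text: "prefers TypeScript strict mode" }];
}
async function searchUserMemories(query: string): Promise<Memory[]> {
  return []; // user-scoped semantic hits for `query`
}
async function listProjectMemories(): Promise<Memory[]> {
  return [{ type: "brief", text: "CLI tool for invoices" }];
}
async function searchProjectMemories(query: string): Promise<Memory[]> {
  return []; // project-scoped semantic hits for `query`
}

async function populateCache(userMessage: string): Promise<Memory[]> {
  // All four fire simultaneously.
  const [profile, userHits, projectList, projectHits] = await Promise.all([
    fetchProfile(),
    searchUserMemories(userMessage),
    listProjectMemories(),
    searchProjectMemories(userMessage),
  ]);
  return [...profile, ...userHits, ...projectList, ...projectHits];
}
```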

If zero project memories exist and the directory has code, auto-initialization kicks in silently. codexfi reads 28 common project files (README, package.json, Dockerfile, tsconfig, monorepo configs, agent instructions, and more) plus the 20 most recent git commits, then extracts initial memories using the init extraction mode. Memories are re-fetched immediately so they are visible in the Turn 1 [MEMORY] block.

If zero project memories exist and the directory is empty, a [MEMORY - NEW PROJECT] hint is injected into the system prompt to guide the agent.

After the Turn 1 response is delivered, background enrichment fires as a separate pass. It generates a directory tree and extracts additional context from entry points and CI configs — enriching the memory store without delaying the first response.

Turn 2+ (semantic refresh)

Each subsequent user message triggers a single semantic search (~300ms) that refreshes the "Relevant to Current Task" section, so switching topics automatically surfaces different memories.

Every LLM call (system.transform)

The [MEMORY] block is rebuilt from the cache and injected into the system prompt. This happens on every LLM call, not just the first. The block contains structured sections (Project Brief, Architecture, etc.) plus semantically matched memories. Because injection happens through the system prompt — not a tool call or conversation turn — it consumes no conversation context tokens.
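A sketch of that rebuild-and-inject step. The section names mirror the docs; the cache shape and formatting details are assumptions:

```typescript
// Hypothetical memory cache and [MEMORY] block formatting.
interface MemoryCache {
  projectBrief?: string;
  architecture?: string;
  relevant: string[]; // refreshed each turn by semantic search
}

function buildMemoryBlock(cache: MemoryCache): string {
  const lines = ["[MEMORY]"];
  if (cache.projectBrief) lines.push(`## Project Brief\n${cache.projectBrief}`);
  if (cache.architecture) lines.push(`## Architecture\n${cache.architecture}`);
  if (cache.relevant.length > 0) {
    lines.push("## Relevant to Current Task", ...cache.relevant.map((m) => `- ${m}`));
  }
  return lines.join("\n");
}

function injectIntoSystemPrompt(system: string, cache: MemoryCache): string {
  // Appending to the system prompt consumes no conversation-context tokens.
  return `${system}\n\n${buildMemoryBlock(cache)}`;
}
```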

Data flow: storage

After every assistant turn

  1. The last 8 messages are snapshotted
  2. The configured extraction LLM analyzes the conversation and returns a JSON array of typed facts
  3. Each fact is embedded into a 1024-dimension vector using voyage-code-3
  4. Deduplication — cosine similarity check against existing memories (threshold: 0.12 general, 0.25 structural types). Duplicates trigger an update instead of an insert.
  5. Aging rules — progress type: only the latest survives. session-summary: capped at 3; oldest condensed into learned-pattern.
  6. Contradiction detection — nearby memories are checked by the LLM for semantic contradictions. Stale memories are marked as superseded.
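The dedup step can be sketched as a cosine-distance check against the nearest existing memory. The thresholds (0.12 general, 0.25 structural) are from the docs; the structural type names and helper shape are hypothetical:

```typescript
// Cosine distance over Float32Array embeddings (pure math, no deps).
function cosineDistance(a: Float32Array, b: Float32Array): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Hypothetical list of structural types; the real set may differ.
const STRUCTURAL_TYPES = new Set(["brief", "architecture", "tech-context"]);

function isDuplicate(distanceToNearest: number, memoryType: string): boolean {
  // Structural types tolerate more drift before counting as duplicates.
  const threshold = STRUCTURAL_TYPES.has(memoryType) ? 0.25 : 0.12;
  // Below the threshold: same fact, so update in place instead of inserting.
  return distanceToNearest < threshold;
}
```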

A 15-second cooldown prevents duplicate processing from OpenCode's double-fire of the completion event.
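A cooldown guard like this one is enough to swallow the double fire (the 15-second interval is from the docs; the function name is illustrative):

```typescript
// Skip processing if the completion event fired again within the cooldown.
const COOLDOWN_MS = 15_000;
let lastRunAt = -Infinity;

function shouldProcess(now: number = Date.now()): boolean {
  if (now - lastRunAt < COOLDOWN_MS) return false; // duplicate fire, skip
  lastRunAt = now;
  return true;
}
```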

Session summaries

Every N turns (default: 5), a session-summary memory is generated in addition to the per-turn facts. Session summaries capture the high-level arc of what was worked on.

When the summary count exceeds 3, the oldest is condensed by the LLM into a compact learned-pattern memory, then deleted. This prevents unbounded growth while preserving distilled knowledge.
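The cap-and-condense rule can be sketched as follows. The constants come from the docs; `condenseToPattern` is a hypothetical stand-in for the LLM condensation call:

```typescript
// Session-summary aging: keep at most SUMMARY_CAP summaries,
// condensing the oldest into learned-pattern memories.
const SUMMARY_EVERY_N_TURNS = 5; // how often a summary is generated
const SUMMARY_CAP = 3;

interface Summary { id: number; text: string; createdAt: number }

function ageSummaries(
  summaries: Summary[],
  condenseToPattern: (s: Summary) => void,
): Summary[] {
  const sorted = [...summaries].sort((a, b) => a.createdAt - b.createdAt);
  while (sorted.length > SUMMARY_CAP) {
    const oldest = sorted.shift()!;
    condenseToPattern(oldest); // distill into a learned-pattern, then delete
  }
  return sorted;
}
```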

Compaction survival

Compaction survival diagram: before and after compaction, the `[MEMORY]` block persists (✓) while the conversation context window is truncated.

When OpenCode truncates the conversation history (compaction), the [MEMORY] block is unaffected because it lives in the system prompt, not message history.

Additionally, codexfi:

  1. Intercepts the compaction event
  2. Injects current memories into the compaction context for richer summaries
  3. Sets a flag to trigger a full memory cache refresh on the next turn

This means memory is never lost to compaction.

Technology stack

| Component | Technology |
| --- | --- |
| Storage | SQLite with WAL mode (`bun:sqlite`) — zero npm dependencies, multi-agent safe, ACID transactions |
| Search | Exact nearest-neighbor cosine similarity over `Float32Array` BLOBs |
| Embeddings | Voyage AI `voyage-code-3` — 1024 dimensions |
| Extraction | Multi-provider LLM (Anthropic/xAI/Google) with automatic fallback |
| Runtime | Bun |
| Validation | Zod schemas for memory records |

Why SQLite?

codexfi originally used LanceDB for vector storage. LanceDB is excellent software, but its native NAPI bindings broke on OpenCode Desktop auto-updates — users got cryptic module loading errors with no workaround. We replaced it with bun:sqlite, which is built into the Bun runtime (like node:fs is built into Node) — zero npm packages, zero native binaries, zero breakage on updates.

SQLite with WAL (Write-Ahead Logging) also solves multi-agent concurrency. Many developers run multiple OpenCode agents simultaneously. WAL mode allows 100+ concurrent readers without blocking, with writers safely queued via busy_timeout. This was validated with stress tests spawning 20+ real OS processes hitting the same database file — zero data loss, zero corruption.
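The setup described above boils down to a few pragmas on Bun's built-in driver. A minimal sketch (the specific timeout and synchronous values are illustrative, not necessarily the plugin's):

```typescript
// WAL setup with bun:sqlite (built into the Bun runtime, no npm install).
import { Database } from "bun:sqlite";

// In-memory DB for illustration; the plugin persists to store.db on disk.
const db = new Database(":memory:");

// Write-Ahead Logging: readers never block on the single writer.
db.exec("PRAGMA journal_mode = WAL;");
// Writers from other agent processes wait up to 5s for the lock
// instead of failing immediately with SQLITE_BUSY.
db.exec("PRAGMA busy_timeout = 5000;");
// fsync only at WAL checkpoints: a common durability/speed trade-off.
db.exec("PRAGMA synchronous = NORMAL;");
```

With WAL, concurrent readers see a consistent snapshot while one writer appends to the log; `busy_timeout` is what lets multiple agent processes queue safely on the same file.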

|  | LanceDB | SQLite WAL | JSONL (interim) |
| --- | --- | --- | --- |
| Concurrent reads | Yes | 100+ readers, zero blocking | Stale data |
| Concurrent writes | Untested with multi-process | Single writer, others queue safely | Data loss |
| Native dependencies | NAPI binary (broke on updates) | None (built into Bun) | None |
| Storage size (3.5k records) | ~35MB | ~35MB | ~93MB |
| Search at 50k records | ~2ms (indexed) | ~20ms (exact NN) | ~20ms (exact NN) |
| Crash recovery | Lance format | Full ACID | Atomic rename only |

File structure

plugin/src/
├── index.ts          — plugin hooks (system.transform, chat.message, tool.memory, event)
├── config.ts         — centralized constants, pricing, thresholds
├── types.ts          — Zod schemas for memory records
├── prompts.ts        — all LLM prompt templates
├── db.ts             — thin adapter (imports from store/)
├── store.ts          — business logic: dedup, aging, contradiction, search with recency blending
├── store/            — SQLite persistence layer
│   ├── index.ts      — public API (add, search, scan, update, delete, init, reload)
│   ├── sqlite.ts     — connection management, WAL config, pragmas
│   ├── schema.ts     — CREATE TABLE, indexes
│   ├── crud.ts       — add, update, deleteById, getById, scan, countRows
│   ├── search.ts     — vector search with cosine scoring + filters
│   ├── cosine.ts     — Float32Array cosine distance (pure math)
│   └── types.ts      — MemoryRecord, SearchResult, FilterOptions
├── extractor.ts      — multi-provider extraction with fallback and retry
├── embedder.ts       — Voyage AI embedding via fetch
├── retry.ts          — exponential backoff with jitter
├── telemetry.ts      — cost tracking (CostLedger + ActivityLog)
├── plugin-config.ts  — user config from ~/.codexfi/codexfi.jsonc
└── services/
    ├── auto-init-config.ts — init file list (28 files) and total char cap
    ├── auto-save.ts  — background extraction after assistant turns
    ├── compaction.ts — context window compaction handling
    ├── context.ts    — [MEMORY] block formatting
    ├── directory-tree.ts — project tree generator for background enrichment
    ├── disabled-warning.ts — warning hint when plugin is misconfigured
    ├── fresh-project-hint.ts — [MEMORY - NEW PROJECT] hint for empty directories
    ├── privacy.ts    — <private> tag stripping
    └── tags.ts       — project/user tag computation
