
Test Suite

codexfi ships with a three-tier test suite that verifies the memory system works end-to-end — from vector storage through to what the agent actually says in a conversation.

Figure: Test Tiers Architecture. Unit (< 1s; isolated logic, mocked dependencies), Integration (~30s; real store.db, real embedder), E2E (5–10 min; real opencode, 13 scenarios). The tiers run from fast and isolated to slow and real.

What is verified

The 13 E2E scenarios test the behaviors you rely on as a user:

| Scenario | What it guarantees for you |
| --- | --- |
| Cross-Session Memory Continuity | Facts you mention in one session are available in the next |
| README-Based Project Seeding | Opening a project with a README auto-populates memory without any action from you |
| Transcript Noise Guard | Memory contains distilled facts, not raw conversation transcripts |
| Project-Brief Always Present | Even without a README, the agent builds project context from conversation |
| Memory Aging | Old "current status" entries are replaced, so you always get the latest state, not history |
| Existing Codebase Auto-Init | Opening an existing codebase populates memory from project files automatically |
| Enumeration Hybrid Retrieval | "List all my preferences" queries return facts across all sessions, not just the recent ones |
| Cross-Synthesis | Queries about multiple projects surface facts from both namespaces |
| Memory Under Load | Early session facts are still recalled when many memories are stored |
| Knowledge Update | After you change tools or approaches, the agent uses the new information, not the old |
| System Prompt Injection | The [MEMORY] block is always present in the system prompt, never missed |
| Multi-Turn Per-Turn Refresh | As you switch topics mid-session, relevant memories surface automatically |
| Auto-Init Turn 1 Visibility | Auto-init memories are visible on Turn 1 and background enrichment fires after the first response |

Latest results

Figure: E2E Scenario Map. Status per scenario: 01 Cross-Session PASS, 02 README Seeding PASS, 03 Noise Guard PASS, 04 Brief Always PASS, 05 Memory Aging PASS, 06 Codebase Init PASS, 07 Hybrid Retrieval PASS, 08 Cross-Synthesis PASS, 09 Under Load WARN, 10 Knowledge Update PASS, 11 Prompt Injection PASS, 12 Multi-Turn PASS, 13 Auto-Init Turn 1 PASS. Summary: 12 / 13 PASS; scenario 09: non-deterministic.

Results from a run against the feat/pure-ts-vector-store branch (SQLite WAL store) on 2026-04-06.

Unit:        139 pass, 0 fail   (~1s)
Integration:  31 pass, 0 fail   (~3s)
Stress:        3 pass, 0 fail   (~1s)
E2E:       12/13 pass           (~10min)
FAIL  01  Cross-Session Memory Continuity       17.5s
PASS  02  README-Based Project-Brief Seeding    14.9s
PASS  03  Transcript Noise Guard                15.0s
PASS  04  Project Brief Always Present          19.5s
PASS  05  Memory Aging                          50.3s
PASS  06  Existing Codebase Auto-Init           68.1s
PASS  07  Enumeration Hybrid Retrieval          38.2s
PASS  08  Cross-Synthesis (isWideSynthesis)     54.9s
PASS  09  maxMemories=20 Under Load            176.9s
PASS  10  Knowledge Update / Superseded         67.3s
PASS  11  System Prompt Memory Injection        23.2s
PASS  12  Multi-Turn Per-Turn Refresh           59.2s
PASS  13  Auto-Init Turn 1 Visibility           24.0s

The scenario 01 failure is extraction variance: the agent recalled all tech facts correctly but said "a CLI task management tool" instead of the project name "taskflow". A previous solo run of this scenario passed, so this is not a store bug.
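To see why a paraphrase can fail a scenario even when recall is otherwise correct, here is a minimal sketch of a keyword-style assertion. It is not the harness's actual check: `missingKeywords`, the `KeywordCheck` shape, and the tech-stack keywords are all assumptions made for illustration. Only the "taskflow" name and the paraphrase come from the run described above.

```typescript
// Hypothetical keyword assertion; the real E2E harness may evaluate responses differently.
interface KeywordCheck {
  label: string;
  keywords: string[]; // every keyword must appear, compared case-insensitively
}

// Returns one "label: keyword" entry for each expected keyword the response lacks.
function missingKeywords(response: string, checks: KeywordCheck[]): string[] {
  const haystack = response.toLowerCase();
  const missing: string[] = [];
  for (const check of checks) {
    for (const keyword of check.keywords) {
      if (haystack.indexOf(keyword.toLowerCase()) === -1) {
        missing.push(`${check.label}: ${keyword}`);
      }
    }
  }
  return missing;
}

const checks: KeywordCheck[] = [
  { label: "project name", keywords: ["taskflow"] },
  // Illustrative tech facts only; the real scenario's facts are not listed in this page.
  { label: "tech stack", keywords: ["typescript", "sqlite"] },
];

const paraphrased = "It is a CLI task management tool built with TypeScript and SQLite.";
const exact = "taskflow is a CLI task management tool built with TypeScript and SQLite.";

console.log(missingKeywords(paraphrased, checks)); // reports the project-name check
console.log(missingKeywords(exact, checks)); // reports nothing missing
```

A strict check like this treats a correct paraphrase as a failure, which is one plausible way extraction variance surfaces as a FAIL without any store defect.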


For contributors

The sections below cover the test architecture and how to run each tier locally.

Test tiers

| Tier | Tests | What runs | Speed |
| --- | --- | --- | --- |
| Unit | 139 | Cosine math, CRUD, filters, serialisation, dedup, aging, extraction parsing | ~1s |
| Integration | 31 | Real SQLite store + real Voyage AI embedder against a test database | ~3s |
| Stress | 3 | 20 real OS processes hitting the same SQLite file: concurrent writes, reads, and mixed | ~1s |
| E2E | 13 | Real opencode agent sessions, real memory store, 13 autonomous scenarios | ~10 min |

The stress tests validate multi-agent safety — the scenario where multiple OpenCode agents read and write the same store.db simultaneously. Each test spawns real child processes (not async within one process) to match the actual deployment pattern.

Running unit and integration tests

# From the repo root
bun run test                    # unit + integration together
bun test src/unit/              # unit only
bun test src/integration/       # integration only
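
For a sense of the kind of isolated logic the unit tier covers, cosine math is the easiest item to illustrate. The sketch below is a generic cosine-similarity function, not the project's own implementation; the function name and the zero-vector convention are assumptions.

```typescript
// Generic cosine similarity; codexfi's actual implementation may differ.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  // Assumed convention: similarity with a zero vector is 0 rather than NaN.
  if (normA === 0 || normB === 0) {
    return 0;
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 0], [0, 1])); // orthogonal vectors
console.log(cosineSimilarity([1, 2], [2, 4])); // parallel vectors
```

A unit test for logic like this needs no database or network, which is consistent with the tier finishing in about a second.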

Running E2E tests

# From the repo root
bun run test:e2e                     # all 13 scenarios
bun run test:e2e:scenario 07         # single scenario
bun run test:e2e:scenario 07,08,12   # subset

Or directly from the testing/ directory:

cd testing
bun install       # first time only
bun run test      # unit + integration tests
bun run test:e2e  # all 13 scenarios
bun run test:scenario 07
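
The scenario argument accepts a single id or a comma-separated subset. As a rough sketch of how such an argument might be normalised, the function below parses it into zero-padded ids. `parseScenarioFilter` is a hypothetical name and the validation rules are assumptions; the real runner may parse its arguments differently.

```typescript
// Hypothetical parser for an argument such as "07" or "07,08,12".
function parseScenarioFilter(arg: string, total: number = 13): string[] {
  const parts = arg
    .split(",")
    .map((part) => part.trim())
    .filter((part) => part.length > 0);

  const ids: string[] = [];
  for (const part of parts) {
    // Accept one or two digits only, so "7" and "07" both refer to scenario 07.
    if (!/^\d{1,2}$/.test(part)) {
      throw new Error(`invalid scenario id "${part}"`);
    }
    const n = Number(part);
    if (n < 1 || n > total) {
      throw new Error(`unknown scenario "${part}" (expected 1 to ${total})`);
    }
    const id = n < 10 ? "0" + n : String(n);
    if (ids.indexOf(id) === -1) {
      ids.push(id); // drop duplicates while keeping the requested order
    }
  }
  return ids;
}

console.log(parseScenarioFilter("07,08,12"));
console.log(parseScenarioFilter("7"));
```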

E2E prerequisites

The E2E suite spawns real opencode run and opencode serve processes. Four things must be in place:

  1. Plugin built: cd plugin && bun run build
  2. OpenCode CLI installed: bun install -g opencode-ai (v1.2.10+)
  3. Plugin registered in ~/.config/opencode/opencode.json:
    {
      "plugin": ["file:///absolute/path/to/codexfi/plugin/dist/index.js"]
    }
  4. API keys configured in ~/.codexfi/codexfi.jsonc:
    {
      "voyageApiKey": "pa-...",
      "anthropicApiKey": "sk-ant-..."
    }
    The plugin reads keys exclusively from codexfi.jsonc — setting VOYAGE_API_KEY or ANTHROPIC_API_KEY as environment variables is not sufficient. Run bunx codexfi install to create this file interactively, or create it manually. An AI provider key for the OpenCode agent sessions is also required — set this via ANTHROPIC_API_KEY (or equivalent) in your shell environment for OpenCode's own model calls.
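
Before a long E2E run it can help to confirm that both keys from step 4 are present. The sketch below assumes the file has already been read and parsed into an object; `missingConfigKeys` is a hypothetical helper, and only the two field names come from the example above.

```typescript
// Field names match the codexfi.jsonc example; the helper itself is hypothetical.
interface CodexfiConfig {
  voyageApiKey?: string;
  anthropicApiKey?: string;
}

// Returns the names of required keys that are absent or blank.
function missingConfigKeys(config: CodexfiConfig): string[] {
  const required: Array<keyof CodexfiConfig> = ["voyageApiKey", "anthropicApiKey"];
  return required.filter((key) => {
    const value = config[key];
    return typeof value !== "string" || value.trim() === "";
  });
}

// Placeholder values only; real keys belong in ~/.codexfi/codexfi.jsonc.
console.log(missingConfigKeys({ voyageApiKey: "pa-placeholder" }));
console.log(missingConfigKeys({ voyageApiKey: "pa-placeholder", anthropicApiKey: "sk-ant-placeholder" }));
```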

Figure: Test Isolation Mechanics. Each scenario is fully isolated: tmp dir (/oc-test-...-uuid), then opencode run (real process), then assert (memory store), then cleanup (memories deleted).

Full E2E scenario reference

Each scenario runs in an isolated temporary directory. All memories it creates are deleted from the store automatically after it completes.
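
The cleanup guarantee can be pictured as a try/finally wrapper around each scenario. The sketch below uses an in-memory map as a stand-in for the real store and a made-up tag in place of the temporary directory; `InMemoryStore`, `runIsolated`, and the tag value are all hypothetical names, not the harness's API.

```typescript
// In-memory stand-in for the real memory store; all names here are hypothetical.
interface StoredMemory {
  id: string;
  scenarioTag: string;
  text: string;
}

class InMemoryStore {
  private memories = new Map<string, StoredMemory>();

  add(memory: StoredMemory): void {
    this.memories.set(memory.id, memory);
  }

  // Deletes every memory created under the given tag and returns how many were removed.
  deleteByTag(tag: string): number {
    let deleted = 0;
    this.memories.forEach((memory, id) => {
      if (memory.scenarioTag === tag) {
        this.memories.delete(id);
        deleted++;
      }
    });
    return deleted;
  }

  count(): number {
    return this.memories.size;
  }
}

// Runs a scenario body and always cleans up its memories, even if an assertion throws.
function runIsolated(store: InMemoryStore, tag: string, body: (tag: string) => void): void {
  try {
    body(tag);
  } finally {
    store.deleteByTag(tag);
  }
}

const store = new InMemoryStore();
store.add({ id: "keep-1", scenarioTag: "unrelated", text: "pre-existing memory" });

runIsolated(store, "scenario-demo", (tag) => {
  store.add({ id: "a", scenarioTag: tag, text: "fact from session 1" });
  store.add({ id: "b", scenarioTag: tag, text: "fact from session 2" });
});

console.log(store.count()); // only the unrelated memory remains
```

Putting the cleanup in `finally` means a failed assertion still leaves the store as it was before the scenario started.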

| # | Name | What it tests internally |
| --- | --- | --- |
| 01 | Cross-Session Memory Continuity | Auto-save fires after session end; session 2 recalls facts from session 1 |
| 02 | README-Based Project-Brief Seeding | triggerSilentAutoInit reads README on first session; project-brief memory is created and recalled in session 2 |
| 03 | Transcript Noise Guard | Saved memories contain no raw [user]/[assistant] transcript lines |
| 04 | Project-Brief Always Present | Memories accumulate from conversation even without README; session 2 recalls project facts |
| 05 | Memory Aging | Backend replaces older progress memories with newest; only 1 survives across 3 sessions |
| 06 | Existing Codebase Auto-Init | triggerSilentAutoInit reads real project files (package.json, tsconfig, src/) on first open |
| 07 | Enumeration Hybrid Retrieval | types[] param fires for "list all preferences" queries; answer covers preferences across all sessions |
| 08 | Cross-Synthesis (isWideSynthesis) | "across both projects" heuristic fires; answer synthesises facts from two project namespaces |
| 09 | maxMemories=20 Under Load | With >10 memories stored, facts from early sessions are still recalled, which confirms K=20 retrieval depth |
| 10 | Knowledge Update / Superseded | After ORM migration, agent answers with Tortoise (new), not SQLAlchemy (stale) |
| 11 | System Prompt Memory Injection | [MEMORY] block injected via system.transform into the system prompt, not as a synthetic message part |
| 12 | Multi-Turn Per-Turn Refresh | 6-turn conversation via opencode serve; per-turn semantic refresh surfaces topic-relevant memories on topic switches |
| 13 | Auto-Init Turn 1 Visibility + Enrichment | Auto-init uses init mode with 28 project files + git log; re-fetches memories for Turn 1 visibility; background enrichment fires after first response |
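
As a concrete picture of the behaviour scenario 05 asserts, the sketch below keeps only the newest memory of type "progress" and leaves every other type untouched. It illustrates the invariant, not the backend's actual code: `MemoryRecord`, `ageProgressMemories`, and the timestamp field are assumptions.

```typescript
// Hypothetical record shape; the real store's schema is not shown in this page.
interface MemoryRecord {
  id: string;
  type: string;
  text: string;
  createdAt: number; // assumed to be a sortable timestamp
}

// Keeps every non-progress memory and only the most recent progress memory.
function ageProgressMemories(memories: MemoryRecord[]): MemoryRecord[] {
  let newest: MemoryRecord | undefined;
  for (const memory of memories) {
    if (memory.type === "progress" && (newest === undefined || memory.createdAt > newest.createdAt)) {
      newest = memory;
    }
  }
  return memories.filter((memory) => memory.type !== "progress" || memory === newest);
}

// Three sessions each record a progress memory; one unrelated preference is also stored.
const afterThreeSessions: MemoryRecord[] = [
  { id: "p1", type: "progress", text: "session 1 status", createdAt: 1 },
  { id: "pref", type: "preference", text: "prefers tabs", createdAt: 2 },
  { id: "p2", type: "progress", text: "session 2 status", createdAt: 3 },
  { id: "p3", type: "progress", text: "session 3 status", createdAt: 4 },
];

console.log(ageProgressMemories(afterThreeSessions).map((memory) => memory.id));
```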

Known issue: OpenCode Desktop app interference

If the OpenCode Desktop app is running, it sets OPENCODE_SERVER_PASSWORD, OPENCODE_SERVER_USERNAME, and OPENCODE_CLIENT in your shell environment. The opencode run CLI inherits these and its internal server then requires Basic Auth — causing every CLI session to fail silently.

The test harness handles this automatically via cleanEnv() in testing/src/opencode.ts. You do not need to close the Desktop app to run the E2E suite.
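
The idea behind such a helper is to hand child processes a copy of the environment without the three Desktop variables. The sketch below is an assumption about how that could look, not the contents of testing/src/opencode.ts; `cleanEnvSketch` and the `Env` type are hypothetical names.

```typescript
// Variables the Desktop app sets, as listed above.
const DESKTOP_VARS: string[] = [
  "OPENCODE_SERVER_PASSWORD",
  "OPENCODE_SERVER_USERNAME",
  "OPENCODE_CLIENT",
];

type Env = Record<string, string | undefined>;

// Returns a copy of the environment without the Desktop variables; the input is not mutated.
function cleanEnvSketch(env: Env): Env {
  const cleaned: Env = {};
  for (const key of Object.keys(env)) {
    if (DESKTOP_VARS.indexOf(key) === -1) {
      cleaned[key] = env[key];
    }
  }
  return cleaned;
}

// Example with a literal object; a harness would more likely pass its process environment.
const inherited: Env = {
  PATH: "/usr/bin",
  OPENCODE_SERVER_PASSWORD: "secret",
  OPENCODE_SERVER_USERNAME: "user",
  OPENCODE_CLIENT: "desktop",
};

console.log(Object.keys(cleanEnvSketch(inherited)));
```

The returned object would then be passed as the environment of the spawned opencode process, which has the same effect as the env -u form for manual runs.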

If you run opencode run manually from a terminal where the Desktop app is active:

env -u OPENCODE_SERVER_PASSWORD -u OPENCODE_SERVER_USERNAME -u OPENCODE_CLIENT \
  opencode run "your message" --dir /path/to/project -m anthropic/claude-sonnet-4-6
