Test Suite

codexfi ships with a three-tier test suite that verifies the memory system works end-to-end — from vector storage through to what the agent actually says in a conversation.

[Figure: Test Tiers Architecture. Unit (< 1s; isolated logic, mocked dependencies), Integration (~30s; real store.db, real embedder), E2E (5–10 min; real opencode, 13 scenarios). The tiers run from fast and isolated to slow and real.]

What is verified

The 15 E2E scenarios test the behaviors you rely on as a user:

| Scenario | What it guarantees for you |
| --- | --- |
| Cross-Session Memory Continuity | Facts you mention in one session are available in the next |
| README-Based Project Seeding | Opening a project with a README auto-populates memory without any action from you |
| Transcript Noise Guard | Memory contains distilled facts, not raw conversation transcripts |
| Project-Brief Always Present | Even without a README, the agent builds project context from conversation |
| Memory Aging | Old "current status" entries are replaced; you always get the latest state, not history |
| Existing Codebase Auto-Init | Opening an existing codebase populates memory from project files automatically |
| Enumeration Hybrid Retrieval | "List all my preferences" queries return facts across all sessions, not just the recent ones |
| Cross-Synthesis | Queries about multiple projects surface facts from both namespaces |
| Memory Under Load | Early session facts are still recalled when many memories are stored |
| Knowledge Update | After you change tools or approaches, the agent uses the new information, not the old |
| System Prompt Injection | The [MEMORY] block is always present in the system prompt, never missed |
| Multi-Turn Per-Turn Refresh | As you switch topics mid-session, relevant memories surface automatically |
| Auto-Init Turn 1 Visibility | Auto-init memories are visible on Turn 1 and background enrichment fires after the first response |
| Active-Context Singleton Aging | Only the most recent active-context memory survives; older ones are replaced, not stacked |
| Recent Sessions Coverage | The Recent Sessions section reflects at least 3 distinct sessions of real work |

Latest results

[Figure: E2E Scenario Map. 01 Cross-Session: PASS; 02 README Seeding: PASS; 03 Noise Guard: PASS; 04 Brief Always: PASS; 05 Memory Aging: PASS; 06 Codebase Init: PASS; 07 Hybrid Retrieval: PASS; 08 Cross-Synthesis: PASS; 09 Under Load: WARN; 10 Knowledge Update: PASS; 11 Prompt Injection: PASS; 12 Multi-Turn: PASS; 13 Auto-Init Turn 1: PASS. Summary: 12 / 13 PASS; scenario 09 is non-deterministic.]

The results snapshot was taken on the feat/pure-ts-vector-store branch (2026-04-06) and is stale. The counts have been updated to reflect the current suite (PR #174).

Unit:        171 pass, 0 fail   (~1s)
Integration:  34 pass, 0 fail   (~3s)
Stress:        3 pass, 0 fail   (~1s)
E2E:       15/15 pass           (~10min)

For contributors

The sections below cover the test architecture and how to run each tier locally.

Test tiers

| Tier | Tests | What runs | Speed |
| --- | --- | --- | --- |
| Unit | 171 | Cosine math, CRUD, filters, serialisation, dedup, aging, extraction parsing, IngestResult schema | ~1s |
| Integration | 34 | Real SQLite store + real Voyage AI embedder against a test database | ~3s |
| Stress | 3 | 20 real OS processes hitting the same SQLite file: concurrent writes, reads, and mixed | ~1s |
| E2E | 15 | Real opencode agent sessions, real memory store, 15 autonomous scenarios | ~10 min |
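To give a sense of what the "cosine math" checks in the unit tier might look like, here is a minimal sketch. The function name `cosineSimilarity` and its zero-vector behaviour are assumptions for illustration; the actual helper in the codexfi source may be named and shaped differently.

```typescript
// Hypothetical helper: the real function name and signature in codexfi may differ.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Assumption: a zero vector has no direction, so report 0 instead of NaN.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

A unit test in this style would typically assert that identical vectors score close to 1, orthogonal vectors score 0, and opposite vectors score close to -1, using a small tolerance for floating-point error.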

The stress tests validate multi-agent safety — the scenario where multiple OpenCode agents read and write the same store.db simultaneously. Each test spawns real child processes (not async within one process) to match the actual deployment pattern.

Running unit and integration tests

# From the repo root
bun run test                    # unit + integration together
bun test src/unit/              # unit only
bun test src/integration/       # integration only

Running E2E tests

# From the repo root
bun run test:e2e                     # all 15 scenarios
bun run test:e2e:scenario 07         # single scenario
bun run test:e2e:scenario 07,08,12   # subset

Or directly from the testing/ directory:

cd testing
bun install       # first time only
bun run test      # unit + integration tests
bun run test:e2e  # all 15 scenarios
bun run test:scenario 07
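The scenario selector accepts either a single ID or a comma-separated list. The sketch below shows one way such an argument could be normalised. `parseScenarioIds` is a hypothetical name, and the validation rules (two-digit IDs, range 1 to 15, duplicates dropped) are assumptions based on the 15 scenarios listed on this page, not a description of the real script.

```typescript
// Hypothetical parser; the real script's argument handling is not shown on this page.
function parseScenarioIds(arg: string, total = 15): string[] {
  const ids: string[] = [];
  for (const part of arg.split(",")) {
    const trimmed = part.trim();
    if (!/^\d{1,2}$/.test(trimmed)) {
      throw new Error(`invalid scenario id: "${part}"`);
    }
    const n = Number(trimmed);
    if (n < 1 || n > total) {
      throw new Error(`scenario ${n} is out of range 1-${total}`);
    }
    // Normalise to the two-digit form used in the scenario tables ("7" -> "07").
    const id = String(n).padStart(2, "0");
    if (!ids.includes(id)) ids.push(id);
  }
  return ids;
}
```

Under these assumptions, `parseScenarioIds("07,08,12")` yields the three IDs in order, and an out-of-range value such as `16` is rejected before any scenario starts.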

E2E prerequisites

The E2E suite spawns real opencode run and opencode serve processes. Four things must be in place:

  1. Plugin built: cd plugin && bun run build
  2. OpenCode CLI installed: bun install -g opencode-ai (v1.2.10+)
  3. Plugin registered in ~/.config/opencode/opencode.json:
    {
      "plugin": ["file:///absolute/path/to/codexfi/plugin/dist/index.js"]
    }
  4. API keys configured in ~/.codexfi/codexfi.jsonc:
    {
      "voyageApiKey": "pa-...",
      "anthropicApiKey": "sk-ant-..."
    }
    The plugin reads keys exclusively from codexfi.jsonc — setting VOYAGE_API_KEY or ANTHROPIC_API_KEY as environment variables is not sufficient. Run bunx codexfi install to create this file interactively, or create it manually. An AI provider key for the OpenCode agent sessions is also required — set this via ANTHROPIC_API_KEY (or equivalent) in your shell environment for OpenCode's own model calls.
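Because the keys must live in codexfi.jsonc, a quick pre-flight check can save a failed ten-minute run. The sketch below is illustrative only: `missingKeys` is a hypothetical helper, it takes an already-parsed config object (parsing the JSONC file is left out), and only the two key names shown in the example above are assumed to be required.

```typescript
// Key names come from the codexfi.jsonc example above; the helper itself is hypothetical.
const REQUIRED_KEYS = ["voyageApiKey", "anthropicApiKey"] as const;

type CodexfiConfig = Partial<Record<(typeof REQUIRED_KEYS)[number], string>>;

// Returns the required keys that are absent or blank in an already-parsed config object.
function missingKeys(config: CodexfiConfig): string[] {
  return REQUIRED_KEYS.filter((key) => {
    const value = config[key];
    return typeof value !== "string" || value.trim() === "";
  });
}
```

An empty result means both keys are present; anything else names the keys to add before starting the E2E suite.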

[Figure: Test Isolation Mechanics. Each scenario is fully isolated: tmp dir (/oc-test-...-uuid) → opencode run (real process) → assert (memory store) → cleanup (memories deleted).]

Full E2E scenario reference

Each scenario runs in an isolated temporary directory. All memories it creates are deleted from the store automatically after it completes.
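The isolation pattern described above can be sketched as a small wrapper. This is an assumption-laden illustration, not the harness code: `withIsolatedDir` is a hypothetical name, the directory suffix comes from `mkdtempSync` rather than a UUID, and the memory-store cleanup is only indicated by a comment.

```typescript
import { mkdtempSync, rmSync, existsSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical wrapper: creates a throwaway project directory, runs the
// scenario body against it, and always removes the directory afterwards.
function withIsolatedDir<T>(run: (dir: string) => T): T {
  const dir = mkdtempSync(join(tmpdir(), "oc-test-"));
  try {
    return run(dir);
  } finally {
    // Assumption: the real harness would also delete the scenario's
    // memories from the store at this point.
    rmSync(dir, { recursive: true, force: true });
  }
}
```

Putting the cleanup in `finally` means a failing assertion still leaves no directory behind, which keeps one scenario's state from leaking into the next.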

| # | Name | What it tests internally |
| --- | --- | --- |
| 01 | Cross-Session Memory Continuity | Auto-save fires after session end; session 2 recalls facts from session 1 |
| 02 | README-Based Project-Brief Seeding | triggerSilentAutoInit reads README on first session; project-brief memory is created and recalled in session 2 |
| 03 | Transcript Noise Guard | Saved memories contain no raw [user]/[assistant] transcript lines |
| 04 | Project-Brief Always Present | Memories accumulate from conversation even without README; session 2 recalls project facts |
| 05 | Memory Aging | Backend replaces older progress memories with newest; only 1 survives across 3 sessions |
| 06 | Existing Codebase Auto-Init | triggerSilentAutoInit reads real project files (package.json, tsconfig, src/) on first open |
| 07 | Enumeration Hybrid Retrieval | types[] param fires for "list all preferences" queries; answer covers preferences across all sessions |
| 08 | Cross-Synthesis (isWideSynthesis) | "across both projects" heuristic fires; answer synthesises facts from two project namespaces |
| 09 | maxMemories=20 Under Load | With >10 memories stored, facts from early sessions still recalled; confirms K=20 retrieval depth |
| 10 | Knowledge Update / Superseded | After ORM migration, agent answers with Tortoise (new), not SQLAlchemy (stale) |
| 11 | System Prompt Memory Injection | [MEMORY] block injected via system.transform into the system prompt, not as a synthetic message part |
| 12 | Multi-Turn Per-Turn Refresh | 6-turn conversation via opencode serve; per-turn semantic refresh surfaces topic-relevant memories on topic switches |
| 13 | Auto-Init Turn 1 Visibility + Enrichment | Auto-init uses init mode with 28 project files + git log; re-fetches memories for Turn 1 visibility; background enrichment fires after first response |
| 14 | Active-Context Singleton Aging | ageActiveContext() deletes prior active-context on new insertion; only 1 active-context survives across multiple extractions |
| 15 | Recent Sessions Coverage | Three distinct session summaries accumulate; ## Recent Sessions section in [MEMORY] block reflects all three |
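The singleton-aging behaviour checked by scenario 14 can be illustrated with an in-memory model. This is a sketch under stated assumptions: the `Memory` shape and the `insertWithSingletonAging` function are hypothetical, and the real `ageActiveContext()` operates on the SQLite-backed store rather than an array.

```typescript
// Hypothetical in-memory model; the real store is SQLite-backed and its schema may differ.
interface Memory {
  id: string;
  type: string;
  text: string;
}

// Inserts a memory. When the incoming entry is an active-context memory,
// prior active-context entries are dropped first so at most one survives.
function insertWithSingletonAging(store: Memory[], incoming: Memory): Memory[] {
  const kept =
    incoming.type === "active-context"
      ? store.filter((m) => m.type !== "active-context")
      : store;
  return [...kept, incoming];
}
```

After several extractions that each produce an active-context memory, a store built this way holds exactly one active-context entry (the newest), while memories of other types are left untouched.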

Known issue: OpenCode Desktop app interference

If the OpenCode Desktop app is running, it sets OPENCODE_SERVER_PASSWORD, OPENCODE_SERVER_USERNAME, and OPENCODE_CLIENT in your shell environment. The opencode run CLI inherits these and its internal server then requires Basic Auth — causing every CLI session to fail silently.

The test harness handles this automatically via cleanEnv() in testing/src/opencode.ts. You do not need to close the Desktop app to run the E2E suite.
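For readers curious what such a helper might involve, here is a minimal sketch. It is not the code in testing/src/opencode.ts: the name `cleanEnvSketch`, the `Env` type, and the exact behaviour are assumptions, and only the three variable names come from the description above.

```typescript
// Variable names are taken from the description above; the function body is an assumption.
const DESKTOP_VARS = [
  "OPENCODE_SERVER_PASSWORD",
  "OPENCODE_SERVER_USERNAME",
  "OPENCODE_CLIENT",
];

type Env = Record<string, string | undefined>;

// Returns a copy of the environment without the Desktop app's variables.
// The input object is left unmodified.
function cleanEnvSketch(env: Env): Env {
  const copy: Env = { ...env };
  for (const name of DESKTOP_VARS) {
    delete copy[name];
  }
  return copy;
}
```

The cleaned object could then be passed as the `env` option when spawning a child process, so the spawned CLI never sees the Desktop app's credentials.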

If you run opencode run manually from a terminal where the Desktop app is active:

env -u OPENCODE_SERVER_PASSWORD -u OPENCODE_SERVER_USERNAME -u OPENCODE_CLIENT \
  opencode run "your message" --dir /path/to/project -m anthropic/claude-sonnet-4-6
