Test Suite
codexfi ships with a four-tier test suite that verifies the memory system works end-to-end, from vector storage through to what the agent actually says in a conversation.
What is verified
The 13 E2E scenarios test the behaviors you rely on as a user:
| Scenario | What it guarantees for you |
|---|---|
| Cross-Session Memory Continuity | Facts you mention in one session are available in the next |
| README-Based Project Seeding | Opening a project with a README auto-populates memory without any action from you |
| Transcript Noise Guard | Memory contains distilled facts, not raw conversation transcripts |
| Project-Brief Always Present | Even without a README, the agent builds project context from conversation |
| Memory Aging | Old "current status" entries are replaced — you always get the latest state, not history |
| Existing Codebase Auto-Init | Opening an existing codebase populates memory from project files automatically |
| Enumeration Hybrid Retrieval | "List all my preferences" queries return facts across all sessions, not just the recent ones |
| Cross-Synthesis | Queries about multiple projects surface facts from both namespaces |
| Memory Under Load | Early session facts are still recalled when many memories are stored |
| Knowledge Update | After you change tools or approaches, the agent uses the new information, not the old |
| System Prompt Injection | The [MEMORY] block is always present in the system prompt — never missed |
| Multi-Turn Per-Turn Refresh | As you switch topics mid-session, relevant memories surface automatically |
| Auto-Init Turn 1 Visibility | Auto-init memories are visible on Turn 1 and background enrichment fires after the first response |
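The Memory Aging row describes a replace-on-write rule rather than an append-only history. Below is a minimal sketch of that rule. It assumes each memory carries a `type` and a `createdAt` timestamp, and it keeps only the newest `"progress"` memory. The field names and the `ageProgress` function are hypothetical and are not codexfi's actual schema or code.

```typescript
// Hypothetical memory shape for illustration only; codexfi's real schema may differ.
interface Memory {
  id: string;
  type: string; // e.g. "progress", "preference"
  content: string;
  createdAt: number; // epoch milliseconds
}

// Keep every non-progress memory, but only the newest "progress" memory.
// Older status entries are dropped, so recall sees the latest state, not its history.
function ageProgress(memories: Memory[]): Memory[] {
  const progress = memories.filter((m) => m.type === "progress");
  if (progress.length <= 1) return memories;
  const newest = progress.reduce((a, b) => (b.createdAt > a.createdAt ? b : a));
  return memories.filter((m) => m.type !== "progress" || m.id === newest.id);
}
```

Given three progress entries from three sessions plus one preference, `ageProgress` returns the preference and only the latest progress entry. Scenario 05 checks for the same outcome: one progress memory survives across three sessions.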
Latest results
Run against the `feat/pure-ts-vector-store` branch (SQLite WAL store) on 2026-04-06.
```text
Unit:        139 pass, 0 fail (~1s)
Integration:  31 pass, 0 fail (~3s)
Stress:        3 pass, 0 fail (~1s)
E2E:       12/13 pass (~10min)

FAIL  01  Cross-Session Memory Continuity      17.5s
PASS  02  README-Based Project-Brief Seeding   14.9s
PASS  03  Transcript Noise Guard               15.0s
PASS  04  Project Brief Always Present         19.5s
PASS  05  Memory Aging                         50.3s
PASS  06  Existing Codebase Auto-Init          68.1s
PASS  07  Enumeration Hybrid Retrieval         38.2s
PASS  08  Cross-Synthesis (isWideSynthesis)    54.9s
PASS  09  maxMemories=20 Under Load           176.9s
PASS  10  Knowledge Update / Superseded        67.3s
PASS  11  System Prompt Memory Injection       23.2s
PASS  12  Multi-Turn Per-Turn Refresh          59.2s
PASS  13  Auto-Init Turn 1 Visibility          24.0s
```

Scenario 01's failure is extraction variance. The agent recalled all tech facts correctly but called the project "a CLI task management tool" instead of using its name, "taskflow". A previous solo run of the scenario passed. This is not a store bug.
For contributors
The sections below cover the test architecture and how to run each tier locally.
Test tiers
| Tier | Tests | What runs | Speed |
|---|---|---|---|
| Unit | 139 | Cosine math, CRUD, filters, serialisation, dedup, aging, extraction parsing | ~1s |
| Integration | 31 | Real SQLite store + real Voyage AI embedder against a test database | ~3s |
| Stress | 3 | 20 real OS processes hitting the same SQLite file: concurrent writes, concurrent reads, and a mixed read/write load | ~1s |
| E2E | 13 | Real opencode agent sessions, real memory store, 13 autonomous scenarios | ~10 min |
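Much of the unit tier exercises small pure functions. For the "cosine math" row, below is a minimal sketch of cosine similarity and a brute-force top-K scan. It makes several assumptions:

- Embeddings are plain `number[]` arrays.
- The names `cosine`, `topK`, and `StoredMemory` are hypothetical.
- This is not the store's actual implementation.

The default `k = 20` mirrors the retrieval depth that scenario 09 checks.

```typescript
// Cosine similarity between two equal-length vectors: dot(a, b) / (|a| * |b|).
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; this sketch defines its similarity as 0.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical stored-memory shape: just an id and its embedding.
interface StoredMemory {
  id: string;
  vector: number[];
}

// Brute-force top-K: score every memory, sort by descending similarity, keep k.
function topK(query: number[], memories: StoredMemory[], k = 20): StoredMemory[] {
  return memories
    .map((m) => ({ m, score: cosine(query, m.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((x) => x.m);
}
```

Exact-value checks such as `cosine([1, 0], [0, 1]) === 0` are the natural shape for tests at this tier. Sorting every candidate costs O(n log n). A store holding many memories might instead keep a bounded heap of the best K.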
The stress tests validate multi-agent safety: several OpenCode agents reading and writing the same `store.db` simultaneously. Each test spawns real child processes rather than concurrent async tasks inside one process, because separate processes match the actual deployment pattern.
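The process-per-writer pattern can be sketched with Node's `child_process.spawn`, which Bun also supports. In this sketch each child appends lines to a shared plain file instead of writing to `store.db`. It therefore demonstrates only the multi-process harness and a lost-write check, not SQLite's WAL locking. All names (`writerSource`, `runWriter`, `stressAppend`) are hypothetical and not taken from the suite.

```typescript
import { spawn } from "node:child_process";
import { mkdtempSync, readFileSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Source code for one writer process: append `count` tagged lines to the shared file.
// Each appendFileSync call opens the file in append mode, writes one line, and closes it.
function writerSource(file: string, id: number, count: number): string {
  return [
    `const fs = require("fs");`,
    `for (let i = 0; i < ${count}; i++) {`,
    `  fs.appendFileSync(${JSON.stringify(file)}, "w${id}:" + i + "\\n");`,
    `}`,
  ].join("\n");
}

// Spawn a real OS process using the current runtime binary (node or bun) with -e.
function runWriter(file: string, id: number, count: number): Promise<number> {
  return new Promise((resolve, reject) => {
    const child = spawn(process.execPath, ["-e", writerSource(file, id, count)], {
      stdio: "inherit",
    });
    child.on("error", reject);
    child.on("exit", (code) => resolve(code ?? -1));
  });
}

// Launch all writers at once, wait for every exit, then count what reached the disk.
async function stressAppend(
  processes = 20,
  perProcess = 50,
): Promise<{ lines: number; failures: number }> {
  const dir = mkdtempSync(join(tmpdir(), "stress-"));
  const file = join(dir, "shared.log");
  try {
    const codes = await Promise.all(
      Array.from({ length: processes }, (_, id) => runWriter(file, id, perProcess)),
    );
    const lines = readFileSync(file, "utf8").split("\n").filter((l) => l.length > 0).length;
    return { lines, failures: codes.filter((c) => c !== 0).length };
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
}
```

`stressAppend(20, 50)` should report 1000 lines and 0 failures; a lower count would indicate lost writes. A SQLite-backed variant would have each child open the same database file and compare row counts instead.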
Running unit and integration tests
```bash
# From the repo root
bun run test                # unit + integration together
bun test src/unit/          # unit only
bun test src/integration/   # integration only
```

Running E2E tests
```bash
# From the repo root
bun run test:e2e                     # all 13 scenarios
bun run test:e2e:scenario 07         # single scenario
bun run test:e2e:scenario 07,08,12   # subset
```

Or directly from the `testing/` directory:
```bash
cd testing
bun install              # first time only
bun run test             # unit + integration tests
bun run test:e2e         # all 13 scenarios
bun run test:scenario 07
```

E2E prerequisites
The E2E suite spawns real `opencode run` and `opencode serve` processes. Four things must be in place:

- Plugin built:
  ```bash
  cd plugin && bun run build
  ```
- OpenCode CLI installed (v1.2.10+):
  ```bash
  bun install -g opencode-ai
  ```
- Plugin registered in `~/.config/opencode/opencode.json`:
  ```json
  {
    "plugin": ["file:///absolute/path/to/codexfi/plugin/dist/index.js"]
  }
  ```
- API keys configured in `~/.codexfi/codexfi.jsonc`:
  ```jsonc
  {
    "voyageApiKey": "pa-...",
    "anthropicApiKey": "sk-ant-..."
  }
  ```
  The plugin reads keys exclusively from `codexfi.jsonc`. Setting `VOYAGE_API_KEY` or `ANTHROPIC_API_KEY` as environment variables is not sufficient. Run `bunx codexfi install` to create this file interactively, or create it manually.

Separately, OpenCode's own agent sessions need an AI provider key for their model calls. Set it in your shell environment, for example `ANTHROPIC_API_KEY` (or your provider's equivalent).
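The plugin ignores environment variables for its own keys, so a missing or malformed `codexfi.jsonc` is an easy way to break an E2E run. Below is a hypothetical preflight helper that checks both config files from their text. It is not part of codexfi, and its function names are illustrative. It strips JSONC comments but does not handle trailing commas.

```typescript
// Hypothetical preflight helpers; not part of codexfi.

// Remove // line comments and /* */ block comments while leaving string contents
// intact (so "file:///..." URLs survive). Trailing commas are NOT handled.
function stripJsonComments(text: string): string {
  let out = "";
  let inString = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    const next = text[i + 1];
    if (inString) {
      out += ch;
      if (ch === "\\") {
        out += next ?? ""; // copy the escaped character verbatim
        i++;
      } else if (ch === '"') {
        inString = false;
      }
    } else if (ch === '"') {
      inString = true;
      out += ch;
    } else if (ch === "/" && next === "/") {
      while (i < text.length && text[i] !== "\n") i++; // skip to end of line
      out += "\n";
    } else if (ch === "/" && next === "*") {
      i += 2;
      while (i < text.length && !(text[i] === "*" && text[i + 1] === "/")) i++;
      i++; // land on the closing "/" so the loop increment moves past it
    } else {
      out += ch;
    }
  }
  return out;
}

// Key names follow the codexfi.jsonc example above; both are treated as required here.
function missingKeys(codexfiJsonc: string): string[] {
  const cfg = JSON.parse(stripJsonComments(codexfiJsonc)) as Record<string, unknown>;
  return ["voyageApiKey", "anthropicApiKey"].filter(
    (key) => typeof cfg[key] !== "string" || cfg[key] === "",
  );
}

// True if some plugin entry is a file:// URL ending in plugin/dist/index.js,
// matching the registration path shown above.
function pluginRegistered(opencodeJson: string): boolean {
  const cfg = JSON.parse(stripJsonComments(opencodeJson)) as { plugin?: unknown };
  const plugins = Array.isArray(cfg.plugin) ? cfg.plugin : [];
  return plugins.some(
    (p) => typeof p === "string" && p.startsWith("file://") && p.endsWith("/plugin/dist/index.js"),
  );
}
```

To check the real files, read them first. For example, use `readFileSync(join(homedir(), ".codexfi", "codexfi.jsonc"), "utf8")` with imports from `node:fs`, `node:os`, and `node:path`, then pass the text to these functions.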
Full E2E scenario reference
Each scenario runs in an isolated temporary directory. All memories it creates are deleted from the store automatically after it completes.
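The isolation guarantee amounts to a try/finally around each scenario. Below is a minimal sketch with hypothetical names throughout. It assumes a caller-supplied `deleteMemories` callback, because this section does not say how the harness identifies the memories a scenario created.

```typescript
import { mkdtempSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Placeholder for "delete every memory this scenario created"; how the real
// harness scopes and deletes those memories is an assumption left to the caller.
type DeleteMemories = (projectDir: string) => void | Promise<void>;

// Run one scenario body in a fresh temp directory, then clean up even if it throws.
async function withIsolatedDir<T>(
  name: string,
  body: (projectDir: string) => Promise<T>,
  deleteMemories: DeleteMemories,
): Promise<T> {
  const projectDir = mkdtempSync(join(tmpdir(), `e2e-${name}-`));
  try {
    return await body(projectDir);
  } finally {
    try {
      await deleteMemories(projectDir); // store cleanup first
    } finally {
      rmSync(projectDir, { recursive: true, force: true }); // then the files, regardless
    }
  }
}
```

In this sketch, cleanup in `finally` runs whether the scenario passes, fails an assertion, or throws. A failed run therefore leaves nothing behind for the next scenario.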
| # | Name | What it tests internally |
|---|---|---|
| 01 | Cross-Session Memory Continuity | Auto-save fires after session end; session 2 recalls facts from session 1 |
| 02 | README-Based Project-Brief Seeding | triggerSilentAutoInit reads README on first session; project-brief memory is created and recalled in session 2 |
| 03 | Transcript Noise Guard | Saved memories contain no raw [user]/[assistant] transcript lines |
| 04 | Project-Brief Always Present | Memories accumulate from conversation even without README; session 2 recalls project facts |
| 05 | Memory Aging | Backend replaces older progress memories with newest; only 1 survives across 3 sessions |
| 06 | Existing Codebase Auto-Init | triggerSilentAutoInit reads real project files (package.json, tsconfig, src/) on first open |
| 07 | Enumeration Hybrid Retrieval | types[] param fires for "list all preferences" queries; answer covers preferences across all sessions |
| 08 | Cross-Synthesis (isWideSynthesis) | "across both projects" heuristic fires; answer synthesises facts from two project namespaces |
| 09 | maxMemories=20 Under Load | With >10 memories stored, facts from early sessions still recalled — confirms K=20 retrieval depth |
| 10 | Knowledge Update / Superseded | After ORM migration, agent answers with Tortoise (new), not SQLAlchemy (stale) |
| 11 | System Prompt Memory Injection | [MEMORY] block injected via system.transform into the system prompt — not as a synthetic message part |
| 12 | Multi-Turn Per-Turn Refresh | 6-turn conversation via opencode serve; per-turn semantic refresh surfaces topic-relevant memories on topic switches |
| 13 | Auto-Init Turn 1 Visibility + Enrichment | Auto-init uses init mode with 28 project files + git log; re-fetches memories for Turn 1 visibility; background enrichment fires after first response |
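Scenarios 07 and 08 exercise query-dependent retrieval. An enumeration query switches to type-filtered retrieval through the `types[]` parameter. A cross-project phrase triggers the wide-synthesis heuristic. The plugin's actual heuristics are not documented here. The sketch below uses hypothetical regexes, type names, and a `planRetrieval` function purely to illustrate the routing idea.

```typescript
// Hypothetical retrieval plan; field names are illustrative, not codexfi's API.
interface RetrievalPlan {
  mode: "semantic" | "enumerate";
  types?: string[]; // restrict enumeration to these memory types
  wideSynthesis: boolean; // search beyond the current project's namespace
}

// Illustrative triggers only: "list all ...", "show every ...", "across both projects".
const ENUMERATION = /\b(list|enumerate|show)\s+(all|every)\b/i;
const WIDE_SYNTHESIS = /\b(across|between)\s+(both|all|these)\s+projects\b/i;

function planRetrieval(query: string): RetrievalPlan {
  const wideSynthesis = WIDE_SYNTHESIS.test(query);
  if (!ENUMERATION.test(query)) {
    return { mode: "semantic", wideSynthesis };
  }
  // Map a keyword in the query to a memory type; only "preference" is sketched here.
  const types = /\bpreferences?\b/i.test(query) ? ["preference"] : undefined;
  return { mode: "enumerate", types, wideSynthesis };
}
```

A semantic plan would fall through to top-K similarity search. An enumeration plan would return every stored memory of the requested types regardless of session, which is the behavior scenario 07 looks for.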
Known issue: OpenCode Desktop app interference
If the OpenCode Desktop app is running, it sets `OPENCODE_SERVER_PASSWORD`, `OPENCODE_SERVER_USERNAME`, and `OPENCODE_CLIENT` in your shell environment. The `opencode run` CLI inherits these variables, so its internal server then requires Basic Auth and every CLI session fails silently.
The test harness handles this automatically via cleanEnv() in testing/src/opencode.ts. You do not need to close the Desktop app to run the E2E suite.
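The idea behind this workaround is to drop three variables before spawning `opencode`. Below is a hypothetical helper for doing that in a Node or Bun script. `stripDesktopEnv` is an illustrative name, and this is not the harness's actual `cleanEnv()`.

```typescript
// Variables the OpenCode Desktop app exports, per the issue described above.
const DESKTOP_VARS = ["OPENCODE_SERVER_PASSWORD", "OPENCODE_SERVER_USERNAME", "OPENCODE_CLIENT"];

type Env = Record<string, string | undefined>;

// Return a copy of `env` without the Desktop variables; the input is not mutated.
function stripDesktopEnv(env: Env): Env {
  const copy: Env = { ...env };
  for (const key of DESKTOP_VARS) delete copy[key];
  return copy;
}

// Intended use (not executed here):
//   spawn("opencode", ["run", "your message", "--dir", projectDir],
//         { env: stripDesktopEnv(process.env) });
```

Copying rather than mutating `process.env` keeps the variables intact for anything else the parent process does.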
If you run opencode run manually from a terminal where the Desktop app is active:
```bash
env -u OPENCODE_SERVER_PASSWORD -u OPENCODE_SERVER_USERNAME -u OPENCODE_CLIENT \
  opencode run "your message" --dir /path/to/project -m anthropic/claude-sonnet-4-6
```