Test Suite

codexfi ships with a three-tier test suite that verifies the memory system works end-to-end — from vector storage through to what the agent actually says in a conversation.

What is verified

The 13 E2E scenarios test the behaviors you rely on as a user:

| Scenario | What it guarantees for you |
| --- | --- |
| Cross-Session Memory Continuity | Facts you mention in one session are available in the next |
| README-Based Project Seeding | Opening a project with a README auto-populates memory without any action from you |
| Transcript Noise Guard | Memory contains distilled facts, not raw conversation transcripts |
| Project-Brief Always Present | Even without a README, the agent builds project context from conversation |
| Memory Aging | Old "current status" entries are replaced, so you always get the latest state, not history |
| Existing Codebase Auto-Init | Opening an existing codebase populates memory from project files automatically |
| Enumeration Hybrid Retrieval | "List all my preferences" queries return facts across all sessions, not just the recent ones |
| Cross-Synthesis | Queries about multiple projects surface facts from both namespaces |
| Memory Under Load | Early session facts are still recalled when many memories are stored |
| Knowledge Update | After you change tools or approaches, the agent uses the new information, not the old |
| System Prompt Injection | The `[MEMORY]` block is always present in the system prompt, never missed |
| Multi-Turn Per-Turn Refresh | As you switch topics mid-session, relevant memories surface automatically |
| Auto-Init Turn 1 Visibility | Auto-init memories are visible on Turn 1 and background enrichment fires after the first response |

Latest results

Run against the `feat/pure-ts-vector-store` branch (SQLite WAL store) on 2026-04-06.

```
Unit:        139 pass, 0 fail   (~1s)
Integration:  31 pass, 0 fail   (~3s)
Stress:        3 pass, 0 fail   (~1s)
E2E:       12/13 pass           (~10min)

FAIL  01  Cross-Session Memory Continuity       17.5s
PASS  02  README-Based Project-Brief Seeding    14.9s
PASS  03  Transcript Noise Guard                15.0s
PASS  04  Project Brief Always Present          19.5s
PASS  05  Memory Aging                          50.3s
PASS  06  Existing Codebase Auto-Init           68.1s
PASS  07  Enumeration Hybrid Retrieval          38.2s
PASS  08  Cross-Synthesis (isWideSynthesis)     54.9s
PASS  09  maxMemories=20 Under Load            176.9s
PASS  10  Knowledge Update / Superseded         67.3s
PASS  11  System Prompt Memory Injection        23.2s
PASS  12  Multi-Turn Per-Turn Refresh           59.2s
PASS  13  Auto-Init Turn 1 Visibility           24.0s
```

The Scenario 01 failure is extraction variance, not a store bug. The agent recalled all tech facts correctly but said "a CLI task management tool" instead of the project name "taskflow". A previous solo run of this scenario passed.
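To illustrate why a paraphrase can fail a scenario even when recall is correct, here is a minimal sketch of a literal name check. It assumes the scenario asserts on the presence of the project name in the agent's reply; the `mentions` helper is hypothetical and is not taken from the codexfi harness.

```typescript
// Hypothetical helper, not the harness's actual assertion logic.
// Returns true when `expected` appears in `response`, ignoring case.
function mentions(response: string, expected: string): boolean {
  return response.toLowerCase().includes(expected.toLowerCase());
}

// A reply that describes the project without naming it.
const paraphrased = "It is a CLI task management tool.";
// A reply that uses the project name.
const named = "The project is called taskflow.";

const paraphrasedPasses = mentions(paraphrased, "taskflow"); // false
const namedPasses = mentions(named, "taskflow");             // true
```

Under that assumption, the description is accurate but the literal check still fails, which is consistent with a run-to-run variance in what the model extracts rather than with a storage defect.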

For contributors

The sections below cover the test architecture and how to run each tier locally.

Test tiers

| Tier | Tests | What runs | Speed |
| --- | --- | --- | --- |
| Unit | 139 | Cosine math, CRUD, filters, serialisation, dedup, aging, extraction parsing | ~1s |
| Integration | 31 | Real SQLite store + real Voyage AI embedder against a test database | ~3s |
| Stress | 3 | 20 real OS processes hitting the same SQLite file: concurrent writes, reads, and mixed | ~1s |
| E2E | 13 | Real `opencode` agent sessions, real memory store, 13 autonomous scenarios | ~10 min |
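As a rough picture of what "cosine math" in the unit tier refers to, the sketch below computes cosine similarity and ranks stored vectors against a query. It is an illustration only: `cosineSimilarity`, `topK`, and the `StoredVector` shape are hypothetical names, and the default of 20 simply mirrors the `maxMemories=20` setting named in scenario 09.

```typescript
// Hypothetical shapes and names; not the codexfi store's actual API.
interface StoredVector {
  id: string;
  vector: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 for a zero vector.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Return the ids of the k most similar vectors, best match first.
function topK(query: number[], items: StoredVector[], k = 20): string[] {
  return items
    .map((item) => ({ id: item.id, score: cosineSimilarity(query, item.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.id);
}
```

A unit test in this style needs no database or network access, which is one plausible reason the tier finishes in about a second.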

The stress tests validate multi-agent safety — the scenario where multiple OpenCode agents read and write the same store.db simultaneously. Each test spawns real child processes (not async within one process) to match the actual deployment pattern.

Running unit and integration tests

```
# From the repo root
bun run test                    # unit + integration together
bun test src/unit/              # unit only
bun test src/integration/       # integration only
```

Running E2E tests

```
# From the repo root
bun run test:e2e                     # all 13 scenarios
bun run test:e2e:scenario 07         # single scenario
bun run test:e2e:scenario 07,08,12   # subset
```

Or directly from the testing/ directory:

```
cd testing
bun install       # first time only
bun run test      # unit + integration tests
bun run test:e2e  # all 13 scenarios
bun run test:scenario 07
```

E2E prerequisites

The E2E suite spawns real `opencode run` and `opencode serve` processes. Four things must be in place:

1. Plugin built: `cd plugin && bun run build`
2. OpenCode CLI installed: `bun install -g opencode-ai` (v1.2.10+)
3. Plugin registered in `~/.config/opencode/opencode.json`:

```
{
  "plugin": ["file:///absolute/path/to/codexfi/plugin/dist/index.js"]
}
```

4. API keys configured in `~/.codexfi/codexfi.jsonc`:
```
{
  "voyageApiKey": "pa-...",
  "anthropicApiKey": "sk-ant-..."
}
```
The plugin reads these keys exclusively from `codexfi.jsonc`; setting `VOYAGE_API_KEY` or `ANTHROPIC_API_KEY` as environment variables is not sufficient for the plugin. Run `bunx codexfi install` to create this file interactively, or create it manually.

The OpenCode agent sessions also need an AI provider key of their own. Set it via `ANTHROPIC_API_KEY` (or the equivalent for your provider) in your shell environment; OpenCode uses it for its own model calls.
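Before a ten-minute E2E run it can help to confirm that both plugin keys are actually present. The sketch below assumes the contents of `codexfi.jsonc` have already been parsed into an object; the `CodexfiConfig` type and `missingKeys` helper are hypothetical and only use the two key names shown in the config example.

```typescript
// Hypothetical pre-flight check; not part of codexfi itself.
interface CodexfiConfig {
  voyageApiKey?: string;
  anthropicApiKey?: string;
}

// Return the names of required keys that are absent or blank.
function missingKeys(config: CodexfiConfig): string[] {
  const required: (keyof CodexfiConfig)[] = ["voyageApiKey", "anthropicApiKey"];
  return required.filter((key) => {
    const value = config[key];
    return value === undefined || value.trim() === "";
  });
}
```

An empty result means both keys are set; it does not verify that the keys are valid with the respective providers.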

Full E2E scenario reference

Each scenario runs in an isolated temporary directory. All memories it creates are deleted from the store automatically after it completes.
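The directory half of that isolation can be sketched with Node-compatible filesystem APIs, which Bun also provides. This is an illustration under assumptions, not the harness's code: `withIsolatedDir` is a hypothetical name, the callback is synchronous for brevity (real scenarios drive agent sessions and would be asynchronous), and deleting memories from the store is a separate step that is not shown.

```typescript
import { existsSync, mkdtempSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: run `scenario` inside a fresh temporary directory
// and remove that directory afterwards, even if the scenario throws.
function withIsolatedDir<T>(scenario: (dir: string) => T): T {
  const dir = mkdtempSync(join(tmpdir(), "scenario-"));
  try {
    return scenario(dir);
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
}

// Usage: seed a README inside the isolated directory, then let cleanup run.
let usedDir = "";
const readmeWritten = withIsolatedDir((dir) => {
  usedDir = dir;
  writeFileSync(join(dir, "README.md"), "# demo project\n");
  return existsSync(join(dir, "README.md"));
});
```

The `try`/`finally` shape means a failing scenario does not leave its directory behind to influence the next one.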

| # | Name | What it tests internally |
| --- | --- | --- |
| 01 | Cross-Session Memory Continuity | Auto-save fires after session end; session 2 recalls facts from session 1 |
| 02 | README-Based Project-Brief Seeding | `triggerSilentAutoInit` reads README on first session; `project-brief` memory is created and recalled in session 2 |
| 03 | Transcript Noise Guard | Saved memories contain no raw `[user]`/`[assistant]` transcript lines |
| 04 | Project-Brief Always Present | Memories accumulate from conversation even without README; session 2 recalls project facts |
| 05 | Memory Aging | Backend replaces older `progress` memories with newest; only 1 survives across 3 sessions |
| 06 | Existing Codebase Auto-Init | `triggerSilentAutoInit` reads real project files (`package.json`, `tsconfig`, `src/`) on first open |
| 07 | Enumeration Hybrid Retrieval | `types[]` param fires for "list all preferences" queries; answer covers preferences across all sessions |
| 08 | Cross-Synthesis (`isWideSynthesis`) | "across both projects" heuristic fires; answer synthesises facts from two project namespaces |
| 09 | `maxMemories=20` Under Load | With >10 memories stored, facts from early sessions are still recalled, confirming K=20 retrieval depth |
| 10 | Knowledge Update / Superseded | After ORM migration, agent answers with Tortoise (new), not SQLAlchemy (stale) |
| 11 | System Prompt Memory Injection | `[MEMORY]` block injected via `system.transform` into the system prompt, not as a synthetic message part |
| 12 | Multi-Turn Per-Turn Refresh | 6-turn conversation via `opencode serve`; per-turn semantic refresh surfaces topic-relevant memories on topic switches |
| 13 | Auto-Init Turn 1 Visibility + Enrichment | Auto-init uses init mode with 28 project files + git log; re-fetches memories for Turn 1 visibility; background enrichment fires after first response |
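The outcome scenario 05 expects (only the newest `progress` memory survives) can be sketched as a pure function. The `Memory` shape, its field names, and `keepNewestProgress` are hypothetical; the backend's real aging logic may work differently, for example by replacing entries at write time.

```typescript
// Hypothetical memory shape; field names are assumptions for illustration.
interface Memory {
  id: string;
  type: string;      // e.g. "progress" or "preference"
  createdAt: number; // epoch milliseconds
}

// Keep every non-progress memory, but only the most recent "progress" one.
function keepNewestProgress(memories: Memory[]): Memory[] {
  let newest: Memory | undefined;
  for (const memory of memories) {
    if (memory.type !== "progress") continue;
    if (newest === undefined || memory.createdAt > newest.createdAt) {
      newest = memory;
    }
  }
  return memories.filter((memory) => memory.type !== "progress" || memory === newest);
}
```

With three sessions that each store a `progress` entry, this leaves exactly one `progress` memory while other memory types are untouched.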

Known issue: OpenCode Desktop app interference

If the OpenCode Desktop app is running, it sets `OPENCODE_SERVER_PASSWORD`, `OPENCODE_SERVER_USERNAME`, and `OPENCODE_CLIENT` in your shell environment. The `opencode run` CLI inherits these, and its internal server then requires Basic Auth, causing every CLI session to fail silently.

The test harness handles this automatically via `cleanEnv()` in `testing/src/opencode.ts`. You do not need to close the Desktop app to run the E2E suite.
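The idea behind that cleanup can be sketched as a function that copies an environment and drops the three variables named above. This is a guess at the shape, not the contents of `testing/src/opencode.ts`: the real `cleanEnv()` may take different arguments or strip additional variables.

```typescript
// Illustrative sketch only; the real cleanEnv() may differ.
const DESKTOP_VARS = [
  "OPENCODE_SERVER_PASSWORD",
  "OPENCODE_SERVER_USERNAME",
  "OPENCODE_CLIENT",
] as const;

type Env = Record<string, string | undefined>;

// Return a copy of `env` without the variables the Desktop app sets.
// The input object is left unmodified.
function cleanEnv(env: Env): Env {
  const cleaned: Env = { ...env };
  for (const name of DESKTOP_VARS) {
    delete cleaned[name];
  }
  return cleaned;
}
```

A harness could pass the result of `cleanEnv(process.env)` as the `env` option when spawning `opencode run`, so the child process never sees the Desktop app's credentials.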

If you run `opencode run` manually from a terminal where the Desktop app is active, unset those variables for the command:

```
env -u OPENCODE_SERVER_PASSWORD -u OPENCODE_SERVER_USERNAME -u OPENCODE_CLIENT \
  opencode run "your message" --dir /path/to/project -m anthropic/claude-sonnet-4-6
```
