Extraction Pipeline

After every assistant turn, codexfi runs an extraction pipeline to identify and store key facts from the conversation. This happens in the background and is invisible to the user.

1. Conversation snapshot

The last 8 messages of the conversation are collected. Each message is formatted as [role] content, and the formatted messages are concatenated into a single text block. Content longer than the maximum character limit is truncated.
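Here is a minimal TypeScript sketch of this step. Several details are illustrative assumptions, not taken from codexfi's source: the ChatMessage shape, the buildSnapshot name, the blank-line separator, and applying the limit to each message's content individually.

```typescript
// Hypothetical message shape; codexfi's internal type may differ.
interface ChatMessage {
  role: "user" | "assistant" | "system";
  content: string;
}

const WINDOW_SIZE = 8; // the last 8 messages, as documented

// Build the text block sent to the extraction provider.
// Assumption: the character limit is applied per message.
function buildSnapshot(messages: ChatMessage[], maxChars: number): string {
  return messages
    .slice(-WINDOW_SIZE) // keep only the most recent messages
    .map((m) => {
      const content =
        m.content.length > maxChars ? m.content.slice(0, maxChars) : m.content;
      return `[${m.role}] ${content}`;
    })
    .join("\n\n"); // separator is an assumption
}
```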

2. LLM extraction

The conversation text is sent to the configured extraction provider with a system prompt instructing it to identify important facts. The LLM returns a JSON array of typed facts:

[
  {
    "memory": "Auth uses JWT stored in httpOnly cookies, not localStorage",
    "type": "architecture"
  },
  {
    "memory": "User prefers bun over npm for all installs",
    "type": "preference"
  }
]

Each fact is an atomic, self-contained piece of knowledge with an assigned memory type.

3. Embedding

Each extracted fact is embedded into a 1024-dimensional vector using Voyage AI's voyage-code-3 model, which is optimized for code and technical content.

Facts are embedded with input_type: "document" when they are stored, and search queries are embedded with input_type: "query" at retrieval time. Voyage AI uses this hint to optimize each embedding for its role.
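The two input_type values map onto request bodies for Voyage AI's embeddings endpoint (POST https://api.voyageai.com/v1/embeddings) roughly as follows. The helper name and interface are hypothetical. Only the input, model, and input_type fields come from Voyage's API.

```typescript
// Sketch of request bodies for Voyage AI's embeddings endpoint.
// buildEmbeddingRequest is a hypothetical helper, not codexfi's code.
type InputType = "document" | "query";

interface VoyageEmbeddingRequest {
  input: string[];
  model: string;
  input_type: InputType;
}

function buildEmbeddingRequest(texts: string[], inputType: InputType): VoyageEmbeddingRequest {
  return {
    input: texts,
    model: "voyage-code-3", // returns 1024-dimensional vectors by default
    input_type: inputType, // "document" when storing, "query" when searching
  };
}
```

The serialized body would be sent as JSON with the Voyage API key in an Authorization: Bearer header. Storage and retrieval must use the same model so that document and query vectors are comparable.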

4. Deduplication

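Before storing, each fact is compared against existing memories by cosine distance (1 − cosine similarity, so lower values mean more similar). If the distance to an existing memory falls within the threshold for the fact's category, the fact is treated as a duplicate:

Memory category	Dedup threshold (cosine distance)	Meaning
General types	0.12	Only very close matches are considered duplicates
Structural types	0.25	Wider threshold, because structural knowledge evolves

If a duplicate is found, the existing memory is updated (text refreshed, timestamp updated) instead of a new entry being created. This keeps the memory count manageable.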
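The following sketch of the duplicate check assumes that thresholds are cosine distances and that the closest match within the threshold wins. StoredMemory, findDuplicate, and the general/structural keys are hypothetical names.

```typescript
interface StoredMemory {
  id: string;
  text: string;
  vector: number[];
}

// Cosine distance = 1 - cosine similarity (0 means same direction).
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 1; // treat zero vectors as unrelated
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const DEDUP_THRESHOLD = { general: 0.12, structural: 0.25 } as const;

// Return the closest existing memory within the category's threshold, if any.
function findDuplicate(
  vector: number[],
  existing: StoredMemory[],
  category: keyof typeof DEDUP_THRESHOLD,
): StoredMemory | undefined {
  const threshold = DEDUP_THRESHOLD[category];
  let best: StoredMemory | undefined;
  let bestDist = Infinity;
  for (const m of existing) {
    const d = cosineDistance(vector, m.vector);
    if (d <= threshold && d < bestDist) {
      best = m;
      bestDist = d;
    }
  }
  return best;
}
```

For example, a distance of 0.14 would count as a duplicate for a structural fact but not for a general one.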

5. Storage

New memories (non-duplicates) are inserted into the vector store with:

A UUID identifier
The memory text
The embedding vector
Metadata (type, scope)
Source chunk (the conversation excerpt that produced this fact)
Timestamps (created_at, updated_at)
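One possible shape for a stored record is sketched below in TypeScript. The field names, the ISO-8601 timestamp format, and the newMemoryRecord helper are assumptions. The actual schema in codexfi's vector store may differ.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical record mirroring the fields listed above.
interface MemoryRecord {
  id: string;
  memory: string;
  vector: number[];
  metadata: { type: string; scope: string };
  source_chunk: string;
  created_at: string;
  updated_at: string;
}

function newMemoryRecord(
  memory: string,
  vector: number[],
  metadata: { type: string; scope: string },
  sourceChunk: string,
): MemoryRecord {
  const now = new Date().toISOString();
  return {
    id: randomUUID(), // UUID identifier
    memory,
    vector,
    metadata,
    source_chunk: sourceChunk, // conversation excerpt that produced the fact
    created_at: now,
    updated_at: now, // equal at insert; refreshed when a duplicate updates it
  };
}
```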

6. Aging rules

After insertion, type-specific aging rules run:

progress: all older progress memories are deleted, so only the latest survives
session-summary: if there are more than 3 session summaries, the oldest is condensed into a learned-pattern memory
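The two aging rules can be sketched over an in-memory list. The Memory shape and the condense callback are hypothetical. The callback stands in for whatever step turns a summary into a learned-pattern memory.

```typescript
interface Memory {
  id: string;
  type: string;
  text: string;
  createdAt: number; // epoch millis
}

const MAX_SESSION_SUMMARIES = 3;

// Apply type-specific aging after `inserted` has been stored.
function applyAgingRules(
  memories: Memory[],
  inserted: Memory,
  condense: (summary: Memory) => Memory,
): Memory[] {
  let result = memories;

  if (inserted.type === "progress") {
    // Only the newest progress memory survives
    result = result.filter((m) => m.type !== "progress" || m.id === inserted.id);
  }

  if (inserted.type === "session-summary") {
    const summaries = result
      .filter((m) => m.type === "session-summary")
      .sort((a, b) => a.createdAt - b.createdAt); // oldest first
    if (summaries.length > MAX_SESSION_SUMMARIES) {
      const oldest = summaries[0];
      // Replace the oldest summary with its condensed learned-pattern form
      result = result.filter((m) => m.id !== oldest.id).concat(condense(oldest));
    }
  }
  return result;
}
```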

7. Contradiction detection

The system searches for semantically nearby existing memories (within the contradiction distance threshold) and asks the LLM: "Does this new fact supersede any of these?"

If contradictions are found, the old memories are marked with superseded_by pointing to the new memory. They remain in the database but are excluded from search results.
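The following sketch of the contradiction step assumes that a vector search has already returned nearby memories with their distances. The judge callback stands in for the LLM prompt. It is assumed to return the indices of the candidates that the new fact supersedes. All names here are hypothetical.

```typescript
interface Fact {
  id: string;
  text: string;
  supersededBy?: string;
}

interface Neighbor {
  fact: Fact;
  distance: number; // cosine distance to the new fact
}

async function resolveContradictions(
  newFact: Fact,
  neighbors: Neighbor[],
  maxDistance: number,
  judge: (newText: string, candidates: string[]) => Promise<number[]>,
): Promise<Fact[]> {
  // Only active memories within the contradiction distance are considered
  const candidates = neighbors
    .filter((n) => n.distance <= maxDistance && !n.fact.supersededBy)
    .map((n) => n.fact);
  if (candidates.length === 0) return [];

  const indices = await judge(newFact.text, candidates.map((c) => c.text));
  const marked: Fact[] = [];
  for (const i of indices) {
    const old = candidates[i];
    if (old && !old.supersededBy) {
      old.supersededBy = newFact.id; // kept in storage, hidden from search
      marked.push(old);
    }
  }
  return marked;
}

// Search-side filter: superseded memories never appear in results
function visible(facts: Fact[]): Fact[] {
  return facts.filter((f) => !f.supersededBy);
}
```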

Extraction modes

The pipeline runs in three modes:

Normal mode

Used after every assistant turn. Extracts atomic facts from the recent conversation with types assigned per fact.

Summary mode

Triggered every N turns (default: 5). Generates a high-level session summary covering what was worked on, decisions made, and progress achieved.
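The normal and summary passes might be scheduled as follows. This assumes that turns are counted from 1 and that the summary pass runs alongside normal extraction on every Nth turn, not instead of it. modesAfterTurn is a hypothetical name.

```typescript
type ExtractionMode = "normal" | "summary";

// Which extraction passes run after a given assistant turn (sketch).
function modesAfterTurn(turn: number, summaryEvery = 5): ExtractionMode[] {
  const modes: ExtractionMode[] = ["normal"]; // normal extraction runs every turn
  if (summaryEvery > 0 && turn % summaryEvery === 0) modes.push("summary");
  return modes;
}
```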

Init mode

Used during auto-initialization when a new project is detected. Reads 28 common project files (README, package.json, Dockerfile, tsconfig, monorepo configs, agent instructions like AGENTS.md and CLAUDE.md, and more) plus the 20 most recent git commits, then extracts foundational facts about the project using a specialized init extraction prompt.

After init mode completes, memories are re-fetched immediately so they appear in the Turn 1 [MEMORY] block — the agent has project context from its very first response.

Background enrichment

After the Turn 1 response is delivered, a separate enrichment pass fires in the background. It generates a directory tree of the project structure and extracts additional context from entry points and CI configuration files. This enriches the memory store without delaying the initial response.
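The Turn 1 ordering described above can be sketched with four hypothetical callbacks. The steps are init extraction, memory re-fetch, response, and then background enrichment. Enrichment starts only after the reply is produced and is not awaited. Its errors are swallowed so they cannot affect the reply.

```typescript
// All four steps are hypothetical stand-ins for codexfi internals.
interface FirstTurnSteps {
  runInit: () => Promise<void>; // init-mode extraction
  fetchMemories: () => Promise<string[]>; // re-fetch so Turn 1 sees init facts
  respond: (memories: string[]) => Promise<string>;
  enrich: () => Promise<void>; // directory tree, entry points, CI files
}

async function handleFirstTurn(steps: FirstTurnSteps): Promise<string> {
  await steps.runInit();
  const memories = await steps.fetchMemories();
  const reply = await steps.respond(memories);
  // Fire and forget: not awaited, and failures are ignored
  void steps.enrich().catch(() => undefined);
  return reply;
}
```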

Extraction providers

codexfi supports three extraction providers. The configured primary provider is tried first, with automatic fallback through the remaining providers.

Set your provider via "extractionProvider" in codexfi.jsonc — see Configuration for details.

Anthropic (default)

Model: claude-haiku-4-5
Speed: ~14 seconds per session
Notes: Most consistent extraction quality. Reliable JSON output.

xAI

Model: grok-4.3
Speed: ~5 seconds per session
Notes: Fastest and cheapest. Slightly higher variance in extraction quality.

Google

Model: gemini-3-flash-preview
Speed: ~21 seconds per session
Notes: Uses native responseMimeType: "application/json" for guaranteed JSON output.

Fallback behavior

If the primary provider fails after retries, the plugin tries the next provider in order: anthropic → xai → google. If all providers fail, extraction is silently skipped for that turn — the user's session is never interrupted.
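The following sketch of the fallback chain tries the primary provider first, then the others in the documented anthropic → xai → google order. The ProviderName type and the call callback are hypothetical. The callback is assumed to perform its own retries.

```typescript
type ProviderName = "anthropic" | "xai" | "google";

const FALLBACK_ORDER: ProviderName[] = ["anthropic", "xai", "google"];

// Primary first, then the remaining providers in the documented order.
function providerChain(primary: ProviderName): ProviderName[] {
  return [primary, ...FALLBACK_ORDER.filter((p) => p !== primary)];
}

async function extractWithFallback<T>(
  primary: ProviderName,
  call: (provider: ProviderName) => Promise<T[]>,
): Promise<T[]> {
  for (const provider of providerChain(primary)) {
    try {
      return await call(provider);
    } catch {
      // Provider failed after its retries: try the next one
    }
  }
  return []; // every provider failed: skip extraction silently for this turn
}
```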

Retry strategy

LLM calls, embedding calls, and database operations all use exponential backoff with jitter:

LLM extraction: retries with configurable delay and jitter
Database writes: retries for transient write errors
Embedding calls: retries for Voyage AI API errors

Telemetry failures never block extraction or embedding operations.
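Below is a generic retry helper that uses exponential backoff with "full jitter", meaning a random delay between zero and the exponential cap. The jitter variant, option names, and helper names are assumptions. codexfi's actual delays and retry counts are not given here.

```typescript
interface RetryOptions {
  retries: number; // retries after the first attempt
  baseDelayMs: number;
  maxDelayMs: number;
}

// Full jitter: uniform random delay in [0, min(max, base * 2^attempt)).
function backoffDelay(attempt: number, opts: RetryOptions, random: () => number = Math.random): number {
  const cap = Math.min(opts.maxDelayMs, opts.baseDelayMs * 2 ** attempt);
  return Math.floor(random() * cap);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withRetry<T>(fn: () => Promise<T>, opts: RetryOptions): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= opts.retries) throw err; // out of retries: surface the error
      await sleep(backoffDelay(attempt, opts));
    }
  }
}
```

Randomizing the whole delay spreads out retries from concurrent callers. With a fixed exponential schedule, they would all hit a recovering service at the same moments.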

JSON parsing

LLM responses are parsed with tolerance for common quirks:

Markdown code fences (```json ... ```) are stripped
Plain string arrays are accepted (each item defaults to the learned-pattern type)
Wrapped objects ({"memories": [...]}) are unwrapped
Parse failures return an empty array — zero memories for this turn, no error
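The following sketch shows a tolerant parser that covers the four cases above. The parseExtraction name is hypothetical. Defaulting a missing type on an object to learned-pattern is an assumption that extends the documented behavior for plain strings.

```typescript
interface ExtractedFact {
  memory: string;
  type: string;
}

const DEFAULT_TYPE = "learned-pattern";

function parseExtraction(raw: string): ExtractedFact[] {
  // Strip a surrounding Markdown code fence (three backticks, optional "json")
  const text = raw
    .trim()
    .replace(/^`{3}(?:json)?\s*/i, "")
    .replace(/\s*`{3}$/, "");

  let parsed: unknown;
  try {
    parsed = JSON.parse(text);
  } catch {
    return []; // parse failure: zero memories, no error
  }

  // Unwrap {"memories": [...]}
  if (parsed && typeof parsed === "object" && !Array.isArray(parsed)) {
    parsed = (parsed as { memories?: unknown }).memories;
  }
  if (!Array.isArray(parsed)) return [];

  const facts: ExtractedFact[] = [];
  for (const item of parsed) {
    if (typeof item === "string") {
      facts.push({ memory: item, type: DEFAULT_TYPE }); // plain string array item
    } else if (item && typeof item.memory === "string") {
      facts.push({ memory: item.memory, type: typeof item.type === "string" ? item.type : DEFAULT_TYPE });
    }
  }
  return facts;
}
```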

Privacy

Before extraction, content wrapped in <private>...</private> tags is stripped. Nothing inside private tags is sent to any LLM provider or stored in the database.

Messages that are fully private (entire content inside private tags) are excluded from the extraction snapshot entirely.
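Here is a minimal redaction sketch. The regex removes every <private>...</private> span, and any message left empty is dropped. The names are hypothetical. This version also drops messages that were blank to begin with.

```typescript
interface ChatMessage {
  role: string;
  content: string;
}

// Non-greedy so two private spans don't swallow the public text between them
const PRIVATE_TAG = /<private>[\s\S]*?<\/private>/g;

function stripPrivate(text: string): string {
  return text.replace(PRIVATE_TAG, "");
}

// Remove private spans, then drop messages with nothing left to extract.
function redactForExtraction(messages: ChatMessage[]): ChatMessage[] {
  return messages
    .map((m) => ({ ...m, content: stripPrivate(m.content) }))
    .filter((m) => m.content.trim().length > 0);
}
```

This pattern does not match an unclosed <private> tag at all, so a real implementation needs an explicit policy for that case.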