Your agents don't read code — they read your codebase's knowledge graph.
Code Brain builds a living knowledge graph from your codebase: tree-sitter AST extraction across six languages, petgraph-backed directed graph, regex-based route and schema detection, and BFS/DFS traversal with token budgeting. Every agent session starts with precise architectural context — not a file dump, but a structured understanding of what your code does, how it's connected, and what's load-bearing.
Code Brain uses tree-sitter — the deterministic parser behind GitHub's code navigation and Neovim — to extract a typed node graph from source files in Rust, TypeScript, TSX, JavaScript, Python, and Go. Functions, methods, classes, structs, modules, import chains, and call edges are extracted directly from the concrete syntax tree. The same file always produces the same graph fragment: no guessing, no heuristics at the parse level.
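A minimal sketch of the extraction step, assuming the tree-sitter and tree-sitter-rust crates; extract_functions and collect are illustrative names, not Code Brain's API:

```rust
use tree_sitter::{Node, Parser};

// Walk the concrete syntax tree and collect function names.
// Purely structural: the same source always yields the same nodes.
fn extract_functions(source: &str) -> Vec<String> {
    let mut parser = Parser::new();
    // Recent tree-sitter-rust releases expose a LANGUAGE constant;
    // older versions use tree_sitter_rust::language() instead.
    parser
        .set_language(&tree_sitter_rust::LANGUAGE.into())
        .expect("grammar/runtime version mismatch");
    let tree = parser.parse(source, None).expect("parse failed");
    let mut names = Vec::new();
    collect(tree.root_node(), source.as_bytes(), &mut names);
    names
}

fn collect(node: Node, src: &[u8], out: &mut Vec<String>) {
    if node.kind() == "function_item" {
        if let Some(name) = node.child_by_field_name("name") {
            out.push(name.utf8_text(src).unwrap_or_default().to_string());
        }
    }
    let mut cursor = node.walk();
    for child in node.children(&mut cursor) {
        collect(child, src, out);
    }
}
```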
Every extracted node and edge is stored in a directed knowledge graph backed by petgraph. Node types include Function, Method, Struct, Class, Module, Route, and Schema. Edge types include Imports, Calls, Contains, Implements, InheritsFrom, and SemanticallySimilarTo. The graph is not a summary — it is a traversable topology that agents query directly.
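In petgraph terms the shape looks roughly like the sketch below; NodeKind and EdgeKind are illustrative stand-ins for Code Brain's richer node and edge payloads:

```rust
use petgraph::graph::DiGraph;

// Illustrative vocabularies; the real node and edge types carry
// more metadata (file, span, signature, confidence).
#[derive(Debug)]
enum NodeKind { Function, Method, Struct, Class, Module, Route, Schema }

#[derive(Debug)]
enum EdgeKind { Imports, Calls, Contains, Implements, InheritsFrom, SemanticallySimilarTo }

fn main() {
    let mut graph: DiGraph<(&str, NodeKind), EdgeKind> = DiGraph::new();
    let api = graph.add_node(("api", NodeKind::Module));
    let login = graph.add_node(("login", NodeKind::Function));
    graph.add_edge(api, login, EdgeKind::Contains);

    // A traversable topology, queried directly rather than summarised:
    for neighbour in graph.neighbors(api) {
        println!("{:?}", graph[neighbour]); // ("login", Function)
    }
}
```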
Every edge in the graph carries a confidence class: Extracted (proven by AST, structurally certain), Inferred (heuristic match with a 0.4–0.95 probability score), or Ambiguous (unresolved, flagged as a knowledge gap). Agents can filter by confidence when reasoning about architectural risk. Nodes with Ambiguous incoming edges are surfaced in analysis output as explicit gaps.
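A sketch of how the confidence classes might be modelled; the exact representation is illustrative:

```rust
// The three confidence classes as a Rust enum; shapes are
// illustrative, not Code Brain's serialised format.
#[derive(Debug, Clone)]
enum Confidence {
    Extracted,     // proven by AST, structurally certain
    Inferred(f32), // heuristic match, probability score in 0.4..=0.95
    Ambiguous,     // unresolved, flagged as a knowledge gap
}

// Risk filtering: keep only edges that are structurally certain.
fn certain(edges: &[Confidence]) -> impl Iterator<Item = &Confidence> {
    edges.iter().filter(|c| matches!(c, Confidence::Extracted))
}
```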
On top of AST extraction, Code Brain runs framework-aware detectors for routes (Axum, Actix, Express, Fastify, Gin, FastAPI) and schemas (SeaORM, Prisma, SQLAlchemy, Drizzle, TypeORM). Environment variable references are tracked across all languages. These detections become specialised nodes and edges in the same knowledge graph — agents always know what an application exposes and depends on.
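As a flavour of the route side, an Axum-style detector could look like this sketch, assuming the regex crate; the pattern and detect_axum_routes are illustrative, and each framework gets its own patterns:

```rust
use regex::Regex;

// Returns (method, path) pairs for Axum-style registrations such as
// .route("/users", get(list_users)).
fn detect_axum_routes(source: &str) -> Vec<(String, String)> {
    let re = Regex::new(r#"\.route\(\s*"([^"]+)"\s*,\s*(get|post|put|delete|patch)\("#)
        .expect("valid pattern");
    re.captures_iter(source)
        .map(|c| (c[2].to_string(), c[1].to_string()))
        .collect()
}
```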
When an agent queries the graph ('show me the authentication flow', 'what uses the payments module'), Code Brain seeds from matching nodes and traverses breadth-first or depth-first — stopping precisely when the accumulated context would exceed the configured token budget. The result is a structured block of the right size to fit in an LLM prompt without padding it with irrelevant files. BFS finds the neighbourhood; DFS traces call chains.
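A sketch of the budget-aware BFS, assuming a string payload per node and a caller-supplied token estimator (estimate_tokens is a stand-in for the engine's real estimate):

```rust
use std::collections::{HashSet, VecDeque};
use petgraph::graph::{DiGraph, NodeIndex};

// BFS from the seed nodes, stopping once accumulated context would
// exceed the token budget.
fn bfs_with_budget(
    graph: &DiGraph<String, ()>,
    seeds: &[NodeIndex],
    budget: usize,
    estimate_tokens: impl Fn(&str) -> usize,
) -> Vec<NodeIndex> {
    let mut queue: VecDeque<NodeIndex> = seeds.iter().copied().collect();
    let mut seen: HashSet<NodeIndex> = queue.iter().copied().collect();
    let mut spent = 0usize;
    let mut result = Vec::new();
    while let Some(node) = queue.pop_front() {
        let cost = estimate_tokens(&graph[node]);
        if spent + cost > budget {
            break; // the budget is the stopping condition
        }
        spent += cost;
        result.push(node);
        for next in graph.neighbors(node) {
            if seen.insert(next) {
                queue.push_back(next);
            }
        }
    }
    result
}
```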
Code Brain identifies god nodes: the nodes with the highest total degree (incoming + outgoing edges combined). These are the abstractions everything else depends on — the central router, auth middleware, database pool, base model class. God nodes are surfaced in every context block so agents immediately understand which components are load-bearing, and serve as the fallback seed for open-ended queries.
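Degree ranking is straightforward in petgraph; a sketch (god_nodes is an illustrative name):

```rust
use petgraph::graph::{DiGraph, NodeIndex};
use petgraph::Direction;

// Rank nodes by total degree: incoming plus outgoing edges.
fn god_nodes<N, E>(graph: &DiGraph<N, E>, top: usize) -> Vec<(NodeIndex, usize)> {
    let mut degrees: Vec<(NodeIndex, usize)> = graph
        .node_indices()
        .map(|n| {
            (n, graph.edges_directed(n, Direction::Incoming).count()
                + graph.edges_directed(n, Direction::Outgoing).count())
        })
        .collect();
    degrees.sort_by(|a, b| b.1.cmp(&a.1)); // highest degree first
    degrees.truncate(top);
    degrees
}
```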
Given any file or node, Code Brain computes the full blast radius: every file and component that would be affected if it changed. Traversal depth and direction are configurable. The result is a sorted list of affected files with distance from the change origin — essential context for an agent about to modify a shared utility, refactor a model, or rename a core interface.
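A sketch of the dependent-finding traversal with petgraph, following incoming edges; blast_radius is an illustrative name, and the real engine's depth and direction are configurable:

```rust
use std::collections::{HashMap, VecDeque};
use petgraph::graph::{DiGraph, NodeIndex};
use petgraph::Direction;

// Reverse BFS: everything that transitively depends on `origin`,
// up to max_depth, tagged with its distance from the change.
fn blast_radius<N, E>(
    graph: &DiGraph<N, E>,
    origin: NodeIndex,
    max_depth: usize,
) -> Vec<(NodeIndex, usize)> {
    let mut dist = HashMap::from([(origin, 0usize)]);
    let mut queue = VecDeque::from([origin]);
    while let Some(node) = queue.pop_front() {
        let d = dist[&node];
        if d == max_depth {
            continue;
        }
        // Incoming edges point from dependents to this node.
        for dep in graph.neighbors_directed(node, Direction::Incoming) {
            if !dist.contains_key(&dep) {
                dist.insert(dep, d + 1);
                queue.push_back(dep);
            }
        }
    }
    let mut affected: Vec<_> = dist.into_iter().filter(|&(n, _)| n != origin).collect();
    affected.sort_by_key(|&(_, d)| d); // nearest to the change first
    affected
}
```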
Every file's extraction result is cached by SHA-256 content hash. Re-scans after small edits take milliseconds, not seconds — only changed files are re-parsed. Cache entries persist across restarts at ~/.eggbert-agentic/codebrain-ast-cache/ and are invalidated by content change, not mtime, so moves and renames do not trigger unnecessary re-extraction.
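A sketch of the content-addressed key, assuming the sha2, hex, and dirs crates (cache_entry is an illustrative name; the cache directory is the one documented above):

```rust
use sha2::{Digest, Sha256};
use std::path::{Path, PathBuf};

// Content-addressed cache key: identical bytes give an identical key
// regardless of path or mtime, so moves and renames still hit cache.
fn cache_entry(file: &Path) -> std::io::Result<PathBuf> {
    let bytes = std::fs::read(file)?;
    let key = hex::encode(Sha256::digest(&bytes));
    let dir = dirs::home_dir()
        .expect("home directory")
        .join(".eggbert-agentic/codebrain-ast-cache");
    Ok(dir.join(key))
}
```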
Code Brain automatically surfaces nodes with at least one Ambiguous incoming edge — places where the graph knows something connects here but cannot resolve what. Common causes: dynamic dispatch, missing dependencies, or external packages that were not scanned. Agents use knowledge gaps as a signal to request additional context or flag risk areas in architectural analysis.
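A sketch of the gap query over the same petgraph store (Confidence mirrors the earlier sketch):

```rust
use petgraph::graph::{DiGraph, NodeIndex};
use petgraph::visit::EdgeRef;
use petgraph::Direction;

enum Confidence { Extracted, Inferred(f32), Ambiguous }

// Nodes with at least one Ambiguous incoming edge: the graph knows
// something connects here but cannot resolve what.
fn knowledge_gaps(graph: &DiGraph<String, Confidence>) -> Vec<NodeIndex> {
    graph
        .node_indices()
        .filter(|&n| {
            graph
                .edges_directed(n, Direction::Incoming)
                .any(|e| matches!(e.weight(), Confidence::Ambiguous))
        })
        .collect()
}
```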
The knowledge graph persists to disk in two forms: a ScanResult JSON (routes, schemas, env vars, dependencies, token stats) and a GraphStore JSON (all nodes and all edges). On startup, the in-memory graph is reconstructed from the store in under 10 ms — no re-scan required. A previously scanned project is ready to answer queries the moment the engine starts.
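A sketch of the round-trip with serde_json (plus anyhow for error plumbing); the record fields are illustrative, not Code Brain's actual schema:

```rust
use serde::{Deserialize, Serialize};

// Illustrative on-disk shape; the real GraphStore schema is
// Code Brain's own.
#[derive(Serialize, Deserialize)]
struct GraphStore {
    nodes: Vec<NodeRecord>,
    edges: Vec<EdgeRecord>,
}

#[derive(Serialize, Deserialize)]
struct NodeRecord { id: u64, kind: String, name: String, file: String }

#[derive(Serialize, Deserialize)]
struct EdgeRecord { from: u64, to: u64, kind: String }

fn save(store: &GraphStore, path: &std::path::Path) -> anyhow::Result<()> {
    std::fs::write(path, serde_json::to_vec(store)?)?;
    Ok(())
}

fn load(path: &std::path::Path) -> anyhow::Result<GraphStore> {
    Ok(serde_json::from_slice(&std::fs::read(path)?)?)
}
```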
Code Brain defines a SemanticExtractor trait that can be injected at runtime. When configured, it receives batches of source files and produces additional nodes and edges — concepts, intent labels, SemanticallySimilarTo relationships — that AST alone cannot detect. The default is a no-op stub; a live LLM-backed extractor is opt-in and cost-controlled. AST extraction always runs fast and offline.
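A sketch of what that seam could look like; SourceFile, SemanticFinding, and the method signature are illustrative, and only the trait name and the no-op default come from the design described here:

```rust
// Illustrative inputs and outputs for the extractor seam.
struct SourceFile { path: String, contents: String }
struct SemanticFinding { subject: String, relation: String, object: String }

// The injectable seam: receives batches of source files, produces
// extra nodes and edges the AST alone cannot see.
trait SemanticExtractor: Send + Sync {
    fn extract(&self, batch: &[SourceFile]) -> Vec<SemanticFinding>;
}

// Default no-op stub: AST extraction alone, fast and offline.
struct NoopExtractor;

impl SemanticExtractor for NoopExtractor {
    fn extract(&self, _batch: &[SourceFile]) -> Vec<SemanticFinding> {
        Vec::new()
    }
}
```

An LLM-backed implementation would then be a drop-in `Box<dyn SemanticExtractor>` when opted in.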
Code Brain's primary output is a structured context block injected into agent prompts at the start of a session: project summary, technology stack, key routes and schemas, top god nodes with edge counts, graph statistics, and a token-budgeted BFS result for the current task. All sections are individually token-capped. Agents start every session with the architectural orientation they would otherwise spend turns building.
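A sketch of per-section capping; truncate_to_tokens stands in for the real tokenizer-backed budgeter, and the section titles are illustrative:

```rust
// Crude stand-in for a tokenizer-based budgeter: ~4 chars per token.
fn truncate_to_tokens(text: &str, cap: usize) -> String {
    text.chars().take(cap * 4).collect()
}

// Each section is individually capped before assembly.
fn context_block(sections: &[(&str, String, usize)]) -> String {
    sections
        .iter()
        .map(|(title, body, cap)| {
            format!("## {title}\n{}\n", truncate_to_tokens(body, *cap))
        })
        .collect()
}

// Usage:
// let block = context_block(&[
//     ("Project summary", summary, 300),
//     ("God nodes", god_nodes_text, 200),
//     ("Task context (BFS)", bfs_text, 1500),
// ]);
```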