AI + AST

Code Brain — Knowledge Graph Codebase Intelligence

Your agents don't read code — they read your codebase's knowledge graph.

Code Brain builds a living knowledge graph from your codebase: tree-sitter AST extraction across five languages, petgraph-backed directed graph, regex-based route and schema detection, and BFS/DFS traversal with token budgeting. Every agent session starts with precise architectural context — not a file dump, but a structured understanding of what your code does, how it's connected, and what's load-bearing.

Module ID: MOD-CODE-BRAIN-—-KNOWLEDGE-GRAPH-CODEBASE-INTELLIGENCE-01
Status: Operational / Active
Deploy Env: Native App
Action: Initialise

Technical Capabilities

01.

Tree-Sitter AST Extraction — Five Languages

Code Brain uses tree-sitter — the deterministic incremental parser behind GitHub code navigation and Neovim — to extract a typed node graph from source files in five languages: Rust, TypeScript (including the TSX grammar), JavaScript, Python, and Go. Functions, methods, classes, structs, modules, import chains, and call edges are extracted directly from the concrete syntax tree. The same file always produces the same graph fragment: no guessing, no heuristics at the parse level.

02.

Knowledge Graph — petgraph DiGraph

Every extracted node and edge is stored in a directed knowledge graph backed by petgraph. Node types include Function, Method, Struct, Class, Module, Route, and Schema. Edge types include Imports, Calls, Contains, Implements, InheritsFrom, and SemanticallySimilarTo. The graph is not a summary — it is a traversable topology that agents query directly.
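The shape of that topology can be sketched with std collections — the real engine stores it in a petgraph DiGraph, and the exact node/edge representations here are illustrative assumptions, not Code Brain's actual types:

```rust
use std::collections::HashMap;

// Node and edge kinds mirroring the ones listed above (illustrative).
#[derive(Debug, Clone, PartialEq)]
enum NodeKind { Function, Method, Struct, Class, Module, Route, Schema }

#[derive(Debug, Clone, PartialEq)]
enum EdgeKind { Imports, Calls, Contains, Implements, InheritsFrom, SemanticallySimilarTo }

// Minimal directed graph: adjacency lists keyed by node id.
// (A std-only stand-in for petgraph's DiGraph.)
struct Graph {
    nodes: HashMap<String, NodeKind>,
    edges: HashMap<String, Vec<(String, EdgeKind)>>, // from -> [(to, kind)]
}

impl Graph {
    fn new() -> Self { Graph { nodes: HashMap::new(), edges: HashMap::new() } }
    fn add_node(&mut self, id: &str, kind: NodeKind) {
        self.nodes.insert(id.to_string(), kind);
    }
    fn add_edge(&mut self, from: &str, to: &str, kind: EdgeKind) {
        self.edges.entry(from.to_string()).or_default().push((to.to_string(), kind));
    }
    // Agents query the topology directly, e.g. "what does `auth` contain?"
    fn neighbours(&self, id: &str) -> Vec<&(String, EdgeKind)> {
        self.edges.get(id).map(|v| v.iter().collect()).unwrap_or_default()
    }
}

fn main() {
    let mut g = Graph::new();
    g.add_node("auth", NodeKind::Module);
    g.add_node("login", NodeKind::Function);
    g.add_edge("auth", "login", EdgeKind::Contains);
    assert_eq!(g.neighbours("auth").len(), 1);
}
```

The point of the adjacency-list shape is that queries are traversals, not text search — every later capability (BFS context, blast radius, god nodes) is a walk over this structure.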

03.

Edge Confidence System

Every edge in the graph carries a confidence class: Extracted (proven by AST, structurally certain), Inferred (heuristic match with a 0.4–0.95 probability score), or Ambiguous (unresolved, flagged as a knowledge gap). Agents can filter by confidence when reasoning about architectural risk. Ambiguous nodes are surfaced in analysis output as explicit gaps.
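A minimal sketch of confidence-filtered reasoning, assuming a simple three-variant enum (the real representation may differ):

```rust
// Illustrative confidence classes; names mirror the text above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Confidence {
    Extracted,     // proven by AST, structurally certain
    Inferred(f64), // heuristic match, score in 0.4..=0.95
    Ambiguous,     // unresolved: a knowledge gap
}

// Keep only edges an agent can treat as reliable at a given threshold.
fn reliable(edges: &[(&str, &str, Confidence)], min_score: f64) -> Vec<(String, String)> {
    edges.iter()
        .filter(|e| match e.2 {
            Confidence::Extracted => true,
            Confidence::Inferred(s) => s >= min_score,
            Confidence::Ambiguous => false,
        })
        .map(|e| (e.0.to_string(), e.1.to_string()))
        .collect()
}

fn main() {
    let edges = [
        ("router", "auth",    Confidence::Extracted),
        ("auth",   "db",      Confidence::Inferred(0.8)),
        ("auth",   "metrics", Confidence::Inferred(0.45)),
        ("jobs",   "auth",    Confidence::Ambiguous),
    ];
    assert_eq!(reliable(&edges, 0.7).len(), 2); // Extracted + high-score Inferred
}
```

Raising `min_score` tightens the view for risk-sensitive analysis; Ambiguous edges are never silently included — they surface separately as gaps.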

04.

Route and Schema Detection

On top of AST extraction, Code Brain runs framework-aware detectors for routes (Axum, Actix, Express, Fastify, Gin, FastAPI) and schemas (SeaORM, Prisma, SQLAlchemy, Drizzle, TypeORM). Environment variable references are tracked across all languages. These detections become specialised nodes and edges in the same knowledge graph — agents always know what an application exposes and depends on.
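The detector idea can be sketched with plain string matching — the real detectors are regex-based and framework-aware; the two patterns below (Axum-style `.route("…")` and FastAPI-style `@app.get("…")`) are just illustrative:

```rust
// Toy route detector: scan lines for two framework marker styles and
// pull out the quoted path. A std-only sketch, not the real regexes.
fn detect_routes(source: &str) -> Vec<String> {
    let mut routes = Vec::new();
    for line in source.lines() {
        for marker in [".route(\"", "@app.get(\"", "@app.post(\""] {
            if let Some(start) = line.find(marker) {
                let rest = &line[start + marker.len()..];
                if let Some(end) = rest.find('"') {
                    routes.push(rest[..end].to_string());
                }
            }
        }
    }
    routes
}

fn main() {
    let src = ".route(\"/users\", get(list_users))\n@app.get(\"/health\")";
    assert_eq!(detect_routes(src), vec!["/users", "/health"]);
}
```

Each detected path becomes a Route node in the same graph, so "what does this app expose" is a node query, not a grep.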

05.

BFS/DFS Context Queries with Token Budgeting

When an agent queries the graph ('show me the authentication flow', 'what uses the payments module'), Code Brain seeds from matching nodes and traverses breadth-first or depth-first — stopping precisely when the accumulated context would exceed the configured token budget. The result is a structured block of the right size to fit in an LLM prompt without padding it with irrelevant files. BFS finds the neighbourhood; DFS traces call chains.
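The budgeted traversal can be sketched as a standard BFS that tracks spent tokens — node names and per-node costs here are invented for illustration:

```rust
use std::collections::{HashMap, HashSet, VecDeque};

// BFS from a seed node, accumulating an estimated token cost per node and
// stopping before the configured budget would be exceeded.
fn bfs_with_budget(
    adj: &HashMap<&str, Vec<&str>>,
    cost: &HashMap<&str, usize>,
    seed: &str,
    budget: usize,
) -> Vec<String> {
    let mut visited = HashSet::new();
    let mut queue = VecDeque::from([seed]);
    let mut spent = 0;
    let mut out = Vec::new();
    while let Some(node) = queue.pop_front() {
        if !visited.insert(node) { continue; }
        let c = *cost.get(node).unwrap_or(&0);
        if spent + c > budget { break; } // budget reached: stop here
        spent += c;
        out.push(node.to_string());
        for &next in adj.get(node).into_iter().flatten() {
            queue.push_back(next);
        }
    }
    out
}

fn main() {
    let adj = HashMap::from([("auth", vec!["session", "db"]), ("session", vec!["db"])]);
    let cost = HashMap::from([("auth", 50), ("session", 40), ("db", 60)]);
    // A 100-token budget admits auth + session but not db.
    assert_eq!(bfs_with_budget(&adj, &cost, "auth", 100), vec!["auth", "session"]);
}
```

Swapping the queue for a stack turns the same loop into DFS — which is why BFS finds the neighbourhood while DFS traces chains.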

06.

God Nodes — Architectural Hot Spots

Code Brain identifies god nodes: the nodes with the highest total degree (incoming + outgoing edges combined). These are the abstractions everything else depends on — the central router, auth middleware, database pool, base model class. God nodes are surfaced in every context block so agents immediately understand which components are load-bearing, and serve as the fallback seed for open-ended queries.
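Degree ranking is straightforward to sketch over an edge list (the real computation runs on the petgraph graph; names are illustrative):

```rust
use std::collections::HashMap;

// Rank nodes by total degree (incoming + outgoing) and keep the top `top`.
fn god_nodes(edges: &[(&str, &str)], top: usize) -> Vec<(String, usize)> {
    let mut degree: HashMap<&str, usize> = HashMap::new();
    for &(from, to) in edges {
        *degree.entry(from).or_insert(0) += 1; // outgoing
        *degree.entry(to).or_insert(0) += 1;   // incoming
    }
    let mut ranked: Vec<(String, usize)> =
        degree.into_iter().map(|(n, d)| (n.to_string(), d)).collect();
    // Degree descending, then name, for a deterministic order.
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
    ranked.truncate(top);
    ranked
}

fn main() {
    let edges = [
        ("router", "auth"), ("router", "users"),
        ("users", "db"), ("auth", "db"), ("jobs", "db"),
    ];
    assert_eq!(god_nodes(&edges, 1)[0], ("db".to_string(), 3)); // db is load-bearing
}
```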

07.

Blast Radius Analysis

Given any file or node, Code Brain computes the full blast radius: every file and component that would be affected if it changed. Traversal depth and direction are configurable. The result is a sorted list of affected files with distance from the change origin — essential context for an agent about to modify a shared utility, refactor a model, or rename a core interface.
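The analysis amounts to a depth-limited walk over *reverse* dependency edges. A minimal sketch, with an assumed `max_depth` parameter standing in for the configurable depth:

```rust
use std::collections::{HashMap, HashSet, VecDeque};

// Blast radius: from a changed node, walk the nodes that depend on it and
// report each affected node with its distance from the change origin.
fn blast_radius(
    reverse_deps: &HashMap<&str, Vec<&str>>, // node -> nodes that depend on it
    origin: &str,
    max_depth: usize,
) -> Vec<(String, usize)> {
    let mut seen = HashSet::from([origin]);
    let mut queue = VecDeque::from([(origin, 0usize)]);
    let mut affected = Vec::new();
    while let Some((node, dist)) = queue.pop_front() {
        if dist > 0 { affected.push((node.to_string(), dist)); }
        if dist == max_depth { continue; } // configurable depth limit
        for &dep in reverse_deps.get(node).into_iter().flatten() {
            if seen.insert(dep) { queue.push_back((dep, dist + 1)); }
        }
    }
    // Sorted by distance, as described above.
    affected.sort_by(|a, b| a.1.cmp(&b.1).then(a.0.cmp(&b.0)));
    affected
}

fn main() {
    let rev = HashMap::from([
        ("db_pool", vec!["users_repo", "orders_repo"]),
        ("users_repo", vec!["users_api"]),
    ]);
    // Changing db_pool touches both repos (distance 1) and the API (distance 2).
    assert_eq!(blast_radius(&rev, "db_pool", 2).len(), 3);
}
```

Flipping the traversal direction answers the dual question: not "who breaks if this changes" but "what does this transitively depend on".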

08.

SHA-256 Extraction Cache

Every file's extraction result is cached by SHA-256 content hash. Re-scans after small edits take milliseconds, not seconds — only changed files are re-parsed. Cache entries persist across restarts at ~/.eggbert-agentic/codebrain-ast-cache/ and are invalidated by content change, not mtime, so moves and renames do not trigger unnecessary re-extraction.
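The invalidation-by-content idea can be shown in a few lines. Note the hedge: the real cache keys on SHA-256 of the file bytes; std's `DefaultHasher` stands in here only so the sketch needs no external crate:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Content-addressed extraction cache: identical bytes never re-parse,
// even if the file moved or its mtime changed.
struct ExtractionCache {
    entries: HashMap<u64, String>, // content hash -> cached extraction result
    parses: usize,                 // how many real parses were performed
}

impl ExtractionCache {
    fn new() -> Self { Self { entries: HashMap::new(), parses: 0 } }

    fn extract(&mut self, contents: &str) -> String {
        let mut h = DefaultHasher::new(); // SHA-256 in the real cache
        contents.hash(&mut h);
        let key = h.finish();
        if let Some(hit) = self.entries.get(&key) {
            return hit.clone(); // unchanged content: cache hit
        }
        self.parses += 1; // stand-in for the expensive AST extraction
        let result = format!("graph-fragment({} bytes)", contents.len());
        self.entries.insert(key, result.clone());
        result
    }
}

fn main() {
    let mut cache = ExtractionCache::new();
    cache.extract("fn main() {}");
    cache.extract("fn main() {}"); // same bytes: no re-parse, even after a rename
    cache.extract("fn main() { println!(); }");
    assert_eq!(cache.parses, 2); // only distinct content was parsed
}
```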

09.

Knowledge Gaps Detection

Code Brain automatically surfaces nodes with at least one Ambiguous incoming edge — places where the graph knows something connects here but cannot resolve what. Common causes: dynamic dispatch, missing dependencies, or external packages that were not scanned. Agents use knowledge gaps as a signal to request additional context or flag risk areas in architectural analysis.
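Gap detection is a filter over edge confidence. A tiny sketch, with a pared-down two-variant enum just for illustration:

```rust
use std::collections::BTreeSet;

// Two of the confidence classes described earlier; enough for this sketch.
#[derive(PartialEq)]
enum Confidence { Extracted, Ambiguous }

// Knowledge gaps: targets of at least one Ambiguous incoming edge.
fn knowledge_gaps(edges: &[(&str, &str, Confidence)]) -> BTreeSet<String> {
    edges.iter()
        .filter(|e| e.2 == Confidence::Ambiguous)
        .map(|e| e.1.to_string())
        .collect()
}

fn main() {
    let edges = [
        ("router", "auth", Confidence::Extracted),
        ("plugin_host", "auth", Confidence::Ambiguous), // dynamic dispatch
        ("worker", "queue", Confidence::Ambiguous),     // unscanned package
    ];
    let gaps = knowledge_gaps(&edges);
    assert!(gaps.contains("auth") && gaps.contains("queue"));
}
```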

10.

Graph Persistence and Warm Start

The knowledge graph persists to disk in two forms: a ScanResult JSON (routes, schemas, env vars, dependencies, token stats) and a GraphStore JSON (all nodes and all edges). On startup, the in-memory graph is reconstructed from the store in under 10 ms — no re-scan required. Previously scanned projects are ready to answer queries the moment the engine starts.

11.

Semantic Extraction — Pluggable LLM Layer

Code Brain defines a SemanticExtractor trait that can be injected at runtime. When configured, it receives batches of source files and produces additional nodes and edges — concepts, intent labels, SemanticallySimilarTo relationships — that AST alone cannot detect. The default is a no-op stub; a live LLM-backed extractor is opt-in and cost-controlled. AST extraction always runs fast and offline.
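The trait name comes from the text; its method signature and the edge type below are assumptions made for illustration:

```rust
// An extra edge the semantic layer can contribute (illustrative shape).
#[derive(Debug, PartialEq)]
struct SemanticEdge { from: String, to: String, label: String }

// Pluggable semantic layer: injected at runtime, batched over source files.
trait SemanticExtractor {
    fn extract(&self, files: &[(&str, &str)]) -> Vec<SemanticEdge>;
}

// Default no-op stub: AST extraction stays fast, offline, and free.
struct NoopExtractor;
impl SemanticExtractor for NoopExtractor {
    fn extract(&self, _files: &[(&str, &str)]) -> Vec<SemanticEdge> { Vec::new() }
}

// An opt-in, LLM-backed implementation would slot in behind the same trait.
fn enrich(edges: &mut Vec<SemanticEdge>, ex: &dyn SemanticExtractor, files: &[(&str, &str)]) {
    edges.extend(ex.extract(files));
}

fn main() {
    let mut edges = Vec::new();
    enrich(&mut edges, &NoopExtractor, &[("src/auth.rs", "fn login() {}")]);
    assert!(edges.is_empty()); // the stub adds nothing, by design
}
```

Keeping the extractor behind a trait object means the graph pipeline never branches on whether an LLM is configured — cost control is an injection decision, not a code path.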

12.

Context Block Integration

Code Brain's primary output is a structured context block injected into agent prompts at the start of a session: project summary, technology stack, key routes and schemas, top god nodes with edge counts, graph statistics, and a token-budgeted BFS result for the current task. All sections are individually token-capped. Agents start every session with the architectural orientation they would otherwise spend turns building.
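Per-section capping can be sketched as follows — the 4-chars-per-token estimate, section titles, and heading format are assumptions, not Code Brain's actual output format:

```rust
// Truncate a section body to an estimated token budget (char-boundary safe).
fn cap(text: &str, max_tokens: usize) -> String {
    let max_chars = max_tokens * 4; // rough chars-per-token estimate
    if text.chars().count() <= max_chars {
        text.to_string()
    } else {
        text.chars().take(max_chars).collect::<String>() + "…"
    }
}

// Assemble a context block from (title, body, token budget) sections,
// each capped individually so no single section can crowd out the rest.
fn context_block(sections: &[(&str, &str, usize)]) -> String {
    sections.iter()
        .map(|(title, body, budget)| format!("## {}\n{}\n", title, cap(body, *budget)))
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    let block = context_block(&[
        ("Project Summary", "Axum service with SeaORM models.", 50),
        ("God Nodes", "db_pool (deg 14), router (deg 11)", 25),
    ]);
    assert!(block.contains("## God Nodes"));
}
```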

Integration Matrix