HBDS: How AI Agents Search Typed Documents
HBDS (Hierarchical Budget-Constrained Descent Search) is a search algorithm that navigates typed knowledge hierarchies under strict token budgets. It works in four phases: ORIENT (classify the query), DESCEND (beam search through the tree), SYNTHESIZE (combine results), and WRITE BACK (persist new knowledge). HBDS requires typed documents — it cannot operate on flat text. This page explains the algorithm, walks through a concrete example, and shows why document format determines search capability.
Every AI agent has a context window — a fixed number of tokens it can work with. The knowledge it needs to do its job is usually much larger than that window. HBDS is the algorithm that decides what to read within that budget. It navigates a tree of typed documents, scoring relevance at each level, checking costs before reading, and falling back to summaries when the budget is tight.
The Problem HBDS Solves
An AI agent working on your codebase has access to hundreds of documents — architecture docs, API specs, decision records, meeting notes, guides. Its context window might be 200,000 tokens. The total documentation might be 2,000,000 tokens.
The agent cannot read everything. It must choose. Current approaches:
| Approach | How It Works | The Problem |
|---|---|---|
| RAG | Chunk docs, embed, retrieve top-k similar chunks | Flat — no hierarchy, arbitrary chunk boundaries, no budget control |
| Agentic tool use | LLM decides what to grep/read based on reasoning | No formal budget tracking, no guaranteed coverage |
| Graph RAG | Extract entities from flat text, build knowledge graph, traverse | Structure is inferred post-hoc — inference errors propagate |
| RAPTOR | Cluster chunks, summarize into tree, search top-down | Tree is auto-generated from flat text, no typed blocks |
All of these either operate on flat structure or infer structure from flat text at retrieval time. None use structure declared in the document format itself.
HBDS does. That's the difference.
The Four Phases
HBDS works in four sequential phases. Each phase consumes tokens from a monotonically decreasing budget. The context window IS the execution budget.
ORIENT ──→ DESCEND ──→ SYNTHESIZE ──→ WRITE BACK
200-800 500-5000 1000-3000 200-1000 tokensPhase 1: ORIENT
Goal: Classify the query and select where to start searching.
The agent reads a lightweight root index — a table of contents for the entire knowledge tree. One LLM call classifies the query and picks the most relevant top-level domains.
Query: "How does authentication work in this app?"
Root Index:
├── project/ (backend, frontend, infra — 450K tokens)
├── general/ (TypeScript, AWS, patterns — 800K tokens)
└── agent-memories/ (session logs — 200K tokens)
Classification: project_local
Selected domain: project/backend/
Confidence: 0.92
Cost: 600 tokensIf the root index contains a keyword match with >0.9 confidence, HBDS skips the entire descent and reads the matching leaf directly. This handles queries like "What is the Stripe price ID?" in ~1 second.
Phase 2: DESCEND
Goal: Navigate the tree to find relevant leaf nodes, within budget.
This is the core of the algorithm — an iterative beam search with budget awareness. At each level of the tree:
Query: "How does authentication work?"
Budget remaining: 28,000 tokens
Level 0 — project/ children:
backend/ (auth, billing, data) score: 0.95 ✓ beam
frontend/ (components, state) score: 0.35 ✗ pruned
infra/ (amplify, deploy) score: 0.20 ✗ pruned
Level 1 — backend/ children:
auth/ (4,200 tokens) score: 0.98 ✓ beam
billing/ (3,100 tokens) score: 0.12 ✗ pruned
data/ (5,800 tokens) score: 0.30 ✗ pruned
Level 2 — auth/ is a leaf node
Budget check: 4,200 ≤ 25,400 ✓
→ Read full content
→ Budget remaining: 21,200
Result: auth/ content (relevance: 0.98, 4,200 tokens)This is the step that flat text makes impossible. Before HBDS reads a leaf
node, it checks can_afford(node) — comparing the node's declared token
count against the remaining budget.
In flat text, there is no declared token count. The agent must read the document to find out how large it is. Every read is a blind commitment. In a typed document with P4 (pre-computed token costs), the budget check is a single integer comparison.
Graceful Degradation
When the budget is insufficient for full content, HBDS doesn't skip the node
entirely. It falls back to the node's ::summary block — a compact,
high-signal representation that costs a fraction of the full content.
Node: infrastructure/deployment-guide/
Full content: 12,000 tokens
Summary: 180 tokens
Budget remaining: 3,500 tokens
Decision: 12,000 > 3,500 → cannot afford full content
Fallback: read summary (180 tokens)
Result stored as PARTIAL (relevance score discounted 50%)This graceful degradation is only possible because typed documents distinguish
::summary from full content. In flat text, there is no summary layer.
Phase 3: SYNTHESIZE
Goal: Combine all retrieved knowledge into a coherent context injection.
The synthesis phase takes every result from DESCEND — full content and partial summaries — sorts them by relevance, and packs them into the remaining budget. A single LLM call produces:
| Output | Purpose |
|---|---|
| Answer | Coherent synthesis of retrieved knowledge |
| Confidence | 0-1 rating of how well the sources answer the query |
| Gaps | Aspects of the query the sources don't cover |
| New insights | Deductions not explicitly stated in any source |
| Sources used | Node IDs that contributed (for citation) |
Partial results (summaries) are included but marked as [PARTIAL], so the
synthesis knows to weight them lower and flag any conclusions that depend
solely on partial data.
Phase 4: WRITE BACK
Goal: Persist new knowledge generated during the search.
The synthesis phase often produces new insights — deductions that aren't explicitly stated in any source but emerge from combining multiple sources. HBDS writes these back to the knowledge tree.
New insight: "The auth system depends on a post-confirmation Lambda
trigger that creates the user's initial workspace. This dependency is
not documented in either the auth or workspace docs."
Placement: project/backend/auth/ (append)
Trust score: 0.6 (agent-generated, lower than human-authored 0.9)
Source: "agent-write-back"Over time, the knowledge tree self-improves through use. Every query that produces new insight writes it back, with trust scoring and version tracking. The tree reorganizes around actual query patterns.
This is what flat text structurally cannot do. A flat text corpus cannot self-organize because there is no structure to organize into. Write-back requires:
- P1 (block types) to know what kind of knowledge to write
- P2 (typed refs) to know where in the hierarchy to place it
- P4 (token costs) to update after the new content is added
The Budget System
The entire context window is divided into zones. Each zone has a guaranteed minimum and a flexible maximum.
Total: 200,000 tokens
├── System prompt [FIXED] 3,000
├── Task description [FIXED] 2,000
├── Search budget [FLEXIBLE] 29,000 (15%)
│ ├── ORIENT reserve 1,000
│ ├── DESCEND reserve 25,000
│ └── SYNTHESIZE reserve 3,000
├── Task work [FLEXIBLE] 155,000 (80%)
├── Write-back [FLEXIBLE] 5,800 (3%)
└── Output reserve [FIXED] 4,000If search completes early (direct hit, shallow tree), unused search budget transfers to the task work zone. Search never steals from the task zone — if the search budget is exhausted, the agent works with whatever it found.
struct BudgetTracker {
total: usize,
consumed: usize,
reserved: usize,
}
impl BudgetTracker {
fn available(&self) -> usize {
self.total - self.consumed - self.reserved
}
fn can_afford(&self, node: &TreeNode) -> bool {
node.content_tokens <= self.available()
}
fn can_afford_subtree(&self, node: &TreeNode) -> bool {
node.subtree_token_estimate <= self.available()
}
fn consume(&mut self, n: usize, label: &str) {
assert!(n <= self.available(), "Budget exceeded");
self.consumed += n;
}
}Budget Reporting
Every HBDS search produces a full accounting report:
{
"totalBudget": 29000,
"consumed": {
"orient": { "readIndex": 180, "classify": 420, "total": 600 },
"descend": {
"depth0": { "score": 380, "readLeaves": 0, "expand": 150, "total": 530 },
"depth1": { "score": 350, "readLeaves": 4200, "expand": 0, "total": 4550 },
"total": 5080
},
"synthesize": { "total": 1800 },
"writeback": { "total": 450 },
"searchTotal": 7930
},
"unused": 21070,
"searchEfficiency": {
"nodesScored": 8,
"nodesRead": 1,
"tokensPerUsefulResult": 7930,
"searchFractionActual": 0.04
}
}In this example, the search used only 4% of the total budget to find the right knowledge. The remaining 96% is available for the actual task. Compare this to loading the entire documentation (2M tokens — doesn't even fit in the context window).
The Knowledge Tree Schema
Every node in the HBDS knowledge tree has this structure:
{
"id": "proj.backend.auth",
"path": "/project/backend/auth",
"parentId": "proj.backend",
"type": "leaf",
"summary": "Authentication setup using OPAQUE-KE with post-confirmation triggers.",
"keywords": ["auth", "opaque", "login", "jwt"],
"relevanceHints": [
"How does authentication work?",
"How do users sign up?"
],
"contentTokens": 4200,
"metadata": {
"created": "2026-02-08T10:00:00Z",
"lastUpdated": "2026-02-22T14:30:00Z",
"version": 3,
"trustScore": 0.92,
"accessCount": 14,
"source": "manual"
}
}Key design decisions:
| Field | Why It Exists | HBDS Phase That Uses It |
|---|---|---|
| `summary` | Cheap proxy for full content; enables graceful degradation | DESCEND (fallback), SYNTHESIZE |
| `keywords` | Fast direct-hit matching without LLM scoring | ORIENT (fast path) |
| `relevanceHints` | Example queries this node answers; improves LLM scoring | DESCEND (scoring prompt) |
| `contentTokens` | Pre-read budget verification | DESCEND (can_afford check) |
| `trustScore` | Quality signal — decays over time, boosted by verification | DESCEND (ranking), SYNTHESIZE |
| `accessCount` | Popularity signal for tree optimization | WRITE BACK (reorganization) |
Trust Scores
Trust is not static. It decays over time and is boosted by verification events.
Initial trust:
Human-authored: 0.9
Agent-generated: 0.6
Decay:
trustScore(t) = trustScore(last) * 0.997 ^ days_since_update
Half-life: ~231 days
Boosts:
Agent verification (reads and confirms): +0.05
Human verification: +0.15
Cross-referenced by another node: +0.03
Floor: 0.10 (below this → flagged for review or archived)During search, nodes with trustScore < 0.3 receive a scoring penalty:
effectiveScore = rawScore * trustScore. Nodes past their TTL are excluded
entirely.
Configuration
HBDS is tunable. Every parameter has a sensible default, but the orchestrator can adjust them per-task.
| Parameter | Default | What It Controls |
|---|---|---|
| `beamWidth` | 3 | Branches explored per descent level |
| `maxDepth` | 5 | Maximum tree depth |
| `pruneThreshold` | 0.2 | Minimum score to keep a candidate |
| `depthDecayFactor` | 0.85 | Score penalty per descent level |
| `searchFraction` | 0.15 | Fraction of total budget for search |
| `minimumUsefulBudget` | 2000 | Stop search below this token count |
| `enableWriteBack` | true | Allow knowledge tree updates |
| `enableDirectHit` | true | Allow fast-path keyword matching |
For research-heavy tasks (architecture analysis, dependency tracing), increase
searchFraction to 0.40. For execution-heavy tasks (simple code edits where
context is in the prompt), reduce to 0.05.
Why Flat Text Cannot Support HBDS
This is the core claim. Each HBDS phase depends on a property of the document format. Flat text lacks all four.
| HBDS Capability | Required Format Property | Available in Markdown? | Available in SurfDoc? |
|---|---|---|---|
| Classify query against domain index | P2: Typed cross-references (hierarchy) | No — headings are visual | Yes — frontmatter + `::related` |
| Score candidate relevance | P1: Declared block types | No — blocks are untyped | Yes — `::decision`, `::data`, etc. |
| Check budget before reading | P4: Pre-computed token costs | No — must read to know size | Yes — `contentTokens` in schema |
| Fall back to summaries | P1 + P3: Types + priority | No — no summary layer | Yes — `::summary` blocks |
| Type-aware synthesis | P1: Block types | No — undifferentiated prose | Yes — route by type |
| Place write-back knowledge | P2: Typed references | No — no insertion semantics | Yes — parent-child hierarchy |
You cannot retrofit these properties onto Markdown without making it something other than Markdown. The properties are not presentation — they are structure. The format IS the architecture.
Storage Options
The knowledge tree can be stored three ways, depending on scale:
| Option | Scale | Advantage | Disadvantage |
|---|---|---|---|
| File-based (Markdown tree) | < 1,000 nodes | Human-readable, Git-friendly, no infrastructure | No ACID transactions, slow keyword search |
| Hybrid (files + SQLite index) | 1,000-10,000 nodes | Best of both — human-readable content, fast queries | Two systems to keep in sync |
| Database (SQLite / PostgreSQL) | 10,000+ nodes | ACID, fast queries, scalable | Not human-readable, requires infrastructure |
Start with file-based (Phase 1). It works immediately with existing project structures. Migrate to hybrid when keyword search gets slow. Use database only for multi-user, team-shared knowledge stores.
Provider Agnosticism
The algorithm is universal. The LLM provider is swappable.
| Component | Universal? | Provider-Specific? |
|---|---|---|
| Knowledge tree schema | Yes | No |
| HBDS algorithm logic | Yes | No |
| Budget tracking | Yes | No |
| Token counting | Partially | Tokenizer varies per model |
| LLM calls (ORIENT, SCORE, SYNTHESIZE) | Prompt format is universal | API call format, auth |
| Context window size | No | Yes (200K Claude, 128K GPT-4, 1M Gemini) |
HBDS auto-adapts to context window size:
Claude (200K): beamWidth=3, searchFraction=0.15
GPT-4 (128K): beamWidth=3, searchFraction=0.15
Gemini (1M): beamWidth=4, searchFraction=0.10
Small models (<32K): summary-only mode (never reads full leaves)Agent Composability
HBDS agents compose into chains. The output of one agent feeds the input of the next — but through summaries and references, not raw content.
[Planner] ──→ [Backend Dev] ──→ [Tester]
↓ ↑
[Frontend Dev] ────────┘
Step 0 (Planner): Plan the work, allocate budgets 100K tokens
Step 1 (Backend): Build auth Lambda, JWT validation 250K tokens ┐ parallel
Step 2 (Frontend): Build login UI, session management 250K tokens ┘
Step 3 (Tester): Write tests for both 200K tokensAgent 1's full output is NOT passed to Agent 2. Instead, the orchestrator passes a compressed summary + artifact references. Agent 2 can use HBDS to read the detailed artifacts if it needs them. This prevents context window bloat in chains.
Implementation Phases
| Phase | Scope | What's Included | What's Not |
|---|---|---|---|
| 1: MVP | Single user, file-based, Claude-only | Knowledge tree + ORIENT + DESCEND + SYNTHESIZE | Write-back, chaining, multi-provider |
| 2: Full | Single user, hybrid storage, multi-provider | All 4 phases, agent lifecycle, trust scores, Claude + OpenAI | Multi-user, parallel execution |
| 3: Teams | Multi-user, database-backed | Concurrent writes, parallel agents, namespaces, admin tools | Federation, marketplace |
| 4: Platform | Ecosystem | Public trees, agent marketplace, federation, analytics | — |
Summary
HBDS is a search algorithm designed for a world where:
- Context windows are finite and always will be
- Knowledge worth querying is effectively infinite
- Flat text provides no structure for algorithms to navigate
- Typed documents declare the structure that algorithms need
The algorithm navigates typed knowledge hierarchies under strict token budgets, uses beam search with pre-read budget verification, degrades gracefully to summaries, and self-improves through write-back. It is impossible on flat text and trivial on typed documents.
The format is the architecture. The algorithm is the proof.