HBDS: How AI Agents Search Typed Documents

Summary

HBDS (Hierarchical Budget-Constrained Descent Search) is a search algorithm that navigates typed knowledge hierarchies under strict token budgets. It works in four phases: ORIENT (classify the query), DESCEND (beam search through the tree), SYNTHESIZE (combine results), and WRITE BACK (persist new knowledge). HBDS requires typed documents — it cannot operate on flat text. This page explains the algorithm, walks through a concrete example, and shows why document format determines search capability.

What is HBDS for?

Every AI agent has a context window — a fixed number of tokens it can work with. The knowledge it needs to do its job is usually much larger than that window. HBDS is the algorithm that decides what to read within that budget. It navigates a tree of typed documents, scoring relevance at each level, checking costs before reading, and falling back to summaries when the budget is tight.

The Problem HBDS Solves

An AI agent working on your codebase has access to hundreds of documents — architecture docs, API specs, decision records, meeting notes, guides. Its context window might be 200,000 tokens. The total documentation might be 2,000,000 tokens.

The agent cannot read everything. It must choose. Current approaches:

Approach	How It Works	The Problem
RAG	Chunk docs, embed, retrieve top-k similar chunks	Flat — no hierarchy, arbitrary chunk boundaries, no budget control
Agentic tool use	LLM decides what to grep/read based on reasoning	No formal budget tracking, no guaranteed coverage
Graph RAG	Extract entities from flat text, build knowledge graph, traverse	Structure is inferred post-hoc — inference errors propagate
RAPTOR	Cluster chunks, summarize into tree, search top-down	Tree is auto-generated from flat text, no typed blocks

All of these either operate on flat structure or infer structure from flat text at retrieval time. None use structure declared in the document format itself.

HBDS does. That's the difference.

The Four Phases

HBDS works in four sequential phases. Each phase consumes tokens from a monotonically decreasing budget. The context window IS the execution budget.

TEXT

ORIENT ──→ DESCEND ──→ SYNTHESIZE ──→ WRITE BACK
 200-800    500-5000    1000-3000      200-1000 tokens

Phase 1: ORIENT

Goal: Classify the query and select where to start searching.

The agent reads a lightweight root index — a table of contents for the entire knowledge tree. One LLM call classifies the query and picks the most relevant top-level domains.

TEXT

Query: "How does authentication work in this app?"

Root Index:
  ├── project/          (backend, frontend, infra — 450K tokens)
  ├── general/          (TypeScript, AWS, patterns — 800K tokens)
  └── agent-memories/   (session logs — 200K tokens)

Classification: project_local
Selected domain: project/backend/
Confidence: 0.92
Cost: 600 tokens

Fast path

If the root index contains a keyword match with >0.9 confidence, HBDS skips the entire descent and reads the matching leaf directly. This handles queries like "What is the Stripe price ID?" in ~1 second.

Phase 2: DESCEND

Goal: Navigate the tree to find relevant leaf nodes, within budget.

This is the core of the algorithm — an iterative beam search with budget awareness. At each level of the tree:

TEXT

Query: "How does authentication work?"
Budget remaining: 28,000 tokens

Level 0 — project/ children:
  backend/     (auth, billing, data)     score: 0.95  ✓ beam
  frontend/    (components, state)       score: 0.35  ✗ pruned
  infra/       (amplify, deploy)         score: 0.20  ✗ pruned

Level 1 — backend/ children:
  auth/        (4,200 tokens)            score: 0.98  ✓ beam
  billing/     (3,100 tokens)            score: 0.12  ✗ pruned
  data/        (5,800 tokens)            score: 0.30  ✗ pruned

Level 2 — auth/ is a leaf node
  Budget check: 4,200 ≤ 25,400 ✓
  → Read full content
  → Budget remaining: 21,200

Result: auth/ content (relevance: 0.98, 4,200 tokens)

Pre-read budget verification

This is the step that flat text makes impossible. Before HBDS reads a leaf node, it checks can_afford(node) — comparing the node's declared token count against the remaining budget.

In flat text, there is no declared token count. The agent must read the document to find out how large it is. Every read is a blind commitment. In a typed document with P4 (pre-computed token costs), the budget check is a single integer comparison.

Graceful Degradation

When the budget is insufficient for full content, HBDS doesn't skip the node entirely. It falls back to the node's ::summary block — a compact, high-signal representation that costs a fraction of the full content.

TEXT

Node: infrastructure/deployment-guide/
Full content: 12,000 tokens
Summary: 180 tokens
Budget remaining: 3,500 tokens

Decision: 12,000 > 3,500 → cannot afford full content
Fallback: read summary (180 tokens)
Result stored as PARTIAL (relevance score discounted 50%)

This graceful degradation is only possible because typed documents distinguish ::summary from full content. In flat text, there is no summary layer.

Phase 3: SYNTHESIZE

Goal: Combine all retrieved knowledge into a coherent context injection.

The synthesis phase takes every result from DESCEND — full content and partial summaries — sorts them by relevance, and packs them into the remaining budget. A single LLM call produces:

Output	Purpose
Answer	Coherent synthesis of retrieved knowledge
Confidence	0-1 rating of how well the sources answer the query
Gaps	Aspects of the query the sources don't cover
New insights	Deductions not explicitly stated in any source
Sources used	Node IDs that contributed (for citation)

Partial results (summaries) are included but marked as [PARTIAL], so the synthesis knows to weight them lower and flag any conclusions that depend solely on partial data.

Phase 4: WRITE BACK

Goal: Persist new knowledge generated during the search.

The synthesis phase often produces new insights — deductions that aren't explicitly stated in any source but emerge from combining multiple sources. HBDS writes these back to the knowledge tree.

TEXT

New insight: "The auth system depends on a post-confirmation Lambda
trigger that creates the user's initial workspace. This dependency is
not documented in either the auth or workspace docs."

Placement: project/backend/auth/ (append)
Trust score: 0.6 (agent-generated, lower than human-authored 0.9)
Source: "agent-write-back"

Over time, the knowledge tree self-improves through use. Every query that produces new insight writes it back, with trust scoring and version tracking. The tree reorganizes around actual query patterns.

Self-organizing knowledge

This is what flat text structurally cannot do. A flat text corpus cannot self-organize because there is no structure to organize into. Write-back requires:

P1 (block types) to know what kind of knowledge to write
P2 (typed refs) to know where in the hierarchy to place it
P4 (token costs) to update after the new content is added

The Budget System

The entire context window is divided into zones. Each zone has a guaranteed minimum and a flexible maximum.

TEXT

Total: 200,000 tokens
├── System prompt         [FIXED]       3,000
├── Task description      [FIXED]       2,000
├── Search budget         [FLEXIBLE]    29,000  (15%)
│   ├── ORIENT reserve                  1,000
│   ├── DESCEND reserve                 25,000
│   └── SYNTHESIZE reserve              3,000
├── Task work             [FLEXIBLE]    155,000 (80%)
├── Write-back            [FLEXIBLE]    5,800   (3%)
└── Output reserve        [FIXED]       4,000

If search completes early (direct hit, shallow tree), unused search budget transfers to the task work zone. Search never steals from the task zone — if the search budget is exhausted, the agent works with whatever it found.

RUST

struct BudgetTracker {
    total: usize,
    consumed: usize,
    reserved: usize,
}

impl BudgetTracker {
    fn available(&self) -> usize {
        self.total - self.consumed - self.reserved
    }

    fn can_afford(&self, node: &TreeNode) -> bool {
        node.content_tokens <= self.available()
    }

    fn can_afford_subtree(&self, node: &TreeNode) -> bool {
        node.subtree_token_estimate <= self.available()
    }

    fn consume(&mut self, n: usize, label: &str) {
        assert!(n <= self.available(), "Budget exceeded");
        self.consumed += n;
    }
}

Budget Reporting

Every HBDS search produces a full accounting report:

JSON

{
  "totalBudget": 29000,
  "consumed": {
    "orient": { "readIndex": 180, "classify": 420, "total": 600 },
    "descend": {
      "depth0": { "score": 380, "readLeaves": 0, "expand": 150, "total": 530 },
      "depth1": { "score": 350, "readLeaves": 4200, "expand": 0, "total": 4550 },
      "total": 5080
    },
    "synthesize": { "total": 1800 },
    "writeback": { "total": 450 },
    "searchTotal": 7930
  },
  "unused": 21070,
  "searchEfficiency": {
    "nodesScored": 8,
    "nodesRead": 1,
    "tokensPerUsefulResult": 7930,
    "searchFractionActual": 0.04
  }
}

In this example, the search used only 4% of the total budget to find the right knowledge. The remaining 96% is available for the actual task. Compare this to loading the entire documentation (2M tokens — doesn't even fit in the context window).

The Knowledge Tree Schema

Every node in the HBDS knowledge tree has this structure:

JSON

{
  "id": "proj.backend.auth",
  "path": "/project/backend/auth",
  "parentId": "proj.backend",
  "type": "leaf",
  "summary": "Authentication setup using OPAQUE-KE with post-confirmation triggers.",
  "keywords": ["auth", "opaque", "login", "jwt"],
  "relevanceHints": [
    "How does authentication work?",
    "How do users sign up?"
  ],
  "contentTokens": 4200,
  "metadata": {
    "created": "2026-02-08T10:00:00Z",
    "lastUpdated": "2026-02-22T14:30:00Z",
    "version": 3,
    "trustScore": 0.92,
    "accessCount": 14,
    "source": "manual"
  }
}

Key design decisions:

Field	Why It Exists	HBDS Phase That Uses It
`summary`	Cheap proxy for full content; enables graceful degradation	DESCEND (fallback), SYNTHESIZE
`keywords`	Fast direct-hit matching without LLM scoring	ORIENT (fast path)
`relevanceHints`	Example queries this node answers; improves LLM scoring	DESCEND (scoring prompt)
`contentTokens`	Pre-read budget verification	DESCEND (can_afford check)
`trustScore`	Quality signal — decays over time, boosted by verification	DESCEND (ranking), SYNTHESIZE
`accessCount`	Popularity signal for tree optimization	WRITE BACK (reorganization)

Trust Scores

Trust is not static. It decays over time and is boosted by verification events.

TEXT

Initial trust:
  Human-authored:  0.9
  Agent-generated: 0.6

Decay:
  trustScore(t) = trustScore(last) * 0.997 ^ days_since_update
  Half-life: ~231 days

Boosts:
  Agent verification (reads and confirms): +0.05
  Human verification:                      +0.15
  Cross-referenced by another node:        +0.03

Floor: 0.10 (below this → flagged for review or archived)

During search, nodes with trustScore < 0.3 receive a scoring penalty: effectiveScore = rawScore * trustScore. Nodes past their TTL are excluded entirely.

Configuration

HBDS is tunable. Every parameter has a sensible default, but the orchestrator can adjust them per-task.

Parameter	Default	What It Controls
`beamWidth`	3	Branches explored per descent level
`maxDepth`	5	Maximum tree depth
`pruneThreshold`	0.2	Minimum score to keep a candidate
`depthDecayFactor`	0.85	Score penalty per descent level
`searchFraction`	0.15	Fraction of total budget for search
`minimumUsefulBudget`	2000	Stop search below this token count
`enableWriteBack`	true	Allow knowledge tree updates
`enableDirectHit`	true	Allow fast-path keyword matching

For research-heavy tasks (architecture analysis, dependency tracing), increase searchFraction to 0.40. For execution-heavy tasks (simple code edits where context is in the prompt), reduce to 0.05.

Why Flat Text Cannot Support HBDS

This is the core claim. Each HBDS phase depends on a property of the document format. Flat text lacks all four.

HBDS Capability	Required Format Property	Available in Markdown?	Available in SurfDoc?
Classify query against domain index	P2: Typed cross-references (hierarchy)	No — headings are visual	Yes — frontmatter + `::related`
Score candidate relevance	P1: Declared block types	No — blocks are untyped	Yes — `::decision`, `::data`, etc.
Check budget before reading	P4: Pre-computed token costs	No — must read to know size	Yes — `contentTokens` in schema
Fall back to summaries	P1 + P3: Types + priority	No — no summary layer	Yes — `::summary` blocks
Type-aware synthesis	P1: Block types	No — undifferentiated prose	Yes — route by type
Place write-back knowledge	P2: Typed references	No — no insertion semantics	Yes — parent-child hierarchy

You cannot retrofit these properties onto Markdown without making it something other than Markdown. The properties are not presentation — they are structure. The format IS the architecture.

Storage Options

The knowledge tree can be stored three ways, depending on scale:

Option	Scale	Advantage	Disadvantage
File-based (Markdown tree)	< 1,000 nodes	Human-readable, Git-friendly, no infrastructure	No ACID transactions, slow keyword search
Hybrid (files + SQLite index)	1,000-10,000 nodes	Best of both — human-readable content, fast queries	Two systems to keep in sync
Database (SQLite / PostgreSQL)	10,000+ nodes	ACID, fast queries, scalable	Not human-readable, requires infrastructure

Recommended path

Start with file-based (Phase 1). It works immediately with existing project structures. Migrate to hybrid when keyword search gets slow. Use database only for multi-user, team-shared knowledge stores.

Provider Agnosticism

The algorithm is universal. The LLM provider is swappable.

Component	Universal?	Provider-Specific?
Knowledge tree schema	Yes	No
HBDS algorithm logic	Yes	No
Budget tracking	Yes	No
Token counting	Partially	Tokenizer varies per model
LLM calls (ORIENT, SCORE, SYNTHESIZE)	Prompt format is universal	API call format, auth
Context window size	No	Yes (200K Claude, 128K GPT-4, 1M Gemini)

HBDS auto-adapts to context window size:

TEXT

Claude (200K):  beamWidth=3, searchFraction=0.15
GPT-4 (128K):   beamWidth=3, searchFraction=0.15
Gemini (1M):    beamWidth=4, searchFraction=0.10
Small models (<32K): summary-only mode (never reads full leaves)

Agent Composability

HBDS agents compose into chains. The output of one agent feeds the input of the next — but through summaries and references, not raw content.

TEXT

[Planner] ──→ [Backend Dev] ──→ [Tester]
                    ↓                ↑
              [Frontend Dev] ────────┘

Step 0 (Planner):     Plan the work, allocate budgets     100K tokens
Step 1 (Backend):     Build auth Lambda, JWT validation   250K tokens  ┐ parallel
Step 2 (Frontend):    Build login UI, session management  250K tokens  ┘
Step 3 (Tester):      Write tests for both                200K tokens

Agent 1's full output is NOT passed to Agent 2. Instead, the orchestrator passes a compressed summary + artifact references. Agent 2 can use HBDS to read the detailed artifacts if it needs them. This prevents context window bloat in chains.

Implementation Phases

Phase	Scope	What's Included	What's Not
1: MVP	Single user, file-based, Claude-only	Knowledge tree + ORIENT + DESCEND + SYNTHESIZE	Write-back, chaining, multi-provider
2: Full	Single user, hybrid storage, multi-provider	All 4 phases, agent lifecycle, trust scores, Claude + OpenAI	Multi-user, parallel execution
3: Teams	Multi-user, database-backed	Concurrent writes, parallel agents, namespaces, admin tools	Federation, marketplace
4: Platform	Ecosystem	Public trees, agent marketplace, federation, analytics	—

Summary

HBDS is a search algorithm designed for a world where:

Context windows are finite and always will be
Knowledge worth querying is effectively infinite
Flat text provides no structure for algorithms to navigate
Typed documents declare the structure that algorithms need

The algorithm navigates typed knowledge hierarchies under strict token budgets, uses beam search with pre-read budget verification, degrades gracefully to summaries, and self-improves through write-back. It is impossible on flat text and trivial on typed documents.

The format is the architecture. The algorithm is the proof.