HBDS: How AI Agents Search Typed Documents

Summary

HBDS (Hierarchical Budget-Constrained Descent Search) is a search algorithm that navigates typed knowledge hierarchies under strict token budgets. It works in four phases: ORIENT (classify the query), DESCEND (beam search through the tree), SYNTHESIZE (combine results), and WRITE BACK (persist new knowledge). HBDS requires typed documents — it cannot operate on flat text. This page explains the algorithm, walks through a concrete example, and shows why document format determines search capability.

What is HBDS for?

Every AI agent has a context window — a fixed number of tokens it can work with. The knowledge it needs to do its job is usually much larger than that window. HBDS is the algorithm that decides what to read within that budget. It navigates a tree of typed documents, scoring relevance at each level, checking costs before reading, and falling back to summaries when the budget is tight.


The Problem HBDS Solves

An AI agent working on your codebase has access to hundreds of documents — architecture docs, API specs, decision records, meeting notes, guides. Its context window might be 200,000 tokens. The total documentation might be 2,000,000 tokens.

The agent cannot read everything. It must choose. Current approaches:

ApproachHow It WorksThe Problem
RAGChunk docs, embed, retrieve top-k similar chunksFlat — no hierarchy, arbitrary chunk boundaries, no budget control
Agentic tool useLLM decides what to grep/read based on reasoningNo formal budget tracking, no guaranteed coverage
Graph RAGExtract entities from flat text, build knowledge graph, traverseStructure is inferred post-hoc — inference errors propagate
RAPTORCluster chunks, summarize into tree, search top-downTree is auto-generated from flat text, no typed blocks

All of these either operate on flat structure or infer structure from flat text at retrieval time. None use structure declared in the document format itself.

HBDS does. That's the difference.


The Four Phases

HBDS works in four sequential phases. Each phase consumes tokens from a monotonically decreasing budget. The context window IS the execution budget.

TEXT
ORIENT ──→ DESCEND ──→ SYNTHESIZE ──→ WRITE BACK
 200-800    500-5000    1000-3000      200-1000 tokens

Phase 1: ORIENT

Goal: Classify the query and select where to start searching.

The agent reads a lightweight root index — a table of contents for the entire knowledge tree. One LLM call classifies the query and picks the most relevant top-level domains.

TEXT
Query: "How does authentication work in this app?"

Root Index:
  ├── project/          (backend, frontend, infra — 450K tokens)
  ├── general/          (TypeScript, AWS, patterns — 800K tokens)
  └── agent-memories/   (session logs — 200K tokens)

Classification: project_local
Selected domain: project/backend/
Confidence: 0.92
Cost: 600 tokens
Fast path

If the root index contains a keyword match with >0.9 confidence, HBDS skips the entire descent and reads the matching leaf directly. This handles queries like "What is the Stripe price ID?" in ~1 second.

Phase 2: DESCEND

Goal: Navigate the tree to find relevant leaf nodes, within budget.

This is the core of the algorithm — an iterative beam search with budget awareness. At each level of the tree:

    TEXT
    Query: "How does authentication work?"
    Budget remaining: 28,000 tokens
    
    Level 0 — project/ children:
      backend/     (auth, billing, data)     score: 0.95  ✓ beam
      frontend/    (components, state)       score: 0.35  ✗ pruned
      infra/       (amplify, deploy)         score: 0.20  ✗ pruned
    
    Level 1 — backend/ children:
      auth/        (4,200 tokens)            score: 0.98  ✓ beam
      billing/     (3,100 tokens)            score: 0.12  ✗ pruned
      data/        (5,800 tokens)            score: 0.30  ✗ pruned
    
    Level 2 — auth/ is a leaf node
      Budget check: 4,200 ≤ 25,400 ✓
      → Read full content
      → Budget remaining: 21,200
    
    Result: auth/ content (relevance: 0.98, 4,200 tokens)
    Pre-read budget verification

    This is the step that flat text makes impossible. Before HBDS reads a leaf node, it checks can_afford(node) — comparing the node's declared token count against the remaining budget.

    In flat text, there is no declared token count. The agent must read the document to find out how large it is. Every read is a blind commitment. In a typed document with P4 (pre-computed token costs), the budget check is a single integer comparison.

    Graceful Degradation

    When the budget is insufficient for full content, HBDS doesn't skip the node entirely. It falls back to the node's ::summary block — a compact, high-signal representation that costs a fraction of the full content.

    TEXT
    Node: infrastructure/deployment-guide/
    Full content: 12,000 tokens
    Summary: 180 tokens
    Budget remaining: 3,500 tokens
    
    Decision: 12,000 > 3,500 → cannot afford full content
    Fallback: read summary (180 tokens)
    Result stored as PARTIAL (relevance score discounted 50%)

    This graceful degradation is only possible because typed documents distinguish ::summary from full content. In flat text, there is no summary layer.

    Phase 3: SYNTHESIZE

    Goal: Combine all retrieved knowledge into a coherent context injection.

    The synthesis phase takes every result from DESCEND — full content and partial summaries — sorts them by relevance, and packs them into the remaining budget. A single LLM call produces:

    OutputPurpose
    AnswerCoherent synthesis of retrieved knowledge
    Confidence0-1 rating of how well the sources answer the query
    GapsAspects of the query the sources don't cover
    New insightsDeductions not explicitly stated in any source
    Sources usedNode IDs that contributed (for citation)

    Partial results (summaries) are included but marked as [PARTIAL], so the synthesis knows to weight them lower and flag any conclusions that depend solely on partial data.

    Phase 4: WRITE BACK

    Goal: Persist new knowledge generated during the search.

    The synthesis phase often produces new insights — deductions that aren't explicitly stated in any source but emerge from combining multiple sources. HBDS writes these back to the knowledge tree.

    TEXT
    New insight: "The auth system depends on a post-confirmation Lambda
    trigger that creates the user's initial workspace. This dependency is
    not documented in either the auth or workspace docs."
    
    Placement: project/backend/auth/ (append)
    Trust score: 0.6 (agent-generated, lower than human-authored 0.9)
    Source: "agent-write-back"

    Over time, the knowledge tree self-improves through use. Every query that produces new insight writes it back, with trust scoring and version tracking. The tree reorganizes around actual query patterns.

    Self-organizing knowledge

    This is what flat text structurally cannot do. A flat text corpus cannot self-organize because there is no structure to organize into. Write-back requires:

    • P1 (block types) to know what kind of knowledge to write
    • P2 (typed refs) to know where in the hierarchy to place it
    • P4 (token costs) to update after the new content is added

    The Budget System

    The entire context window is divided into zones. Each zone has a guaranteed minimum and a flexible maximum.

    TEXT
    Total: 200,000 tokens
    ├── System prompt         [FIXED]       3,000
    ├── Task description      [FIXED]       2,000
    ├── Search budget         [FLEXIBLE]    29,000  (15%)
    │   ├── ORIENT reserve                  1,000
    │   ├── DESCEND reserve                 25,000
    │   └── SYNTHESIZE reserve              3,000
    ├── Task work             [FLEXIBLE]    155,000 (80%)
    ├── Write-back            [FLEXIBLE]    5,800   (3%)
    └── Output reserve        [FIXED]       4,000

    If search completes early (direct hit, shallow tree), unused search budget transfers to the task work zone. Search never steals from the task zone — if the search budget is exhausted, the agent works with whatever it found.

    RUST
    struct BudgetTracker {
        total: usize,
        consumed: usize,
        reserved: usize,
    }
    
    impl BudgetTracker {
        fn available(&self) -> usize {
            self.total - self.consumed - self.reserved
        }
    
        fn can_afford(&self, node: &TreeNode) -> bool {
            node.content_tokens <= self.available()
        }
    
        fn can_afford_subtree(&self, node: &TreeNode) -> bool {
            node.subtree_token_estimate <= self.available()
        }
    
        fn consume(&mut self, n: usize, label: &str) {
            assert!(n <= self.available(), "Budget exceeded");
            self.consumed += n;
        }
    }

    Budget Reporting

    Every HBDS search produces a full accounting report:

    JSON
    {
      "totalBudget": 29000,
      "consumed": {
        "orient": { "readIndex": 180, "classify": 420, "total": 600 },
        "descend": {
          "depth0": { "score": 380, "readLeaves": 0, "expand": 150, "total": 530 },
          "depth1": { "score": 350, "readLeaves": 4200, "expand": 0, "total": 4550 },
          "total": 5080
        },
        "synthesize": { "total": 1800 },
        "writeback": { "total": 450 },
        "searchTotal": 7930
      },
      "unused": 21070,
      "searchEfficiency": {
        "nodesScored": 8,
        "nodesRead": 1,
        "tokensPerUsefulResult": 7930,
        "searchFractionActual": 0.04
      }
    }

    In this example, the search used only 4% of the total budget to find the right knowledge. The remaining 96% is available for the actual task. Compare this to loading the entire documentation (2M tokens — doesn't even fit in the context window).


    The Knowledge Tree Schema

    Every node in the HBDS knowledge tree has this structure:

    JSON
    {
      "id": "proj.backend.auth",
      "path": "/project/backend/auth",
      "parentId": "proj.backend",
      "type": "leaf",
      "summary": "Authentication setup using OPAQUE-KE with post-confirmation triggers.",
      "keywords": ["auth", "opaque", "login", "jwt"],
      "relevanceHints": [
        "How does authentication work?",
        "How do users sign up?"
      ],
      "contentTokens": 4200,
      "metadata": {
        "created": "2026-02-08T10:00:00Z",
        "lastUpdated": "2026-02-22T14:30:00Z",
        "version": 3,
        "trustScore": 0.92,
        "accessCount": 14,
        "source": "manual"
      }
    }

    Key design decisions:

    FieldWhy It ExistsHBDS Phase That Uses It
    `summary`Cheap proxy for full content; enables graceful degradationDESCEND (fallback), SYNTHESIZE
    `keywords`Fast direct-hit matching without LLM scoringORIENT (fast path)
    `relevanceHints`Example queries this node answers; improves LLM scoringDESCEND (scoring prompt)
    `contentTokens`Pre-read budget verificationDESCEND (can_afford check)
    `trustScore`Quality signal — decays over time, boosted by verificationDESCEND (ranking), SYNTHESIZE
    `accessCount`Popularity signal for tree optimizationWRITE BACK (reorganization)

    Trust Scores

    Trust is not static. It decays over time and is boosted by verification events.

    TEXT
    Initial trust:
      Human-authored:  0.9
      Agent-generated: 0.6
    
    Decay:
      trustScore(t) = trustScore(last) * 0.997 ^ days_since_update
      Half-life: ~231 days
    
    Boosts:
      Agent verification (reads and confirms): +0.05
      Human verification:                      +0.15
      Cross-referenced by another node:        +0.03
    
    Floor: 0.10 (below this → flagged for review or archived)

    During search, nodes with trustScore < 0.3 receive a scoring penalty: effectiveScore = rawScore * trustScore. Nodes past their TTL are excluded entirely.


    Configuration

    HBDS is tunable. Every parameter has a sensible default, but the orchestrator can adjust them per-task.

    ParameterDefaultWhat It Controls
    `beamWidth`3Branches explored per descent level
    `maxDepth`5Maximum tree depth
    `pruneThreshold`0.2Minimum score to keep a candidate
    `depthDecayFactor`0.85Score penalty per descent level
    `searchFraction`0.15Fraction of total budget for search
    `minimumUsefulBudget`2000Stop search below this token count
    `enableWriteBack`trueAllow knowledge tree updates
    `enableDirectHit`trueAllow fast-path keyword matching

    For research-heavy tasks (architecture analysis, dependency tracing), increase searchFraction to 0.40. For execution-heavy tasks (simple code edits where context is in the prompt), reduce to 0.05.


    Why Flat Text Cannot Support HBDS

    This is the core claim. Each HBDS phase depends on a property of the document format. Flat text lacks all four.

    HBDS CapabilityRequired Format PropertyAvailable in Markdown?Available in SurfDoc?
    Classify query against domain indexP2: Typed cross-references (hierarchy)No — headings are visualYes — frontmatter + `::related`
    Score candidate relevanceP1: Declared block typesNo — blocks are untypedYes — `::decision`, `::data`, etc.
    Check budget before readingP4: Pre-computed token costsNo — must read to know sizeYes — `contentTokens` in schema
    Fall back to summariesP1 + P3: Types + priorityNo — no summary layerYes — `::summary` blocks
    Type-aware synthesisP1: Block typesNo — undifferentiated proseYes — route by type
    Place write-back knowledgeP2: Typed referencesNo — no insertion semanticsYes — parent-child hierarchy

    You cannot retrofit these properties onto Markdown without making it something other than Markdown. The properties are not presentation — they are structure. The format IS the architecture.


    Storage Options

    The knowledge tree can be stored three ways, depending on scale:

    OptionScaleAdvantageDisadvantage
    File-based (Markdown tree)< 1,000 nodesHuman-readable, Git-friendly, no infrastructureNo ACID transactions, slow keyword search
    Hybrid (files + SQLite index)1,000-10,000 nodesBest of both — human-readable content, fast queriesTwo systems to keep in sync
    Database (SQLite / PostgreSQL)10,000+ nodesACID, fast queries, scalableNot human-readable, requires infrastructure
    Recommended path

    Start with file-based (Phase 1). It works immediately with existing project structures. Migrate to hybrid when keyword search gets slow. Use database only for multi-user, team-shared knowledge stores.


    Provider Agnosticism

    The algorithm is universal. The LLM provider is swappable.

    ComponentUniversal?Provider-Specific?
    Knowledge tree schemaYesNo
    HBDS algorithm logicYesNo
    Budget trackingYesNo
    Token countingPartiallyTokenizer varies per model
    LLM calls (ORIENT, SCORE, SYNTHESIZE)Prompt format is universalAPI call format, auth
    Context window sizeNoYes (200K Claude, 128K GPT-4, 1M Gemini)

    HBDS auto-adapts to context window size:

    TEXT
    Claude (200K):  beamWidth=3, searchFraction=0.15
    GPT-4 (128K):   beamWidth=3, searchFraction=0.15
    Gemini (1M):    beamWidth=4, searchFraction=0.10
    Small models (<32K): summary-only mode (never reads full leaves)

    Agent Composability

    HBDS agents compose into chains. The output of one agent feeds the input of the next — but through summaries and references, not raw content.

    TEXT
    [Planner] ──→ [Backend Dev] ──→ [Tester]
                        ↓                ↑
                  [Frontend Dev] ────────┘
    
    Step 0 (Planner):     Plan the work, allocate budgets     100K tokens
    Step 1 (Backend):     Build auth Lambda, JWT validation   250K tokens  ┐ parallel
    Step 2 (Frontend):    Build login UI, session management  250K tokens  ┘
    Step 3 (Tester):      Write tests for both                200K tokens

    Agent 1's full output is NOT passed to Agent 2. Instead, the orchestrator passes a compressed summary + artifact references. Agent 2 can use HBDS to read the detailed artifacts if it needs them. This prevents context window bloat in chains.


    Implementation Phases

    PhaseScopeWhat's IncludedWhat's Not
    1: MVPSingle user, file-based, Claude-onlyKnowledge tree + ORIENT + DESCEND + SYNTHESIZEWrite-back, chaining, multi-provider
    2: FullSingle user, hybrid storage, multi-providerAll 4 phases, agent lifecycle, trust scores, Claude + OpenAIMulti-user, parallel execution
    3: TeamsMulti-user, database-backedConcurrent writes, parallel agents, namespaces, admin toolsFederation, marketplace
    4: PlatformEcosystemPublic trees, agent marketplace, federation, analytics

    Summary

    HBDS is a search algorithm designed for a world where:

    • Context windows are finite and always will be
    • Knowledge worth querying is effectively infinite
    • Flat text provides no structure for algorithms to navigate
    • Typed documents declare the structure that algorithms need

    The algorithm navigates typed knowledge hierarchies under strict token budgets, uses beam search with pre-read budget verification, degrades gracefully to summaries, and self-improves through write-back. It is impossible on flat text and trivial on typed documents.

    The format is the architecture. The algorithm is the proof.