The Autonomous Author / Page 04 — Agent Design

Six agents.
One deterministic pipeline.

Each agent has a single responsibility, a defined input contract, a typed output schema, a formal state machine, and an XAI reasoning card. No agent decides the pipeline sequence. No agent publishes without a human gate. Every agent is independently testable, replaceable, and auditable.

6 Agents · LangGraph Pattern · Single Responsibility · Groq · Llama 3.1 70B · XAI Card per Agent · Ambiguity Detector — P2 Only · Deterministic · No Autonomous Loops
Pipeline Overview

The full agent graph —
every node, every edge.

The pipeline is a directed acyclic graph (P1) or a directed graph with one conditional branch (P2 — the Ambiguity Detector fires after Draft). Every edge is typed: the output schema of the upstream agent is the exact input schema of the downstream agent. No implicit data passing. No shared mutable state between agents.

FLOW — START → A-01 Intake Agent (→ DocBrief) → A-02 Research Agent (→ ContextPack) → A-03 Draft Agent (→ DraftDoc · P1/P2) → P1: direct to A-05 · P2: branch through A-04 Ambiguity Detector (→ AmbiguityReport) → A-05 Compliance Agent (→ ComplianceReport) → A-06 Review Prep (→ ReviewBundle) → ⬡ HUMAN GATE (Maya reviews & approves · non-bypassable · AR-02) → EXPORT (MD · HTML · plain text). Each agent emits an XAI card before the next fires.

OUTPUT CONTRACT TYPES — TypeScript-style schemas
DocBrief: {persona, doc_type, feature_name, audience, existing_doc_id?, workflow_mode, raw_input}
ContextPack: {context_text, gaps: string[], clarifications: QA[], confidence: number}
DraftDocument: {content_md, doc_type, placeholders: Placeholder[], version, word_count}
AmbiguityReport: {vague_terms: Flag[], undefined_terms: Flag[], missing_errors: Flag[], severity: low|med|high[]}
ComplianceReport: {violations: Violation[], violation_count, rules_version, checked_at}
ReviewBundle: {draft, compliance_report, ambiguity_report?, xai_cards: XAICard[], ready_for_review: true}

SHARED STATE — sessionStorage holds the active PipelineState object. Each agent reads its input from PipelineState and writes its output back; there are no direct agent-to-agent calls. Agents are pure functions: (input_schema) → (output_schema + XAICard). No side effects. No global mutation. Testable in isolation. P-04 · AR-06

LEGEND — pipeline agent (P1 + P2) · P2-only agent · compliance agent · human gate (non-bypassable). XAI CARDS EMITTED — one per agent, before the next agent fires · visible in the pipeline monitor in real time.
Diagram 13 Full agent pipeline graph — P1 path (straight), P2 branch (dashed red through Ambiguity Detector), output contracts, shared state model.
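The output contracts in Diagram 13 can be sketched as TypeScript interfaces. This is an illustrative sketch: the field names follow the contract list above, but the helper types (QA, Flag, Violation, Placeholder, XAICard) are assumptions filled in from how those names are used elsewhere on this page.

```typescript
// Hypothetical sketch of the pipeline's typed edges, following Diagram 13.
// Helper types are illustrative, not a confirmed implementation.
type QA = { question: string; answer?: string };
type Flag = { term: string; category: string; location: string; severity: "low" | "med" | "high"; suggestion: string };
type Violation = { rule_id: string; excerpt: string; fix_suggestion: string };
type Placeholder = { reason: string; section: string };
type XAICard = { agent_id: string; understood: string; decided: string; why: string; uncertainties: string[]; confidence: number };

interface DocBrief { persona: string; doc_type: string; feature_name: string; audience: string; existing_doc_id?: string; workflow_mode: "agile" | "waterfall"; raw_input: string }
interface ContextPack { context_text: string; gaps: string[]; clarifications: QA[]; confidence: number }
interface DraftDocument { content_md: string; doc_type: string; placeholders: Placeholder[]; version: number; word_count: number }
interface AmbiguityReport { vague_terms: Flag[]; undefined_terms: Flag[]; missing_errors: Flag[] }
interface ComplianceReport { violations: Violation[]; violation_count: number; rules_version: string; checked_at: string }
interface ReviewBundle { draft: DraftDocument; compliance_report: ComplianceReport; ambiguity_report?: AmbiguityReport; xai_cards: XAICard[]; ready_for_review: true }
```

Because every edge is one of these types, each agent can be tested in isolation by constructing its input object directly — no pipeline run required.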
Agent A-01

Intake Agent —
from signal to structured brief.

The Intake Agent receives Maya's raw input — a Jira ticket, PR description, feature brief, or free text — and produces a structured DocBrief that every downstream agent consumes. It is the only agent that touches raw, unstructured writer input. Its output contract is the foundation of the entire session.

01
Intake Agent
Input: raw_text · Output: DocBrief · Mode: P1 + P2 · Groq Llama 3.1 70B
P1 + P2
Specification
Input
raw_text (string) — any format Maya pastes
Output
DocBrief
Model
Llama 3.1 70B via Groq
Temp
0.1 — near-deterministic extraction
Max tokens
512 (DocBrief is compact)
Confidence floor
0.6 — below this, flags for Research Agent clarification
Typical latency
~1.2s at 300 tok/s
State Machine
IDLE
Awaiting raw input from UI
EXTRACTING
LLM call — extracting doc_type, persona, audience, feature_name
VALIDATING
Schema check — all required DocBrief fields present
LOW_CONF_PAUSE
Confidence < 0.6 — flags to Research Agent, does not block pipeline
COMPLETE
DocBrief written to PipelineState · XAI card emitted
ERROR
Malformed JSON response — retry once, then surface error to UI
System Prompt Behaviour
Extraction targets
doc_type (feature_doc | ddd_spec | api_ref | changelog), audience, persona selection, existing doc flag (delta mode), workflow_mode (agile | waterfall)
Output format
Strict JSON. If extraction fails for a field, output null + flag in XAI card uncertainty list. Never fabricate field values.
Persona detection
If raw input contains imperative language ("the system shall", "actors:", "preconditions:") → auto-suggest P2 mode. Writer confirms.
Confidence scoring
Self-reported by LLM on 0.0–1.0 scale in output JSON. Calibrated against schema completeness and input signal quality.
A-01 INTAKE AGENT — STATE MACHINE
IDLE (awaiting input) —input→ EXTRACTING (LLM call · JSON parse) —parsed→ VALIDATING (schema check) —valid→ COMPLETE (DocBrief + XAI card → A-02)
VALIDATING —conf < 0.6→ LOW_CONF_PAUSE (flag → Research Agent) —flag attached→ COMPLETE
EXTRACTING —bad JSON→ ERROR (retry once · surface error to UI)
Diagram 14 A-01 Intake Agent state machine — IDLE → EXTRACTING → VALIDATING → COMPLETE. Low-confidence and error paths shown.
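The transitions in Diagram 14 can be sketched as a small function. This is a minimal illustration, not the real implementation: `extract` stands in for the Groq call plus JSON parse, so the retry and low-confidence paths are testable without an LLM.

```typescript
// Illustrative A-01 state machine. extract() is a hypothetical stand-in
// for the real LLM call; the transitions mirror Diagram 14.
type IntakeState = "COMPLETE" | "ERROR";

interface IntakeResult { state: IntakeState; flaggedForResearch: boolean }

function runIntake(
  extract: () => { ok: boolean; confidence: number },
  maxRetries = 1, // "retry once, then surface error to UI"
): IntakeResult {
  let attempts = 0;
  while (attempts <= maxRetries) {
    const parsed = extract();                 // EXTRACTING: LLM call + JSON parse
    if (!parsed.ok) { attempts++; continue; } // bad JSON → retry
    // VALIDATING passed; confidence < 0.6 flags but does not block (LOW_CONF_PAUSE)
    const lowConf = parsed.confidence < 0.6;
    return { state: "COMPLETE", flaggedForResearch: lowConf };
  }
  return { state: "ERROR", flaggedForResearch: false }; // surface to UI
}
```

Note that LOW_CONF_PAUSE still ends in COMPLETE — it attaches a flag for the Research Agent rather than halting the pipeline.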
Agent A-02

Research Agent —
context before drafting.

The Research Agent receives the DocBrief and builds the ContextPack the Draft Agent needs. Its most important capability is gap detection — identifying unknown proper nouns, missing architectural context, and unresolved ambiguities in the brief before a single sentence of the document is written. Bad context in = bad draft out. The Research Agent is the quality gate on context.

02
Research Agent
Input: DocBrief · Output: ContextPack · Mode: P1 + P2 · Gap detection enabled
P1 + P2
Specification
Input
DocBrief
Output
ContextPack
Temp
0.2 — structured extraction + gap identification
Max tokens
1024 — ContextPack can be verbose
Gap threshold
Any proper noun not resolvable from input → context_gap flag
Clarification limit
Max 3 questions surfaced to Maya. Never blocks pipeline — proceeds with gaps flagged.
State Machine
IDLE
Awaiting DocBrief from A-01
ANALYSING_BRIEF
Extract known context from DocBrief fields
GAP_DETECTION
Identify unknown proper nouns and missing context
CLARIFICATION_SURFACE
≤3 questions rendered in UI. Maya answers optionally. Non-blocking.
PACKING
Assemble ContextPack with all available context + gap flags
COMPLETE
ContextPack written · XAI card emitted
Gap Detection Logic
Unknown proper nouns
Any capitalised term not present in DocBrief context, not a known API standard (REST, OAuth, etc.), not a common tech term → flagged as context_gap
P2-specific gaps
Missing: system boundary, actors list, preconditions, success criteria → each becomes a [REQUIRES INPUT:] placeholder in Draft Agent
Non-blocking design
Gaps never stop the pipeline. Draft Agent receives ContextPack with gap_flags[] and handles each with a placeholder. Maya resolves gaps in review.
XAI card content
Lists every gap detected, every assumption made, every clarification question surfaced. Confidence reflects gap count — more gaps = lower confidence.
Agent A-03

Draft Agent —
two modes, one agent.

The Draft Agent produces the document. It operates in two fundamentally different modes — P1 produces a feature doc structure optimised for developer readers, P2 produces an imperative-voice DDD spec optimised for engineering build teams. The mode is set in DocBrief and determines which system prompt variant fires. The agent never infers which mode to use — it is always explicitly set.

03
Draft Agent
Input: ContextPack · Output: DraftDocument · Mode-switching · P1 ≠ P2 system prompts
P1 + P2 (distinct prompts)
P1 Mode — Feature Doc
Structure produced
Overview · Prerequisites · Procedure (numbered steps) · Code sample scaffold · Parameters table · Related links · Changelog entry
Voice
Second person ("you"), present tense, active voice — Google Style Guide defaults enforced in prompt
Code samples
Scaffolded with language tag and placeholder values. Never fabricated API responses. Uses [EXAMPLE_VALUE] for unknowns.
Temp
0.3 — structured but slightly generative for prose quality
Max tokens
2048 — full feature doc
P2 Mode — DDD Spec
Structure produced
Purpose statement · Scope · Definitions · Actors · Preconditions · Functional requirements ("The system shall…") · Non-functional requirements · Error states · Open questions · Requirements traceability table
Voice
Imperative, third person system. "The system shall…" format for every functional requirement. No passive voice.
Placeholder rule
MUST insert [REQUIRES INPUT: reason] for every field where context is absent. Never infer. Never guess. AR-08 · P-10
Temp
0.1 — near-deterministic for spec precision
Max tokens
3072 — DDD specs are longer
State Machine
IDLE
MODE_SELECT
Read persona from DocBrief → load P1 or P2 system prompt
DRAFTING
LLM call with mode-specific prompt + ContextPack
PLACEHOLDER_SCAN
P2 only — verify all gap_flags became [REQUIRES INPUT:] entries
COMPLETE
DraftDocument written · XAI card emitted · Routes to A-04 (P2) or A-05 (P1)
PLACEHOLDER_FAIL
P2: gap_flag without placeholder found → retry with explicit instruction
DRAFT AGENT — OUTPUT STRUCTURE COMPARISON · P1 vs P2

P1 — FEATURE RELEASE DOC
## Overview — what this feature does, why it exists
## Prerequisites — auth, SDK version, permissions
## Steps (procedure) — numbered, imperative steps
## Code example — language-tagged scaffold
## Parameters — Name · Type · Required · Description
## Error codes — Code · Cause · Resolution
## Related links — cross-references · changelog entry
Temp: 0.3 · Max: 2048 tok · Voice: second person, active, present

P2 — DDD SPEC
## Purpose statement — why this spec exists
## Scope + Definitions — boundary · glossary of terms used
## Actors + Preconditions — who acts · what must be true before
## Functional requirements — "The system shall…" for every requirement
## Non-functional requirements — performance · security · constraints
## Error states + Open questions — every failure mode · [REQUIRES INPUT:]
## Requirements traceability table — REQ-ID · Description · Status
Temp: 0.1 · Max: 3072 tok · Voice: third person system, imperative
Diagram 15 Draft Agent output structure — P1 feature doc vs P2 DDD spec. Distinct section sets, voice, temperature, and token budget per mode.
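The P2-only PLACEHOLDER_SCAN step can be sketched as a pure check: every gap flag inherited from the ContextPack must surface as a `[REQUIRES INPUT: …]` entry, or the state machine routes to PLACEHOLDER_FAIL and retries. The function shape is illustrative.

```typescript
// Sketch of PLACEHOLDER_SCAN: verify every gap_flag from the Research
// Agent became a [REQUIRES INPUT: …] placeholder in the draft.
function scanPlaceholders(
  contentMd: string,
  gapFlags: string[],
): { pass: boolean; missing: string[] } {
  const placeholders = contentMd.match(/\[REQUIRES INPUT:[^\]]*\]/g) ?? [];
  const placeholderText = placeholders.join(" ");
  const missing = gapFlags.filter((gap) => !placeholderText.includes(gap));
  return { pass: missing.length === 0, missing }; // !pass → PLACEHOLDER_FAIL, retry
}
```

A failed scan produces an explicit retry instruction naming the missing flags, rather than silently shipping an inferred value — the AR-08 · P-10 rule enforced mechanically.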
Agent A-04 — P2 Only

Ambiguity Detector —
the DDD quality gate.

The Ambiguity Detector receives the DraftDocument as an external artefact and critiques it — it does not produce content. It fires only in P2 mode, after the Draft Agent, before Compliance. Its single job is to find language that will cause a developer to make a judgment call. Every judgment call a developer makes from a spec is a potential sprint defect.

04
Ambiguity Detector
Input: DraftDocument · Output: AmbiguityReport · Mode: P2 ONLY · Critique-mode prompt
P2 — DDD Only
Detection Categories
Vague quantifiers
"fast", "quickly", "slow", "large", "small", "many", "few", "appropriate", "reasonable", "sufficient", "minimal" — any quantifier without a measurable definition
Undefined terms
Any term used in a requirement that has no definition in the Definitions section. Proper nouns, system names, role names used without prior introduction.
Missing error states
Any functional requirement that describes a success path without a corresponding failure path. "The system shall authenticate the user" without "If authentication fails, the system shall…"
Implicit assumptions
Statements that assume a condition is always true: "when the user is logged in", "assuming the service is available", "given valid input" — without specifying what happens when the assumption is false.
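The vague-quantifier category above is the most mechanical of the four and can be sketched as a word-boundary pattern pass. This is an illustration of the first-pass pattern match only; the LLM critique pass handles the context-dependent categories.

```typescript
// Sketch of the Ambiguity Detector's first pass: flag vague quantifiers
// by pattern before the LLM critique. Word list from the detection
// categories above; real scoring and location tracking are omitted.
const VAGUE_QUANTIFIERS = [
  "fast", "quickly", "slow", "large", "small", "many", "few",
  "appropriate", "reasonable", "sufficient", "minimal",
];

function scanVagueQuantifiers(sentence: string): string[] {
  const lower = sentence.toLowerCase();
  return VAGUE_QUANTIFIERS.filter((w) => new RegExp(`\\b${w}\\b`).test(lower));
}
```

Each hit would then be wrapped into the flag structure described below, with a severity and a suggested measurable rewrite.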
Output Schema
Flag structure
{ term, category, location, severity, suggestion }
Severity levels
HIGH — will cause build defect. MED — will cause clarification round. LOW — stylistic, does not block build.
Location
Section name + sentence excerpt — Maya can find it without re-reading the full spec.
Suggestion
Each flag includes a one-line suggested rewrite. Maya accepts, edits, or ignores in the Review UI.
Temp
0.1 — critique tasks benefit from near-deterministic output
Prompt stance
System prompt explicitly states: "You are a critic, not an author. Do not improve the document. Find every place where a developer would need to make a judgment call."
State Machine
IDLE
P2 mode confirmed — awaiting DraftDocument
SCANNING_QUANTIFIERS
First pass — vague quantifier pattern match + LLM flag
SCANNING_DEFINITIONS
Cross-reference terms against Definitions section
SCANNING_ERROR_STATES
Check each functional req for paired failure path
SCORING
Assign severity · Compile AmbiguityReport
COMPLETE
AmbiguityReport written · XAI card emitted · → A-05
Agent A-05

Compliance Agent —
80 rules, zero exemptions.

The Compliance Agent checks the DraftDocument against the 80-rule Google Developer Style Guide JSON. Every violation is cited against a named rule ID. No rule can be skipped. No violation can be suppressed. The compliance check runs on every document in every session — P1 and P2, Agile and Waterfall. It is a structural property of the pipeline, not a configurable option.

05
Compliance Agent
Input: DraftDocument · Output: ComplianceReport · rules.json v-semver · P1 + P2
P1 + P2 · Mandatory
rules.json Structure
Format
{ id, category, rule, check_type, examples }
Categories (10)
Voice · Tense · Person · Headings · Lists · Code · Links · Terminology · Punctuation · Accessibility
check_type
PATTERN (regex/keyword), STRUCTURAL (section presence), SEMANTIC (LLM-evaluated). 60% pattern, 20% structural, 20% semantic.
Sample rules
047: Avoid Latin abbreviations (e.g., i.e., etc.) · 012: Use second person ("you") · 033: Use present tense · 061: Heading capitalisation: sentence case only · 074: Avoid "simple" and "easy"
Check Execution
PATTERN rules
Regex executed client-side before LLM call. Fast. Deterministic. No tokens consumed. Results appended to LLM prompt context.
STRUCTURAL rules
Section presence checks run against DraftDocument.content_md parsed headings. P1: requires Prerequisites, Steps, Code sections. P2: requires Definitions, Functional Requirements sections.
SEMANTIC rules
LLM call with rules-as-context. Checks nuanced violations: passive voice in complex sentences, vague feature descriptions, overpromising language. Returns violation[] with rule_id and excerpt.
Output guarantee
Every violation in ComplianceReport has: rule_id (citable), excerpt (findable), fix_suggestion (actionable). No violation without all three fields.
State Machine
IDLE
LOAD_RULES
Fetch rules.json · validate schema · log version
PATTERN_CHECK
Client-side regex · no LLM · fast
STRUCTURAL_CHECK
Section presence · heading structure
SEMANTIC_CHECK
LLM call with semantic rules as context
COMPLETE
ComplianceReport compiled · XAI card emitted · → A-06
Agent A-06

Review Prep Agent —
writer-ready in one view.

The Review Prep Agent assembles the ReviewBundle — the final object presented to Maya in the Review UI. It does not generate new content. It organises, prioritises, and formats the pipeline outputs into a coherent review experience. It is the last agent that fires before the human gate. Its XAI card is the summary of the entire pipeline session.

06
Review Prep Agent
Input: DraftDocument + ComplianceReport + AmbiguityReport? · Output: ReviewBundle · Assembly only
P1 + P2 · Assembly
ReviewBundle Contents
Draft panel
Full Markdown draft with inline compliance violation annotations. Violations are highlighted in the text at the exact excerpt location.
Compliance panel
Sorted violation list: HIGH severity first. Each item: rule_id, excerpt, fix_suggestion, Accept / Edit / Ignore actions.
Ambiguity panel (P2)
Sorted flag list: HIGH severity first. Each item: term, category, location, suggestion. Separate from compliance — Maya handles each independently.
XAI summary panel
All 5–6 XAI reasoning cards in chronological order. Maya can read the full pipeline reasoning trail without re-running the session.
Export gate
export_ready flag set to FALSE until Maya has actioned every HIGH-severity item (accepted, edited, or explicitly ignored with a reason). AR-02 · P-02.
Prioritisation Logic
Severity sort
HIGH items surface first in both compliance and ambiguity panels. LOW items are collapsible — hidden by default to reduce cognitive load.
Confidence triage
Any agent output with confidence < 0.7 is flagged with a yellow indicator in the XAI panel. Maya sees which sections of the draft were produced with lower confidence.
Placeholder triage (P2)
Every [REQUIRES INPUT:] placeholder is surfaced as a required action item. Export is blocked until all placeholders are resolved or explicitly deferred.
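The export gate described above reduces to a small pure computation. This sketch assumes illustrative shapes for review items; the point is that `export_ready` is derived, never toggled directly.

```typescript
// Sketch of the export gate (AR-02 · P-02): export_ready stays false
// until every HIGH item has an explicit action and every
// [REQUIRES INPUT:] placeholder is resolved or deferred.
type Action = "accepted" | "edited" | "ignored_with_reason" | null;
interface ReviewItem { severity: "HIGH" | "MED" | "LOW"; action: Action }

function computeExportReady(items: ReviewItem[], unresolvedPlaceholders: number): boolean {
  const highActioned = items
    .filter((i) => i.severity === "HIGH")
    .every((i) => i.action !== null); // null = not yet a conscious decision
  return highActioned && unresolvedPlaceholders === 0;
}
```

"Ignore with reason" counts as an action here — the gate requires a conscious decision, not a particular one.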
State Machine
IDLE
ASSEMBLING
Merge draft + reports + XAI cards into ReviewBundle
PRIORITISING
Sort violations + flags by severity · compute export_ready
ANNOTATING
Inline violation markers on draft text
COMPLETE
ReviewBundle ready · XAI card emitted · Human Gate activated
XAI Layer

The Reasoning Card —
XAI made tangible.

Every agent emits one XAI reasoning card before passing control to the next stage. The card is not a log entry — it is a first-class UI element visible in the pipeline monitor in real time. Maya reads the card to understand what the agent decided and why, before the next agent fires. The pipeline monitor is the XAI interface.

A-03 Draft Agent — Reasoning Card · conf: 0.82

WHAT I UNDERSTOOD — P1 mode · feature doc for PATCH /users/{id} endpoint · audience: API integrators · no existing doc to update. Context pack contained: endpoint description, 2 parameters, auth requirement, 1 context gap (rate limit policy).

WHAT I DECIDED — Produced full feature doc with 6 sections: Overview, Prerequisites, Steps (4), Parameters table, Error codes, Related links. Used second person, present tense, active voice throughout · code sample scaffolded with placeholder values.

WHY — P1 system prompt selects this structure for API endpoint feature docs · voice defaults enforced by system prompt to satisfy Google Style Guide §1.2 (second person), §3.4 (present tense), §2.1 (active voice) before the Compliance Agent checks.

UNCERTAINTIES — REVIEW THESE
⚠ Rate limit policy for this endpoint unknown — inserted [REQUIRES INPUT: rate limit per minute] in Error codes section
⚠ Auth scope requirement inferred from similar endpoints — verify "users:write" scope is correct for PATCH

Footer: Confidence 0.82 · 2 uncertainties · draft complete · Compliance Agent firing next

Card anatomy: agent ID + name visible in monitor header · understood field — what input was interpreted · decided field — what output was produced · why field — rule citations + reasoning · uncertainties field — actionable items for Maya · confidence score — 0.0–1.0, guides review
Diagram 16 XAI Reasoning Card anatomy — A-03 Draft Agent example. Four fields: understood, decided, why, uncertainties. Confidence score in header.
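The card anatomy in Diagram 16 maps directly onto a small data shape. The interface below is an illustrative sketch populated with the A-03 example; field names follow the four-field anatomy, and the value strings are abbreviated.

```typescript
// Illustrative shape of the reasoning card every agent emits,
// mirroring the four fields in Diagram 16.
interface ReasoningCard {
  agent_id: string;
  understood: string;      // what input was interpreted
  decided: string;         // what output was produced
  why: string;             // rule citations + reasoning
  uncertainties: string[]; // actionable items for Maya
  confidence: number;      // 0.0–1.0 · guides review
}

const exampleCard: ReasoningCard = {
  agent_id: "A-03",
  understood: "P1 mode · feature doc for PATCH /users/{id} · audience: API integrators",
  decided: "Full feature doc, 6 sections, second person, present tense, active voice",
  why: "P1 system prompt selects this structure; voice defaults per Google Style Guide",
  uncertainties: [
    "Rate limit policy unknown — [REQUIRES INPUT:] inserted in Error codes",
    "Auth scope inferred from similar endpoints — verify users:write",
  ],
  confidence: 0.82,
};
```

Because the card is structured data rather than free-form log text, the pipeline monitor can render it in real time and the Review UI can sort by confidence.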
Full Sequence Diagram

End-to-end interaction —
from paste to export.

Participants: Maya · A-01 Intake · A-02 Research · A-03 Draft · A-04 Ambiguity (P2 only) · A-05 Compliance · A-06 Review Prep

1. Maya pastes raw input + confirms P1/P2.
2. A-01 emits XAI card + DocBrief preview · DocBrief → A-02.
3. A-02 surfaces ≤3 clarification questions (non-blocking) · Maya answers optionally · XAI card + ContextPack summary · ContextPack → A-03.
4. A-03 emits XAI card + draft preview · P2: DraftDoc → A-04.
5. A-04 (P2 only) emits XAI card + AmbiguityReport preview · DraftDoc (+ AmbiguityReport in P2) → A-05.
6. A-05 emits XAI card + ComplianceReport summary · all outputs → A-06.
7. A-06 assembles ReviewBundle + XAI card → Review UI activated.
8. ⬡ HUMAN GATE — Maya's exclusive domain: reviews compliance violations · acts on ambiguity flags (P2) · reads XAI cards · edits draft · approves sections. Export disabled until all HIGH items actioned · no agent fires during this phase · AR-02 · P-02.
9. Maya approves + exports. EXPORT — Markdown / HTML / plain text · session written to IndexedDB.

Timeline: Intake ~1m · Research ~2m · Draft ~4m · Ambiguity ~2m (P2) · Compliance ~2m · Prep ~1m · Maya review ~5m (P1) / ~8m (P2). P1 total ≤15 min · P2 total ≤20 min (the Ambiguity Detector adds ~2 min agent time + ~3 min Maya review).
Diagram 17 Full end-to-end sequence diagram — Maya through all 6 agents, human gate, export. P2 Ambiguity Detector branch shown dashed.
Rebuttals & Pushbacks

Four agent design challenges.
Every objection answered.

These are the real challenges a principal engineer or technical lead would raise against these specific agent design decisions. Each rebuttal documents the challenge, the tempting shortcut, why the shortcut was rejected, and what was traded away.

Agent Design — Pushback 01
Why does Research Agent surface clarification questions instead of blocking until answered?
The Challenge

"If the Research Agent detects a context gap, it should stop the pipeline and wait for Maya to fill it. A draft written with a known gap is worse than no draft at all."

The Temptation

Blocking pipeline. Research Agent asks questions. Maya must answer all before pipeline proceeds. Feels rigorous.

Why We Rejected It

C-03 (no workflow disruption) is a binding constraint. A blocking clarification step transforms a 15-minute pipeline into a multi-session, asynchronous conversation that requires Maya to context-switch back. The value proposition collapses. Instead: gaps become [REQUIRES INPUT:] placeholders in the draft and HIGH-severity items in the Review UI. Maya resolves them in one focused review session — not scattered across multiple pipeline invocations. Non-blocking with explicit flagging is better UX than blocking and waiting.

Trade-off Accepted

Drafts produced with unresolved gaps contain placeholder text rather than real content. Maya must fill these in Review. This is the correct behaviour — a placeholder is honest about what the pipeline doesn't know; an inferred value pretends to a knowledge it doesn't have.

Agent Design — Pushback 02
Why is the confidence score self-reported by the LLM rather than computed externally?
The Challenge

"LLM-reported confidence scores are notoriously unreliable. A model that's hallucinating will still report high confidence. This is false XAI."

The Temptation

Compute confidence externally from schema completeness (% fields populated), input length, gap count, and placeholder count. Fully deterministic. No LLM involvement.

Why We Rejected It

External heuristic confidence scores are more reliable in a narrow sense but less informative in a useful sense. A document with all fields populated can still be wrong if the LLM made a plausible but incorrect inference. The LLM's self-reported confidence, combined with explicit uncertainty enumeration in the XAI card, gives Maya something actionable: not just "confidence = 0.82" but "I'm uncertain about these two specific things." The uncertainties[] field in the XAI card is where the real value is — confidence score is the headline, not the story. Maya reads both. We accept the theoretical unreliability of self-reported confidence because the uncertainties field provides the granular corrective.

Trade-off Accepted

Confidence scores may not be perfectly calibrated. This is disclosed in the Glossary. The Review UI labels them "agent-estimated confidence" to set expectations. The mandatory human gate means an incorrectly confident agent output is still reviewed before publishing.

Agent Design — Pushback 03
The Compliance Agent runs 80 rules but the Google Style Guide has hundreds. Isn't 80 rules inadequate?
The Challenge

"80 curated rules is a cherry-picked subset. A writer who passes the compliance check might still violate dozens of guide rules you didn't encode. You're creating false confidence in compliance."

The Temptation

Encode all 200+ verifiable rules from the full guide. More comprehensive, harder to criticise for omission.

Why We Rejected It

The 80 rules were selected by impact, not convenience. They cover every rule that is (a) binary-checkable without deep semantic understanding, and (b) responsible for >90% of the style violations found in a sample audit of 50 real developer docs. The remaining guide content is either stylistic judgment (where a rule cannot be binary) or highly context-dependent (where an LLM check would produce too many false positives to be trusted). A compliance check that produces false positives destroys writer trust faster than one that has a defined, disclosed scope. The UI discloses the 80-rule scope explicitly — Maya knows she is getting the high-impact subset, not the full guide.

Trade-off Accepted

~20% of guide violations will not be caught by the automated check. The UI documents this. Maya is expected to exercise professional judgment on stylistic and context-dependent rules. The tool does not replace her editorial judgment — it automates the verifiable rules so she can focus on the non-automatable ones.

Agent Design — Pushback 04
Why does Review Prep block export on HIGH items rather than just warning?
The Challenge

"Blocking export is paternalistic. Maya is a professional. She should be able to export with known violations if she chooses. A warning is enough."

The Temptation

Warning-only. Export enabled always. HIGH violations displayed prominently but not enforced. Maya decides.

Why We Rejected It

P-02 (human gate is non-bypassable) is a binding principle, not a UX preference. The architecture's value proposition is that documents produced by the pipeline are compliant before they leave it. If export is available before HIGH items are actioned, the pipeline's compliance guarantee evaporates. A writer under sprint pressure will export and intend to fix it later — and then they won't. The gate is not paternalistic; it is the architectural integrity of the compliance pipeline. "Ignore with reason" is the escape hatch: Maya can dismiss a HIGH violation with a one-sentence reason, which is logged in the session record. That is not blocking; it is requiring a conscious decision rather than an accidental omission.

Trade-off Accepted

Maya must interact with every HIGH item before exporting. For a document with many HIGH violations, this extends the review time. This is the correct trade-off: review time is a feature, not a bug. The pipeline saves time on drafting; the writer invests that time in review. The net is still a significant improvement over the As-Is state.