The Autonomous Author / Page 04 — Agent Design

Six agents.
One deterministic pipeline.

Each agent has a single responsibility, a defined input contract, a typed output schema, a formal state machine, and an XAI reasoning card. No agent decides the pipeline sequence. No agent publishes without a human gate. Every agent is independently testable, replaceable, and auditable.

6 Agents · LangGraph Pattern · Single Responsibility · Groq · Llama 3.1 70B · XAI Card per Agent · Ambiguity Detector — P2 Only · Deterministic · No Autonomous Loops
Pipeline Overview

The full agent graph —
every node, every edge.

The pipeline is a directed acyclic graph (P1) or a directed graph with one conditional branch (P2 — the Ambiguity Detector fires after Draft). Every edge is typed: the output schema of the upstream agent is the exact input schema of the downstream agent. No implicit data passing. No shared mutable state between agents.

FLOW — START → A-01 Intake Agent (→ DocBrief) → A-02 Research Agent (→ ContextPack) → A-03 Draft Agent (→ DraftDoc · P1/P2) → P1: direct to A-05 · P2: branch through A-04 Ambiguity Detector (→ AmbiguityReport) → A-05 Compliance Agent (→ ComplianceReport) → A-06 Review Prep (→ ReviewBundle) → ⬡ HUMAN GATE (Maya reviews & approves · non-bypassable · AR-02) → EXPORT (MD · HTML · plain text). Each agent emits an XAI card before the next fires.

OUTPUT CONTRACT TYPES — TypeScript-style schemas
DocBrief: {persona, doc_type, feature_name, audience, existing_doc_id?, workflow_mode, raw_input}
ContextPack: {context_text, gaps: string[], clarifications: QA[], confidence: number}
DraftDocument: {content_md, doc_type, placeholders: Placeholder[], version, word_count}
AmbiguityReport: {vague_terms: Flag[], undefined_terms: Flag[], missing_errors: Flag[], severity: low|med|high[]}
ComplianceReport: {violations: Violation[], violation_count, rules_version, checked_at}
ReviewBundle: {draft, compliance_report, ambiguity_report?, xai_cards: XAICard[], ready_for_review: true}

SHARED STATE — sessionStorage holds the active PipelineState object. Each agent reads its input from PipelineState and writes its output back; there are no direct agent-to-agent calls. Agents are pure functions: (input_schema) → (output_schema + XAICard). No side effects. No global mutation. Testable in isolation. P-04 · AR-06

LEGEND — pipeline agent (P1 + P2) · P2-only agent · compliance agent · human gate (non-bypassable). XAI CARDS EMITTED — one per agent, before the next agent fires · visible in the pipeline monitor in real time.
Diagram 13 Full agent pipeline graph — P1 path (straight), P2 branch (dashed red through Ambiguity Detector), output contracts, shared state model.
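The output contracts in Diagram 13 can be sketched as TypeScript interfaces. This is an illustrative sketch: the field names follow the contract list above, but the helper types (QA, Flag, Violation, Placeholder, XAICard) are assumptions filled in from how those names are used elsewhere on this page.

```typescript
// Hypothetical sketch of the pipeline's typed edges, following Diagram 13.
// Helper types are illustrative, not a confirmed implementation.
type QA = { question: string; answer?: string };
type Flag = { term: string; category: string; location: string; severity: "low" | "med" | "high"; suggestion: string };
type Violation = { rule_id: string; excerpt: string; fix_suggestion: string };
type Placeholder = { reason: string; section: string };
type XAICard = { agent_id: string; understood: string; decided: string; why: string; uncertainties: string[]; confidence: number };

interface DocBrief { persona: string; doc_type: string; feature_name: string; audience: string; existing_doc_id?: string; workflow_mode: "agile" | "waterfall"; raw_input: string }
interface ContextPack { context_text: string; gaps: string[]; clarifications: QA[]; confidence: number }
interface DraftDocument { content_md: string; doc_type: string; placeholders: Placeholder[]; version: number; word_count: number }
interface AmbiguityReport { vague_terms: Flag[]; undefined_terms: Flag[]; missing_errors: Flag[] }
interface ComplianceReport { violations: Violation[]; violation_count: number; rules_version: string; checked_at: string }
interface ReviewBundle { draft: DraftDocument; compliance_report: ComplianceReport; ambiguity_report?: AmbiguityReport; xai_cards: XAICard[]; ready_for_review: true }
```

Because every edge is one of these types, each agent can be tested in isolation by constructing its input object directly — no pipeline run required.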
Agent A-01

Intake Agent —
from signal to structured brief.

The Intake Agent receives Maya's raw input — a Jira ticket, PR description, feature brief, or free text — and produces a structured DocBrief that every downstream agent consumes. It is the only agent that touches raw, unstructured writer input. Its output contract is the foundation of the entire session.

01
Intake Agent
Input: raw_text · Output: DocBrief · Mode: P1 + P2 · Groq Llama 3.1 70B
P1 + P2
Specification
Input
raw_text (string) — any format Maya pastes
Output
DocBrief
Model
Llama 3.1 70B via Groq
Temp
0.1 — near-deterministic extraction
Max tokens
512 (DocBrief is compact)
Confidence floor
0.6 — below this, flags for Research Agent clarification
Typical latency
~1.2s at 300 tok/s
State Machine
IDLE
Awaiting raw input from UI
EXTRACTING
LLM call — extracting doc_type, persona, audience, feature_name
VALIDATING
Schema check — all required DocBrief fields present
LOW_CONF_PAUSE
Confidence < 0.6 — flags to Research Agent, does not block pipeline
COMPLETE
DocBrief written to PipelineState · XAI card emitted
ERROR
Malformed JSON response — retry once, then surface error to UI
System Prompt Behaviour
Extraction targets
doc_type (feature_doc | ddd_spec | api_ref | changelog), audience, persona selection, existing doc flag (delta mode), workflow_mode (agile | waterfall)
Output format
Strict JSON. If extraction fails for a field, output null + flag in XAI card uncertainty list. Never fabricate field values.
Persona detection
If raw input contains imperative language ("the system shall", "actors:", "preconditions:") → auto-suggest P2 mode. Writer confirms.
Confidence scoring
Self-reported by LLM on 0.0–1.0 scale in output JSON. Calibrated against schema completeness and input signal quality.
A-01 INTAKE AGENT — STATE MACHINE
IDLE (awaiting input) —input→ EXTRACTING (LLM call · JSON parse) —parsed→ VALIDATING (schema check) —valid→ COMPLETE (DocBrief + XAI card → A-02)
VALIDATING —conf < 0.6→ LOW_CONF_PAUSE (flag → Research Agent) —flag attached→ COMPLETE
EXTRACTING —bad JSON→ ERROR (retry once · surface error to UI)
Diagram 14 A-01 Intake Agent state machine — IDLE → EXTRACTING → VALIDATING → COMPLETE. Low-confidence and error paths shown.
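The transitions in Diagram 14 can be sketched as a small function. This is a minimal illustration, not the real implementation: `extract` stands in for the Groq call plus JSON parse, so the retry and low-confidence paths are testable without an LLM.

```typescript
// Illustrative A-01 state machine. extract() is a hypothetical stand-in
// for the real LLM call; the transitions mirror Diagram 14.
type IntakeState = "COMPLETE" | "ERROR";

interface IntakeResult { state: IntakeState; flaggedForResearch: boolean }

function runIntake(
  extract: () => { ok: boolean; confidence: number },
  maxRetries = 1, // "retry once, then surface error to UI"
): IntakeResult {
  let attempts = 0;
  while (attempts <= maxRetries) {
    const parsed = extract();                 // EXTRACTING: LLM call + JSON parse
    if (!parsed.ok) { attempts++; continue; } // bad JSON → retry
    // VALIDATING passed; confidence < 0.6 flags but does not block (LOW_CONF_PAUSE)
    const lowConf = parsed.confidence < 0.6;
    return { state: "COMPLETE", flaggedForResearch: lowConf };
  }
  return { state: "ERROR", flaggedForResearch: false }; // surface to UI
}
```

Note that LOW_CONF_PAUSE still ends in COMPLETE — it attaches a flag for the Research Agent rather than halting the pipeline.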
Agent A-02

Research Agent —
context before drafting.

The Research Agent receives the DocBrief and builds the ContextPack the Draft Agent needs. Its most important capability is gap detection — identifying unknown proper nouns, missing architectural context, and unresolved ambiguities in the brief before a single sentence of the document is written. Bad context in = bad draft out. The Research Agent is the quality gate on context.

02
Research Agent
Input: DocBrief · Output: ContextPack · Mode: P1 + P2 · Gap detection enabled
P1 + P2
Specification
Input
DocBrief
Output
ContextPack
Temp
0.2 — structured extraction + gap identification
Max tokens
1024 — ContextPack can be verbose
Gap threshold
Any proper noun not resolvable from input → context_gap flag
Clarification limit
Max 3 questions surfaced to Maya. Never blocks pipeline — proceeds with gaps flagged.
State Machine
IDLE
Awaiting DocBrief from A-01
ANALYSING_BRIEF
Extract known context from DocBrief fields
GAP_DETECTION
Identify unknown proper nouns and missing context
CLARIFICATION_SURFACE
≤3 questions rendered in UI. Maya answers optionally. Non-blocking.
PACKING
Assemble ContextPack with all available context + gap flags
COMPLETE
ContextPack written · XAI card emitted
Gap Detection Logic
Unknown proper nouns
Any capitalised term not present in DocBrief context, not a known API standard (REST, OAuth, etc.), not a common tech term → flagged as context_gap
P2-specific gaps
Missing: system boundary, actors list, preconditions, success criteria → each becomes a [REQUIRES INPUT:] placeholder in Draft Agent
Non-blocking design
Gaps never stop the pipeline. Draft Agent receives ContextPack with gap_flags[] and handles each with a placeholder. Maya resolves gaps in review.
XAI card content
Lists every gap detected, every assumption made, every clarification question surfaced. Confidence reflects gap count — more gaps = lower confidence.
Agent A-03

Draft Agent —
two modes, one agent.

The Draft Agent produces the document. It operates in two fundamentally different modes — P1 produces a feature doc structure optimised for developer readers, P2 produces an imperative-voice DDD spec optimised for engineering build teams. The mode is set in DocBrief and determines which system prompt variant fires. The agent never infers which mode to use — it is always explicitly set.

03
Draft Agent
Input: ContextPack · Output: DraftDocument · Mode-switching · P1 ≠ P2 system prompts
P1 + P2 (distinct prompts)
P1 Mode — Feature Doc
Structure produced
Overview · Prerequisites · Procedure (numbered steps) · Code sample scaffold · Parameters table · Related links · Changelog entry
Voice
Second person ("you"), present tense, active voice — Google Style Guide defaults enforced in prompt
Code samples
Scaffolded with language tag and placeholder values. Never fabricated API responses. Uses [EXAMPLE_VALUE] for unknowns.
Temp
0.3 — structured but slightly generative for prose quality
Max tokens
2048 — full feature doc
P2 Mode — DDD Spec
Structure produced
Purpose statement · Scope · Definitions · Actors · Preconditions · Functional requirements ("The system shall…") · Non-functional requirements · Error states · Open questions · Requirements traceability table
Voice
Imperative, third person system. "The system shall…" format for every functional requirement. No passive voice.
Placeholder rule
MUST insert [REQUIRES INPUT: reason] for every field where context is absent. Never infer. Never guess. AR-08 · P-10
Temp
0.1 — near-deterministic for spec precision
Max tokens
3072 — DDD specs are longer
State Machine
IDLE
MODE_SELECT
Read persona from DocBrief → load P1 or P2 system prompt
DRAFTING
LLM call with mode-specific prompt + ContextPack
PLACEHOLDER_SCAN
P2 only — verify all gap_flags became [REQUIRES INPUT:] entries
COMPLETE
DraftDocument written · XAI card emitted · Routes to A-04 (P2) or A-05 (P1)
PLACEHOLDER_FAIL
P2: gap_flag without placeholder found → retry with explicit instruction
DRAFT AGENT — OUTPUT STRUCTURE COMPARISON · P1 vs P2

P1 — FEATURE RELEASE DOC
## Overview — what this feature does, why it exists
## Prerequisites — auth, SDK version, permissions
## Steps (procedure) — numbered, imperative steps
## Code example — language-tagged scaffold
## Parameters — Name · Type · Required · Description
## Error codes — Code · Cause · Resolution
## Related links — cross-references · changelog entry
Temp: 0.3 · Max: 2048 tok · Voice: second person, active, present

P2 — DDD SPEC
## Purpose statement — why this spec exists
## Scope + Definitions — boundary · glossary of terms used
## Actors + Preconditions — who acts · what must be true before
## Functional requirements — "The system shall…" for every requirement
## Non-functional requirements — performance · security · constraints
## Error states + Open questions — every failure mode · [REQUIRES INPUT:]
## Requirements traceability table — REQ-ID · Description · Status
Temp: 0.1 · Max: 3072 tok · Voice: third person system, imperative
Diagram 15 Draft Agent output structure — P1 feature doc vs P2 DDD spec. Distinct section sets, voice, temperature, and token budget per mode.
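The P2-only PLACEHOLDER_SCAN step can be sketched as a pure check: every gap flag inherited from the ContextPack must surface as a `[REQUIRES INPUT: …]` entry, or the state machine routes to PLACEHOLDER_FAIL and retries. The function shape is illustrative.

```typescript
// Sketch of PLACEHOLDER_SCAN: verify every gap_flag from the Research
// Agent became a [REQUIRES INPUT: …] placeholder in the draft.
function scanPlaceholders(
  contentMd: string,
  gapFlags: string[],
): { pass: boolean; missing: string[] } {
  const placeholders = contentMd.match(/\[REQUIRES INPUT:[^\]]*\]/g) ?? [];
  const placeholderText = placeholders.join(" ");
  const missing = gapFlags.filter((gap) => !placeholderText.includes(gap));
  return { pass: missing.length === 0, missing }; // !pass → PLACEHOLDER_FAIL, retry
}
```

A failed scan produces an explicit retry instruction naming the missing flags, rather than silently shipping an inferred value — the AR-08 · P-10 rule enforced mechanically.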
Agent A-04 — P2 Only

Ambiguity Detector —
the DDD quality gate.

The Ambiguity Detector receives the DraftDocument as an external artefact and critiques it — it does not produce content. It fires only in P2 mode, after the Draft Agent, before Compliance. Its single job is to find language that will cause a developer to make a judgment call. Every judgment call a developer makes from a spec is a potential sprint defect.

04
Ambiguity Detector
Input: DraftDocument · Output: AmbiguityReport · Mode: P2 ONLY · Critique-mode prompt
P2 — DDD Only
Detection Categories
Vague quantifiers
"fast", "quickly", "slow", "large", "small", "many", "few", "appropriate", "reasonable", "sufficient", "minimal" — any quantifier without a measurable definition
Undefined terms
Any term used in a requirement that has no definition in the Definitions section. Proper nouns, system names, role names used without prior introduction.
Missing error states
Any functional requirement that describes a success path without a corresponding failure path. "The system shall authenticate the user" without "If authentication fails, the system shall…"
Implicit assumptions
Statements that assume a condition is always true: "when the user is logged in", "assuming the service is available", "given valid input" — without specifying what happens when the assumption is false.
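The vague-quantifier category above is the most mechanical of the four and can be sketched as a word-boundary pattern pass. This is an illustration of the first-pass pattern match only; the LLM critique pass handles the context-dependent categories.

```typescript
// Sketch of the Ambiguity Detector's first pass: flag vague quantifiers
// by pattern before the LLM critique. Word list from the detection
// categories above; real scoring and location tracking are omitted.
const VAGUE_QUANTIFIERS = [
  "fast", "quickly", "slow", "large", "small", "many", "few",
  "appropriate", "reasonable", "sufficient", "minimal",
];

function scanVagueQuantifiers(sentence: string): string[] {
  const lower = sentence.toLowerCase();
  return VAGUE_QUANTIFIERS.filter((w) => new RegExp(`\\b${w}\\b`).test(lower));
}
```

Each hit would then be wrapped into the flag structure described below, with a severity and a suggested measurable rewrite.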
Output Schema
Flag structure
{ term, category, location, severity, suggestion }
Severity levels
HIGH — will cause build defect. MED — will cause clarification round. LOW — stylistic, does not block build.
Location
Section name + sentence excerpt — Maya can find it without re-reading the full spec.
Suggestion
Each flag includes a one-line suggested rewrite. Maya accepts, edits, or ignores in the Review UI.
Temp
0.1 — critique tasks benefit from near-deterministic output
Prompt stance
System prompt explicitly states: "You are a critic, not an author. Do not improve the document. Find every place where a developer would need to make a judgment call."
State Machine
IDLE
P2 mode confirmed — awaiting DraftDocument
SCANNING_QUANTIFIERS
First pass — vague quantifier pattern match + LLM flag
SCANNING_DEFINITIONS
Cross-reference terms against Definitions section
SCANNING_ERROR_STATES
Check each functional req for paired failure path
SCORING
Assign severity · Compile AmbiguityReport
COMPLETE
AmbiguityReport written · XAI card emitted · → A-05
Agent A-05

Compliance Agent —
80 rules, zero exemptions.

The Compliance Agent checks the DraftDocument against the 80-rule Google Developer Style Guide JSON. Every violation is cited against a named rule ID. No rule can be skipped. No violation can be suppressed. The compliance check runs on every document in every session — P1 and P2, Agile and Waterfall. It is a structural property of the pipeline, not a configurable option.

05
Compliance Agent
Input: DraftDocument · Output: ComplianceReport · rules.json v-semver · P1 + P2
P1 + P2 · Mandatory
rules.json Structure
Format
{ id, category, rule, check_type, examples }
Categories (10)
Voice · Tense · Person · Headings · Lists · Code · Links · Terminology · Punctuation · Accessibility
check_type
PATTERN (regex/keyword), STRUCTURAL (section presence), SEMANTIC (LLM-evaluated). 60% pattern, 20% structural, 20% semantic.
Sample rules
047: Avoid Latin abbreviations (e.g., i.e., etc.) · 012: Use second person ("you") · 033: Use present tense · 061: Heading capitalisation: sentence case only · 074: Avoid "simple" and "easy"
Check Execution
PATTERN rules
Regex executed client-side before LLM call. Fast. Deterministic. No tokens consumed. Results appended to LLM prompt context.
STRUCTURAL rules
Section presence checks run against DraftDocument.content_md parsed headings. P1: requires Prerequisites, Steps, Code sections. P2: requires Definitions, Functional Requirements sections.
SEMANTIC rules
LLM call with rules-as-context. Checks nuanced violations: passive voice in complex sentences, vague feature descriptions, overpromising language. Returns violation[] with rule_id and excerpt.
Output guarantee
Every violation in ComplianceReport has: rule_id (citable), excerpt (findable), fix_suggestion (actionable). No violation without all three fields.
State Machine
IDLE
LOAD_RULES
Fetch rules.json · validate schema · log version
PATTERN_CHECK
Client-side regex · no LLM · fast
STRUCTURAL_CHECK
Section presence · heading structure
SEMANTIC_CHECK
LLM call with semantic rules as context
COMPLETE
ComplianceReport compiled · XAI card emitted · → A-06
Agent A-06

Review Prep Agent —
writer-ready in one view.

The Review Prep Agent assembles the ReviewBundle — the final object presented to Maya in the Review UI. It does not generate new content. It organises, prioritises, and formats the pipeline outputs into a coherent review experience. It is the last agent that fires before the human gate. Its XAI card is the summary of the entire pipeline session.

06
Review Prep Agent
Input: DraftDocument + ComplianceReport + AmbiguityReport? · Output: ReviewBundle · Assembly only
P1 + P2 · Assembly
ReviewBundle Contents
Draft panel
Full Markdown draft with inline compliance violation annotations. Violations are highlighted in the text at the exact excerpt location.
Compliance panel
Sorted violation list: HIGH severity first. Each item: rule_id, excerpt, fix_suggestion, Accept / Edit / Ignore actions.
Ambiguity panel (P2)
Sorted flag list: HIGH severity first. Each item: term, category, location, suggestion. Separate from compliance — Maya handles each independently.
XAI summary panel
All 5–6 XAI reasoning cards in chronological order. Maya can read the full pipeline reasoning trail without re-running the session.
Export gate
export_ready flag set to FALSE until Maya has actioned every HIGH-severity item (accepted, edited, or explicitly ignored with a reason). AR-02 · P-02.
Prioritisation Logic
Severity sort
HIGH items surface first in both compliance and ambiguity panels. LOW items are collapsible — hidden by default to reduce cognitive load.
Confidence triage
Any agent output with confidence < 0.7 is flagged with a yellow indicator in the XAI panel. Maya sees which sections of the draft were produced with lower confidence.
Placeholder triage (P2)
Every [REQUIRES INPUT:] placeholder is surfaced as a required action item. Export is blocked until all placeholders are resolved or explicitly deferred.
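The export gate described above reduces to a small pure computation. This sketch assumes illustrative shapes for review items; the point is that `export_ready` is derived, never toggled directly.

```typescript
// Sketch of the export gate (AR-02 · P-02): export_ready stays false
// until every HIGH item has an explicit action and every
// [REQUIRES INPUT:] placeholder is resolved or deferred.
type Action = "accepted" | "edited" | "ignored_with_reason" | null;
interface ReviewItem { severity: "HIGH" | "MED" | "LOW"; action: Action }

function computeExportReady(items: ReviewItem[], unresolvedPlaceholders: number): boolean {
  const highActioned = items
    .filter((i) => i.severity === "HIGH")
    .every((i) => i.action !== null); // null = not yet a conscious decision
  return highActioned && unresolvedPlaceholders === 0;
}
```

"Ignore with reason" counts as an action here — the gate requires a conscious decision, not a particular one.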
State Machine
IDLE
ASSEMBLING
Merge draft + reports + XAI cards into ReviewBundle
PRIORITISING
Sort violations + flags by severity · compute export_ready
ANNOTATING
Inline violation markers on draft text
COMPLETE
ReviewBundle ready · XAI card emitted · Human Gate activated
XAI Layer

The Reasoning Card —
XAI made tangible.

Every agent emits one XAI reasoning card before passing control to the next stage. The card is not a log entry — it is a first-class UI element visible in the pipeline monitor in real time. Maya reads the card to understand what the agent decided and why, before the next agent fires. The pipeline monitor is the XAI interface.

A-03 Draft Agent — Reasoning Card · conf: 0.82

WHAT I UNDERSTOOD — P1 mode · feature doc for PATCH /users/{id} endpoint · audience: API integrators · no existing doc to update. Context pack contained: endpoint description, 2 parameters, auth requirement, 1 context gap (rate limit policy).

WHAT I DECIDED — Produced full feature doc with 6 sections: Overview, Prerequisites, Steps (4), Parameters table, Error codes, Related links. Used second person, present tense, active voice throughout · code sample scaffolded with placeholder values.

WHY — P1 system prompt selects this structure for API endpoint feature docs · voice defaults enforced by system prompt to satisfy Google Style Guide §1.2 (second person), §3.4 (present tense), §2.1 (active voice) before the Compliance Agent checks.

UNCERTAINTIES — REVIEW THESE
⚠ Rate limit policy for this endpoint unknown — inserted [REQUIRES INPUT: rate limit per minute] in Error codes section
⚠ Auth scope requirement inferred from similar endpoints — verify "users:write" scope is correct for PATCH

Footer: Confidence 0.82 · 2 uncertainties · draft complete · Compliance Agent firing next

Card anatomy: agent ID + name visible in monitor header · understood field — what input was interpreted · decided field — what output was produced · why field — rule citations + reasoning · uncertainties field — actionable items for Maya · confidence score — 0.0–1.0, guides review
Diagram 16 XAI Reasoning Card anatomy — A-03 Draft Agent example. Four fields: understood, decided, why, uncertainties. Confidence score in header.
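The card anatomy in Diagram 16 maps directly onto a small data shape. The interface below is an illustrative sketch populated with the A-03 example; field names follow the four-field anatomy, and the value strings are abbreviated.

```typescript
// Illustrative shape of the reasoning card every agent emits,
// mirroring the four fields in Diagram 16.
interface ReasoningCard {
  agent_id: string;
  understood: string;      // what input was interpreted
  decided: string;         // what output was produced
  why: string;             // rule citations + reasoning
  uncertainties: string[]; // actionable items for Maya
  confidence: number;      // 0.0–1.0 · guides review
}

const exampleCard: ReasoningCard = {
  agent_id: "A-03",
  understood: "P1 mode · feature doc for PATCH /users/{id} · audience: API integrators",
  decided: "Full feature doc, 6 sections, second person, present tense, active voice",
  why: "P1 system prompt selects this structure; voice defaults per Google Style Guide",
  uncertainties: [
    "Rate limit policy unknown — [REQUIRES INPUT:] inserted in Error codes",
    "Auth scope inferred from similar endpoints — verify users:write",
  ],
  confidence: 0.82,
};
```

Because the card is structured data rather than free-form log text, the pipeline monitor can render it in real time and the Review UI can sort by confidence.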
Full Sequence Diagram

End-to-end interaction —
from paste to export.

Participants: Maya · A-01 Intake · A-02 Research · A-03 Draft · A-04 Ambiguity (P2 only) · A-05 Compliance · A-06 Review Prep

1. Maya pastes raw input + confirms P1/P2.
2. A-01 emits XAI card + DocBrief preview · DocBrief → A-02.
3. A-02 surfaces ≤3 clarification questions (non-blocking) · Maya answers optionally · XAI card + ContextPack summary · ContextPack → A-03.
4. A-03 emits XAI card + draft preview · P2: DraftDoc → A-04.
5. A-04 (P2 only) emits XAI card + AmbiguityReport preview · DraftDoc (+ AmbiguityReport in P2) → A-05.
6. A-05 emits XAI card + ComplianceReport summary · all outputs → A-06.
7. A-06 assembles ReviewBundle + XAI card → Review UI activated.
8. ⬡ HUMAN GATE — Maya's exclusive domain: reviews compliance violations · acts on ambiguity flags (P2) · reads XAI cards · edits draft · approves sections. Export disabled until all HIGH items actioned · no agent fires during this phase · AR-02 · P-02.
9. Maya approves + exports. EXPORT — Markdown / HTML / plain text · session written to IndexedDB.

Timeline: Intake ~1m · Research ~2m · Draft ~4m · Ambiguity ~2m (P2) · Compliance ~2m · Prep ~1m · Maya review ~5m (P1) / ~8m (P2). P1 total ≤15 min · P2 total ≤20 min (the Ambiguity Detector adds ~2 min agent time + ~3 min Maya review).
Diagram 17 Full end-to-end sequence diagram — Maya through all 6 agents, human gate, export. P2 Ambiguity Detector branch shown dashed.
Rebuttals & Pushbacks

Four agent design challenges.
Every objection answered.

These are the real challenges a principal engineer or technical lead would raise against these specific agent design decisions. Each rebuttal documents the challenge, the tempting shortcut, why the shortcut was rejected, and what was traded away.

Agent Design — Pushback 01
Why does Research Agent surface clarification questions instead of blocking until answered?
The Challenge

"If the Research Agent detects a context gap, it should stop the pipeline and wait for Maya to fill it. A draft written with a known gap is worse than no draft at all."

The Temptation

Blocking pipeline. Research Agent asks questions. Maya must answer all before pipeline proceeds. Feels rigorous.

Why We Rejected It

C-03 (no workflow disruption) is a binding constraint. A blocking clarification step transforms a 15-minute pipeline into a multi-session, asynchronous conversation that requires Maya to context-switch back. The value proposition collapses. Instead: gaps become [REQUIRES INPUT:] placeholders in the draft and HIGH-severity items in the Review UI. Maya resolves them in one focused review session — not scattered across multiple pipeline invocations. Non-blocking with explicit flagging is better UX than blocking and waiting.

Trade-off Accepted

Drafts produced with unresolved gaps contain placeholder text rather than real content. Maya must fill these in Review. This is the correct behaviour — a placeholder is honest about what the pipeline doesn't know; an inferred value pretends to a knowledge it doesn't have.

Agent Design — Pushback 02
Why is the confidence score self-reported by the LLM rather than computed externally?
The Challenge

"LLM-reported confidence scores are notoriously unreliable. A model that's hallucinating will still report high confidence. This is false XAI."

The Temptation

Compute confidence externally from schema completeness (% fields populated), input length, gap count, and placeholder count. Fully deterministic. No LLM involvement.

Why We Rejected It

External heuristic confidence scores are more reliable in a narrow sense but less informative in a useful sense. A document with all fields populated can still be wrong if the LLM made a plausible but incorrect inference. The LLM's self-reported confidence, combined with explicit uncertainty enumeration in the XAI card, gives Maya something actionable: not just "confidence = 0.82" but "I'm uncertain about these two specific things." The uncertainties[] field in the XAI card is where the real value is — confidence score is the headline, not the story. Maya reads both. We accept the theoretical unreliability of self-reported confidence because the uncertainties field provides the granular corrective.

Trade-off Accepted

Confidence scores may not be perfectly calibrated. This is disclosed in the Glossary. The Review UI labels them "agent-estimated confidence" to set expectations. The mandatory human gate means an incorrectly confident agent output is still reviewed before publishing.

Agent Design — Pushback 03
The Compliance Agent runs 80 rules but the Google Style Guide has hundreds. Isn't 80 rules inadequate?
The Challenge

"80 curated rules is a cherry-picked subset. A writer who passes the compliance check might still violate dozens of guide rules you didn't encode. You're creating false confidence in compliance."

The Temptation

Encode all 200+ verifiable rules from the full guide. More comprehensive, harder to criticise for omission.

Why We Rejected It

The 80 rules were selected by impact, not convenience. They cover every rule that is (a) binary-checkable without deep semantic understanding, and (b) responsible for >90% of the style violations found in a sample audit of 50 real developer docs. The remaining guide content is either stylistic judgment (where a rule cannot be binary) or highly context-dependent (where an LLM check would produce too many false positives to be trusted). A compliance check that produces false positives destroys writer trust faster than one that has a defined, disclosed scope. The UI discloses the 80-rule scope explicitly — Maya knows she is getting the high-impact subset, not the full guide.

Trade-off Accepted

~20% of guide violations will not be caught by the automated check. The UI documents this. Maya is expected to exercise professional judgment on stylistic and context-dependent rules. The tool does not replace her editorial judgment — it automates the verifiable rules so she can focus on the non-automatable ones.

Agent Design — Pushback 04
Why does Review Prep block export on HIGH items rather than just warning?
The Challenge

"Blocking export is paternalistic. Maya is a professional. She should be able to export with known violations if she chooses. A warning is enough."

The Temptation

Warning-only. Export enabled always. HIGH violations displayed prominently but not enforced. Maya decides.

Why We Rejected It

P-02 (human gate is non-bypassable) is a binding principle, not a UX preference. The architecture's value proposition is that documents produced by the pipeline are compliant before they leave it. If export is available before HIGH items are actioned, the pipeline's compliance guarantee evaporates. A writer under sprint pressure will export and intend to fix it later — and then they won't. The gate is not paternalistic; it is the architectural integrity of the compliance pipeline. "Ignore with reason" is the escape hatch: Maya can dismiss a HIGH violation with a one-sentence reason, which is logged in the session record. That is not blocking; it is requiring a conscious decision rather than an accidental omission.

Trade-off Accepted

Maya must interact with every HIGH item before exporting. For a document with many HIGH violations, this extends the review time. This is the correct trade-off: review time is a feature, not a bug. The pipeline saves time on drafting; the writer invests that time in review. The net is still a significant improvement over the As-Is state.