The Autonomous Enterprise / Page 05

Agent Swarm Architecture: formally specified.

The HITL specification from Page 04 is the input. Every checkpoint defined there is implemented here as a formal state machine node — with an entry condition, a presentation contract, a timeout, and an immutable audit record. This is not a diagram of agents. It is an architecture of agents.

Google ADK · A2A Protocol · MCP Tool Manifest · 5 Specialist Agents · 11 HITL State Nodes · 3-Tier Memory
Swarm Topology

One orchestrator. Five specialists. One shared context.

The AE agent swarm uses Google ADK for agent definition, A2A protocol for inter-agent communication, and MCP for tool access. The Orchestrator is the single point of task dispatch — it never executes business logic directly. Specialist agents are stateless and idempotent. All state lives in Firestore. All tool calls are audited.

Agent Swarm — Full Topology
Orchestrator · 5 specialist agents · Tool layer · External systems · A2A communication paths
[Figure: agent swarm topology diagram]
Presentation layer: HITL approval UI · human reviewers (exchanges HITL events with the Orchestrator)
Orchestration layer (Google ADK): AE Orchestrator · ADK · A2A · MCP dispatcher · SA: orchestrator-sa@ · Cloud Run
Specialist agents (ADK · Cloud Run · stateless · idempotent), reached via A2A task dispatch:
CCAI Sales Agent · SA: ccai-sa@
ContractGuard Agent · SA: cg-sa@
RevRec AI Agent · SA: revrec-sa@
Asset IQ Agent · SA: assetiq-sa@
FinRisk Sentinel · SA: finrisk-sa@
MCP tool layer (GCP services):
Vertex AI: Gemini 1.5 Pro · Pipelines
Firestore: State · HITL · Audit
BigQuery: Audit · Features · ML
Pub/Sub: Event bus · Cross-agent
Document AI: Contract parsing · GCS
Secret Manager: OAuth tokens · API keys
External systems:
Salesforce: Dev Edition · REST API
SAP S/4HANA Mock: BigQuery (demo)
6 Regional Asset Systems: Pub/Sub ingestion pipeline
Document Store: GCS upload → Document AI
Every agent action · Every tool call · Every state transition → immutable audit record in Firestore + BigQuery before next state is entered
Legend: Orchestrator (ADK) · Specialist agents · MCP tool layer (GCP) · External systems · A2A task dispatch · MCP tool call
Orchestrator Agent

The dispatcher — never the executor.

The Orchestrator is the only agent that receives external requests. It never executes business logic directly. It decomposes tasks, routes sub-tasks to specialist agents via A2A, tracks task completion across the swarm, handles agent failures via circuit breaker, and maintains the global conversation context. It is the only agent with write access to the Orchestrator state collection in Firestore.

State Machine
IDLE — awaiting task dispatch
→ on: inbound A2A task message
DECOMPOSING — breaking task into sub-tasks
→ on: decomposition complete
DISPATCHING — sending sub-tasks to agents via A2A
→ on: all dispatches acknowledged
AWAITING — monitoring specialist agent completions
→ on: all sub-tasks complete / on: HITL pause received
HITL PAUSE — waiting for human approval on one or more sub-tasks
→ on: HITL approved / on: HITL rejected → ROLLBACK
CIRCUIT OPEN — specialist agent failed · fallback active
→ on: retry threshold exceeded
COMPLETE — all sub-tasks resolved · audit record committed
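The state list above can be read as a transition table. The following is a minimal sketch under stated assumptions, not the ADK implementation: the event names are hypothetical labels for the listed triggers, "HITL approved" is assumed to resume AWAITING (the source does not name the target), and the CIRCUIT OPEN path is left out because its target state is not specified.

```python
from enum import Enum


class OrchState(Enum):
    IDLE = "IDLE"
    DECOMPOSING = "DECOMPOSING"
    DISPATCHING = "DISPATCHING"
    AWAITING = "AWAITING"
    HITL_PAUSE = "HITL_PAUSE"
    ROLLBACK = "ROLLBACK"
    COMPLETE = "COMPLETE"


# (current state, event) -> next state. Event names are hypothetical.
TRANSITIONS = {
    (OrchState.IDLE, "task_received"): OrchState.DECOMPOSING,
    (OrchState.DECOMPOSING, "decomposition_complete"): OrchState.DISPATCHING,
    (OrchState.DISPATCHING, "all_acknowledged"): OrchState.AWAITING,
    (OrchState.AWAITING, "all_subtasks_complete"): OrchState.COMPLETE,
    (OrchState.AWAITING, "hitl_pause_received"): OrchState.HITL_PAUSE,
    # Assumption: an approved HITL returns the orchestrator to AWAITING.
    (OrchState.HITL_PAUSE, "hitl_approved"): OrchState.AWAITING,
    (OrchState.HITL_PAUSE, "hitl_rejected"): OrchState.ROLLBACK,
}


def step(state: OrchState, event: str) -> OrchState:
    """Return the next state, or raise if the transition is not specified."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {state.name} on {event!r}") from None
```

Keeping the table as data means any transition not listed is rejected rather than silently accepted, which matches the intent of a formally specified state machine.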
Tool Manifest
a2a.dispatch_task: Send task to specialist agent · returns task_id + ack
a2a.get_task_status: Poll specialist agent for task state · returns current FSM state
a2a.cancel_task: Cancel in-flight task · triggers specialist rollback state
firestore.write_orchestration_state: Persist orchestrator FSM state · atomic write with task context
firestore.write_audit_record: Immutable audit log · action_id · timestamp · agent_id · state
pubsub.publish_event: Broadcast cross-agent event · topic: orchestrator-events
hitl.create_checkpoint: Create HITL state node · links to HITL-spec checkpoint ID
hitl.await_decision: Block orchestrator until HITL resolution · timeout: per spec
Circuit Breaker Configuration
Failure threshold: 3 failures within 60s window
Open state duration: 30 seconds before half-open probe
Fallback behaviour: HITL escalation · route to human immediately
Audit action: Always · circuit events → Firestore
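The configuration above maps onto a small sliding-window breaker. This is an illustrative sketch, assuming an injectable clock for testing; the class and method names are hypothetical, and the HITL routing and Firestore audit write are left to the caller.

```python
import time
from collections import deque


class CircuitBreaker:
    def __init__(self, failure_threshold=3, window_s=60.0, open_s=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.window_s = window_s
        self.open_s = open_s
        self.clock = clock
        self.failures = deque()  # timestamps of failures inside the window
        self.opened_at = None    # None means the circuit is CLOSED

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "CLOSED"
        if self.clock() - self.opened_at >= self.open_s:
            return "HALF_OPEN"
        return "OPEN"

    def allow_request(self) -> bool:
        # Simplification: HALF_OPEN does not limit itself to one probe here.
        return self.state != "OPEN"

    def record_failure(self) -> None:
        now = self.clock()
        if self.state == "HALF_OPEN":
            self.opened_at = now  # failed probe: re-open for another period
            return
        self.failures.append(now)
        while self.failures and now - self.failures[0] > self.window_s:
            self.failures.popleft()
        if len(self.failures) >= self.failure_threshold:
            self.opened_at = now

    def record_success(self) -> None:
        if self.state == "HALF_OPEN":
            self.opened_at = None  # successful probe closes the circuit
            self.failures.clear()
```

While `allow_request()` returns False, the caller would route the task to the HITL fallback and write the circuit event to the audit record.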
Agent Specifications

Five agents. Every state. Every tool. Every boundary.

Each specialist agent is defined by three things: its state machine (what states it can be in and what triggers each transition), its tool manifest (the exact MCP tools it is permitted to call), and its autonomy boundary (the line between what it does autonomously and what it escalates to a human). These are not descriptions — they are specifications.

Agent 01
CCAI Sales Agent
Multi-turn conversational agent handling inbound MRI inquiries. Manages qualification, configuration, and CPQ initiation autonomously through 11 turns before escalating to a human AE.
State Machine
IDLE
→ on: inbound inquiry event (Pub/Sub)
QUALIFYING — turn 1–4: budget, authority, need, timeline
→ on: qualification complete
CONFIGURING — turn 5–8: clinical requirements, MRI model fit, BOM
→ on: BOM validated
PRICING — turn 9–11: pricing estimate, delivery timeline, proposal draft
→ on: turn 11 reached OR commercial terms entered
HITL-01 — generating briefing doc · awaiting AE engagement
→ on: AE confirms engagement
HANDED OFF — Salesforce Opportunity created · audit record committed
→ on: agent failure at any state
CIRCUIT OPEN — escalate to VP Sales · preserve conversation state
Tool Manifest (MCP)
gemini.generate_response: Multi-turn conversation · system prompt: sales qualification playbook
salesforce.get_account: Lookup hospital account by domain · returns account ID + history
salesforce.create_opportunity: Create Opportunity on escalation · stage: Qualification · auto-populates from conversation context
salesforce.create_activity: Log every conversation turn as Activity on the Opportunity
product_catalogue.get_sku: Retrieve MRI model specs, configurations, and pricing tiers
bom.validate_configuration: Run BOM validation against applications engineering rules
document.generate_brief: Generate structured briefing document from conversation context
firestore.write_conversation_state: Persist turn-by-turn conversation state · enables resume after HITL
hitl.create_checkpoint: HITL-01 · present briefing doc to AE · await engagement confirmation
Autonomy Boundary & Thresholds
Qualification confidence threshold: ≥ 0.75 → auto
BOM validation required before pricing: Always
Escalation trigger (turn count): Turn 11
Commercial terms detected: Immediate HITL
Circuit breaker threshold: 3 failures / 60s
Conversation state TTL (Firestore): 7 days
Autonomous:
Qualification questions (turns 1–4)
Clinical configuration matching against product catalogue
BOM validation and pricing estimate generation
Salesforce Opportunity creation and Activity logging
Briefing document generation
Escalated to a human:
Any discussion of commercial terms, discounts, or deal structure
Escalation to human AE → HITL-01 checkpoint
Custom clinical configuration outside standard catalogue
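The HITL-01 trigger (turn 11 reached or commercial terms entered) can be sketched as a single predicate. This is an assumption-laden illustration: the keyword list is a hypothetical stand-in for whatever detection the agent actually uses, and `Turn` is an invented container.

```python
from dataclasses import dataclass

MAX_AUTONOMOUS_TURNS = 11
# Hypothetical keyword list; a real detector would likely be model-based.
COMMERCIAL_KEYWORDS = ("discount", "payment terms", "rebate", "deal structure")


@dataclass
class Turn:
    number: int
    customer_text: str


def should_escalate(turn: Turn) -> bool:
    """True when the conversation must move to the HITL-01 checkpoint."""
    text = turn.customer_text.lower()
    commercial = any(keyword in text for keyword in COMMERCIAL_KEYWORDS)
    return commercial or turn.number >= MAX_AUTONOMOUS_TURNS
```

Commercial terms are checked on every turn, so a discount request at turn 3 escalates immediately rather than waiting for the turn limit.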
Agent 02
ContractGuard Agent
Document-native contract intelligence agent. Reads full contracts via Gemini 1.5 Pro 1M token context, performs clause-level analysis, risk scores non-standard terms, and routes to Legal HITL before any counter-proposal is drafted.
State Machine
IDLE
→ on: contract uploaded to GCS (Pub/Sub trigger)
INGESTING — Document AI parsing · GCS → structured clause list
→ on: parse complete · clause count > 0
ANALYSING — Gemini 1.5 Pro full-document reasoning · clause classification
→ on: analysis complete
SCORING — risk model inference · SHAP attribution per flagged clause
→ on: risk scores above threshold detected
HITL-02 / HITL-03 — Legal review queue · awaiting approval per flagged clause
→ on: all HITL decisions received (approve / revise / escalate)
DRAFTING — generating counter-proposal based on HITL decisions
→ on: draft complete
COMPLETE — contract analysis + HITL record + counter-proposal committed
→ on: Document AI parse failure OR Gemini timeout
CIRCUIT OPEN — fallback to manual Legal review · preserve document state
Tool Manifest (MCP)
gcs.read_document: Read contract from GCS bucket · CMEK-encrypted · returns raw bytes
document_ai.parse_contract: Extract structured clause list with positions, types, and metadata
gemini.analyse_contract: Full 1M-token context pass · system prompt: clause classification + risk taxonomy
risk_model.score_clause: XGBoost clause risk model · returns risk score + SHAP attribution
vector_store.find_precedents: Find 3 most similar clauses from historical contract corpus · returns similarity scores
gemini.generate_counter: Draft counter-position for flagged clause based on HITL decision and ClaraVis standard terms
salesforce.update_contract: Write analysis results back to Salesforce Contract object · risk summary field
hitl.create_checkpoint: HITL-02 (risk clause) / HITL-03 (governing law) · present clause + SHAP + precedents
firestore.write_clause_analysis: Persist per-clause analysis, risk scores, SHAP values, and HITL decisions
Autonomy Boundary & Thresholds
Clause risk threshold → HITL: ≥ 0.65
Governing law non-standard: Always HITL-03
Liability cap ratio threshold: > 3× contract value
Gemini confidence (analysis): ≥ 0.80 → auto
Max contract size (tokens): 900K tokens
Circuit breaker threshold: 3 failures / 60s
Autonomous:
Document AI parsing and clause extraction
Standard clause classification (200+ types)
Precedent search and similarity scoring
Risk scoring below HITL threshold
Escalated to a human:
Any clause with risk score ≥ 0.65 → HITL-02
All non-standard governing law clauses → HITL-03
Counter-proposal generation (requires approved HITL record first)
Contracts above 900K tokens → manual Legal review
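The routing rules above reduce to a few comparisons. The sketch below is illustrative only: the `Clause` fields and the route labels "AUTO", "AGENT", and "MANUAL_LEGAL_REVIEW" are hypothetical names, and the liability cap ratio and Gemini confidence checks are omitted.

```python
from dataclasses import dataclass

RISK_THRESHOLD = 0.65        # clause risk score at or above this goes to HITL-02
MAX_TOKENS = 900_000         # larger contracts go to manual Legal review


@dataclass
class Clause:
    clause_id: str
    risk_score: float
    is_governing_law: bool = False
    is_standard: bool = True


def route_clause(clause: Clause) -> str:
    if clause.is_governing_law and not clause.is_standard:
        return "HITL-03"
    if clause.risk_score >= RISK_THRESHOLD:
        return "HITL-02"
    return "AUTO"


def route_contract(token_count: int, clauses: list) -> dict:
    if token_count > MAX_TOKENS:
        return {"route": "MANUAL_LEGAL_REVIEW", "clauses": {}}
    return {"route": "AGENT",
            "clauses": {c.clause_id: route_clause(c) for c in clauses}}
```

Checking governing law before the risk score reflects the "Always HITL-03" rule: a non-standard governing law clause is escalated whatever its score.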
Agent 03
RevRec AI Agent
ASC 606 / IFRS 15 revenue recognition classification agent. Classifies every MRI transaction as sale, lease, or multi-element arrangement. Every classification routes through Finance Controller HITL before posting to SAP. No exceptions.
State Machine
IDLE
→ on: contract signed event (Pub/Sub · Salesforce)
EXTRACTING — pulling contract line items, terms, and pricing from Salesforce
→ on: features extracted and validated
CLASSIFYING — ML model inference · ASC 606 rule engine · SHAP computation
→ on: classification complete · confidence ≥ minimum threshold
HITL-04 — Finance Controller review queue · classification + SHAP + comparables
→ on: HITL approved
POSTING — writing classification to Transaction entity · initiating SAP write
→ on: SAP write confirmed
COMPLETE — Transaction entity tagged · SAP posted · audit record committed
→ on: confidence < minimum threshold
HITL-09 — low confidence · manual classification requested
→ on: multi-element detected
HITL-05 — performance obligation split review
Tool Manifest (MCP)
salesforce.get_contract_details: Retrieve contract terms, line items, pricing, and customer type
feature_store.get_features: Retrieve pre-computed transaction features from Vertex AI Feature Store
asc606_model.classify: Run trained classification model · returns class + confidence + raw feature vector
shap.explain: Compute SHAP values for classification · returns top-5 feature attributions
bigquery.find_comparables: Find 3 most similar historical transactions by feature similarity
hitl.create_checkpoint: HITL-04 (standard) / HITL-05 (multi-element) / HITL-09 (low confidence)
bigquery.write_transaction: Write Transaction entity with recognition type + performance obligation tags
sap.post_journal_entry: Initiate SAP GL posting · requires HITL approval record ID as mandatory parameter
Autonomy Boundary & Thresholds
Minimum classification confidence: ≥ 0.70 required
HITL required for all classifications: Always
SAP write without HITL record: Blocked by design
Multi-element threshold: > 1 performance obligation
SHAP generation: Every inference
Circuit breaker threshold: 2 failures / 60s
Autonomous:
Feature extraction from Salesforce contract
ASC 606 model inference and SHAP computation
Comparable transaction lookup and presentation
Escalated to a human:
Every classification without exception → HITL-04
SAP GL posting: only after HITL approval record committed
Multi-element splits → HITL-05
Low-confidence classifications → HITL-09 manual review
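Two rules above lend themselves to code: which checkpoint a classification lands in, and the guard that blocks an SAP write without a HITL approval record. This is a sketch under assumptions: the class labels, the precedence of low confidence over multi-element, and the function names are illustrative, and no real SAP call is made.

```python
MIN_CONFIDENCE = 0.70
VALID_CLASSES = {"sale", "lease", "multi_element"}  # hypothetical label strings


def select_checkpoint(predicted_class: str, confidence: float,
                      obligation_count: int) -> str:
    """Every classification goes to a human; this only picks which queue."""
    if predicted_class not in VALID_CLASSES or confidence < MIN_CONFIDENCE:
        return "HITL-09"   # invalid or low-confidence: manual classification
    if obligation_count > 1:
        return "HITL-05"   # performance obligation split review
    return "HITL-04"       # standard Finance Controller review


class MissingHitlApproval(Exception):
    pass


def post_journal_entry(entry: dict, hitl_approval_id: str) -> dict:
    """Stand-in for the SAP posting tool: refuses to run without an approval ID."""
    if not hitl_approval_id:
        raise MissingHitlApproval("SAP write blocked: no HITL approval record ID")
    return {**entry, "hitl_approval_id": hitl_approval_id, "status": "SUBMITTED"}
```

Making the approval ID a required argument puts the guard in the tool's signature rather than in a policy check the agent could skip.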
Agent 04
Asset IQ Agent
Predictive maintenance intelligence agent. Processes unified asset telemetry from 12,000+ MRI units. Runs two-tier ML: fleet-level RUL prediction and unit-level anomaly detection. Routes below-threshold predictions to Field Service HITL.
State Machine
IDLE
→ on: scheduled cadence trigger (daily) OR asset event (Pub/Sub)
INGESTING — reading asset events from unified Pub/Sub pipeline
→ on: event batch assembled
FEATURE ENGINEERING — computing time-series features per unit
→ on: features computed and stored in Feature Store
RUL PREDICTION — fleet-level model inference · SHAP per unit
→ on: predictions complete
ANOMALY DETECTION — unit-level anomaly scan · cross-regional pattern detection
→ on: predictions above confidence → auto work order / below confidence → HITL
HITL-06 — low confidence prediction · Field Service Manager review
→ on: fleet anomaly pattern detected (cross-regional)
HITL-07 — fleet anomaly alert · VP Field Service + FSM review
→ on: all HITL decisions received
COMPLETE — work orders created · Device entities updated · ISO 13485 DHR written
Tool Manifest (MCP)
pubsub.subscribe_asset_events: Pull from unified asset telemetry topic · batch by device_id + time window
feature_store.write_features: Store time-series features per device for RUL model and drift monitoring
rul_model.predict: Run RUL gradient boosting model · returns days_to_failure + confidence + SHAP
anomaly_model.detect: Isolation Forest unit-level anomaly · returns anomaly score + contributing sensors
shap.explain_sensors: SHAP for sensor time-series features · top-3 sensor attribution per alert
bigquery.query_fleet_patterns: Cross-regional pattern query · find units with similar anomaly profiles
salesforce.create_work_order: Create preventive maintenance work order on Case object
bigquery.update_device: Write RUL score + last prediction timestamp to Device entity
hitl.create_checkpoint: HITL-06 (low confidence) / HITL-07 (fleet anomaly)
bigquery.write_dhr_event: ISO 13485 Device History Record event · maintenance activity log
Autonomy Boundary & Thresholds
RUL prediction confidence → auto work order: ≥ 0.82
RUL prediction confidence → HITL-06: < 0.82
Fleet anomaly (≥ N units): ≥ 3 units → HITL-07
Anomaly score threshold: ≥ 0.75
RUL alert horizon: < 14 days
Circuit breaker threshold: 3 failures / 120s
Autonomous:
Feature engineering and Feature Store writes
RUL model inference and SHAP computation
Work orders for high-confidence (≥ 0.82) predictions
Device entity updates (RUL score, last prediction)
ISO 13485 DHR event writes
Escalated to a human:
Low-confidence predictions → HITL-06 (FSM approval)
Fleet-level anomaly patterns → HITL-07 (VP Field Service)
Any action that would trigger a potential recall review
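A rough sketch of the two routing decisions above, under simplifying assumptions: a fleet anomaly is approximated as "at least three units at or above the anomaly score threshold", which ignores the cross-regional similarity matching described in the tool manifest, and the RUL alert horizon is not modelled. Function names are hypothetical.

```python
RUL_CONFIDENCE_AUTO = 0.82
ANOMALY_SCORE_THRESHOLD = 0.75
FLEET_ANOMALY_MIN_UNITS = 3


def route_prediction(confidence: float) -> str:
    """High-confidence RUL predictions create work orders; the rest go to HITL-06."""
    return "AUTO_WORK_ORDER" if confidence >= RUL_CONFIDENCE_AUTO else "HITL-06"


def is_fleet_anomaly(anomaly_scores: dict) -> bool:
    """anomaly_scores maps device_id -> unit-level anomaly score."""
    flagged = [device for device, score in anomaly_scores.items()
               if score >= ANOMALY_SCORE_THRESHOLD]
    return len(flagged) >= FLEET_ANOMALY_MIN_UNITS
```

When `is_fleet_anomaly` returns True, the agent would open a HITL-07 checkpoint instead of acting on the individual units.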
Agent 05
FinRisk Sentinel Agent
Real-time financial anomaly detection agent. Monitors the BigQuery financial event stream continuously. Detects unusual payment patterns, revenue posting discrepancies, and warranty reserve movements. Routes high-severity anomalies to CFO + Finance Controller HITL simultaneously.
State Machine
IDLE
→ on: financial event stream (BigQuery streaming insert)
MONITORING — continuous anomaly scan on incoming financial events
→ on: anomaly score above alert threshold
ENRICHING — computing Z-score vs 90-day baseline · SHAP attribution
→ on: severity classified
ALERTING — medium severity: Finance Controller notification + context package
→ on: high severity detected
HITL-08 — high severity · CFO + Finance Controller simultaneous HITL
→ on: HITL decision received (acknowledge / false positive / escalate)
LEARNING — false positive feedback written to baseline update queue
→ on: feedback processed
RESOLVED — anomaly record committed · decision logged · baseline updated
→ on: BigQuery streaming failure
CIRCUIT OPEN — alert ops team · switch to batch scan fallback
Tool Manifest (MCP)
bigquery.stream_financial_events: Subscribe to financial event stream · filter by event_type: payment, posting, reserve
anomaly_model.score_event: Isolation Forest + statistical anomaly scoring · returns anomaly score + contributing features
bigquery.compute_zscore: Compute Z-score vs 90-day rolling baseline for the event type and entity
shap.explain_anomaly: SHAP attribution for anomaly score · top-3 financial features
bigquery.get_entity_context: Retrieve full entity context (account, contract, recent transactions) for the anomaly
notification.send_alert: Send structured alert package to Finance Controller (medium) or CFO + FC (high)
hitl.create_checkpoint: HITL-08 · simultaneous CFO + Finance Controller · 1-hour SLA
bigquery.write_anomaly_record: Persist anomaly event, scores, SHAP, HITL decision, and resolution to audit dataset
pubsub.publish_baseline_update: Publish false positive feedback to baseline model update queue
Autonomy Boundary & Thresholds
Alert threshold (anomaly score): ≥ 0.65 → alert
HITL threshold (high severity): ≥ 0.85
Z-score alert threshold: ≥ 3.0σ
HITL SLA (high severity): 1 hour
Monitoring cadence: Streaming · sub-5min
Circuit breaker threshold: 5 failures / 120s
Autonomous:
Continuous anomaly scoring on financial event stream
Z-score computation and SHAP attribution
Medium-severity alerts with context package (no HITL required)
False positive feedback processing to baseline queue
Escalated to a human:
High-severity anomalies (≥ 0.85) → HITL-08 simultaneous CFO + FC
Any anomaly indicating potential regulatory reporting obligation
Anomalies in warranty reserve: always HITL regardless of score
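The thresholds above can be illustrated with a Z-score helper and a severity router. This is a sketch with assumptions called out: the baseline is passed in as a plain list rather than queried from BigQuery, warranty reserve anomalies are assumed to use the HITL-08 checkpoint (the source says only "always HITL"), and how the Z-score combines with the anomaly score is not specified, so the two are kept separate.

```python
from statistics import mean, stdev

ALERT_THRESHOLD = 0.65
HITL_THRESHOLD = 0.85
ZSCORE_THRESHOLD = 3.0


def zscore(value: float, baseline: list) -> float:
    """Z-score of value against a baseline sample (needs at least 2 points)."""
    sigma = stdev(baseline)
    return 0.0 if sigma == 0 else (value - mean(baseline)) / sigma


def route_anomaly(anomaly_score: float, event_type: str) -> str:
    # Assumption: warranty reserve events carry event_type "reserve".
    if event_type == "reserve" or anomaly_score >= HITL_THRESHOLD:
        return "HITL-08"
    if anomaly_score >= ALERT_THRESHOLD:
        return "ALERT"      # medium severity: Finance Controller notification
    return "NONE"
```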
A2A Protocol

How agents communicate — precisely.

Agent-to-Agent (A2A) is the communication protocol between the Orchestrator and specialist agents. Every message is typed, versioned, and auditable. The sequence below shows a ContractGuard task dispatch and the HITL escalation that follows. The JSON schema below it is the actual message format.

A2A Sequence — ContractGuard Task: Clause Risk Escalation
Orchestrator dispatches task → ContractGuard analyses → HITL-02 pause → Legal approves → counter-proposal generated
[Figure: A2A sequence diagram]
Participants: Salesforce CLM · Orchestrator · ContractGuard · Legal (HITL) · Firestore
Messages, in order:
1. contract_signed event (Pub/Sub)
2. write: orchestration_state = DISPATCHING
3. A2A: dispatch_task {task_id, type: CONTRACT_ANALYSIS, payload}
4. A2A: task_ack {task_id, state: INGESTING}
5. Analysis + Scoring
6. HITL-02: {clause_text, risk_score: 0.82, shap, precedents[3]}
7. write: state = HITL_PAUSE · hitl_id created
8. A2A: task_update {state: HITL_PAUSE, hitl_id}
9. HITL PAUSE: Legal reviewer evaluating clause · 24-hour SLA · agent awaiting decision
10. HITL decision: APPROVE {hitl_id, reason_code, ts}
11. write: hitl_event {decision, approver, ts} · immutable
12. Draft counter
13. A2A: task_complete {task_id, output_ref}
14. write: orchestration_state = COMPLETE · full audit trail
15. salesforce.update_contract(risk_summary, hitl_ref)
Timeline markers: t+0 · t+1s · t+28m · t+2.5h · t+2.5h
A2A Message Schema — Task Dispatch (dispatch_task)
{
  "a2a_version": "1.0",
  "message_type": "TASK_DISPATCH",          // TASK_DISPATCH | TASK_ACK | TASK_UPDATE | TASK_COMPLETE | TASK_ERROR
  "task_id": "task_cg_20260315_001a",       // globally unique · format: task_{agent}_{date}_{seq}
  "correlation_id": "orch_20260315_042",     // orchestration session ID · links all sub-tasks
  "from_agent": "orchestrator",
  "to_agent": "contractguard",
  "timestamp_utc": "2026-03-15T09:14:32Z",
  "task_type": "CONTRACT_ANALYSIS",
  "priority": "NORMAL",                      // NORMAL | HIGH | CRITICAL
  "timeout_seconds": 3600,                  // 1 hour · circuit breaker triggers at 3 failures
  "payload": {
    "contract_id": "sfdc_contract_CV2026_0042",
    "gcs_uri": "gs://claravis-contracts-eu/2026/0042_uniklinik.pdf",
    "counterparty": "Universitätsklinikum München",
    "contract_value_eur": 2840000,
    "analysis_config": {
      "risk_threshold": 0.65,              // clauses above this score → HITL-02
      "governing_law_check": "true",       // always trigger HITL-03 if non-standard
      "precedent_count": 3,                // number of similar precedents to surface in HITL
      "generate_counter": "post_hitl_approval"
    }
  },
  "audit": {
    "initiated_by": "orchestrator-sa@claravis-ae-prod.iam.gserviceaccount.com",
    "audit_trail_id": "audit_20260315_cg_001a",  // Firestore document ID · immutable
    "parent_hitl_ids": []                    // populated when this task is triggered by a HITL decision
  }
}
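A minimal sketch of how a dispatcher might assemble the envelope shown above. The helper names and argument list are hypothetical, and the audit block is omitted for brevity; only the field names, the task_id format, and the priority values come from the schema.

```python
from datetime import datetime, timezone

PRIORITIES = {"NORMAL", "HIGH", "CRITICAL"}


def make_task_id(agent_code: str, when: datetime, seq: str) -> str:
    # Format from the schema comment: task_{agent}_{date}_{seq}
    return f"task_{agent_code}_{when:%Y%m%d}_{seq}"


def build_dispatch(to_agent, agent_code, task_type, payload, correlation_id,
                   seq, when, priority="NORMAL", timeout_seconds=3600):
    if priority not in PRIORITIES:
        raise ValueError(f"unknown priority: {priority}")
    if when.tzinfo is None:
        raise ValueError("when must be timezone-aware")
    when = when.astimezone(timezone.utc)
    return {
        "a2a_version": "1.0",
        "message_type": "TASK_DISPATCH",
        "task_id": make_task_id(agent_code, when, seq),
        "correlation_id": correlation_id,
        "from_agent": "orchestrator",
        "to_agent": to_agent,
        "timestamp_utc": when.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "task_type": task_type,
        "priority": priority,
        "timeout_seconds": timeout_seconds,
        "payload": payload,
    }
```

Rejecting naive datetimes up front avoids emitting a local time with a "Z" suffix.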
A2A Message Schema — HITL Update (task_update → HITL_PAUSE)
{
  "a2a_version": "1.0",
  "message_type": "TASK_UPDATE",
  "task_id": "task_cg_20260315_001a",
  "from_agent": "contractguard",
  "to_agent": "orchestrator",
  "timestamp_utc": "2026-03-15T09:42:18Z",
  "state": "HITL_PAUSE",
  "hitl_context": {
    "hitl_spec_id": "HITL-02",                  // references Page 04 HITL specification
    "hitl_event_id": "hitl_20260315_cg_007",      // Firestore document ID · immutable on creation
    "approver_role": "GENERAL_COUNSEL",
    "sla_deadline_utc": "2026-03-16T09:42:18Z",  // 24-hour SLA per HITL-02 spec
    "timeout_action": "ESCALATE_TO_GC_MANAGER",
    "presented_to_human": {
      "clause_text": "Liability limited to 50% of contract value...",
      "risk_score": 0.82,
      "shap_attribution": [
        { "feature": "liability_cap_ratio", "value": 0.5, "contribution": +0.31 },
        { "feature": "governing_law_match", "value": "false", "contribution": +0.24 },
        { "feature": "indemnification_asymmetry", "value": 0.78, "contribution": +0.18 }
      ],
      "precedent_contracts": [
        { "id": "sfdc_contract_CV2024_0108", "similarity": 0.91, "outcome": "negotiated_up_to_80pct" },
        { "id": "sfdc_contract_CV2025_0033", "similarity": 0.87, "outcome": "accepted_with_carve_out" }
      ],
      "decision_options": ["APPROVE_AS_IS", "REQUEST_REVISION", "ESCALATE_EXTERNAL_COUNSEL"]
    }
  }
}
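The SLA fields in the message above imply a simple resolution rule: a valid decision wins, otherwise the timeout action fires once the deadline passes. The sketch below assumes that rule; the function names and the "WAITING" return value are hypothetical.

```python
from datetime import datetime, timedelta, timezone

TS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_utc(ts: str) -> datetime:
    return datetime.strptime(ts, TS_FORMAT).replace(tzinfo=timezone.utc)


def sla_deadline(paused_at_utc: str, sla_hours: int = 24) -> str:
    """Deadline timestamp for a checkpoint paused at paused_at_utc."""
    return (parse_utc(paused_at_utc) + timedelta(hours=sla_hours)).strftime(TS_FORMAT)


def resolve_checkpoint(hitl_context: dict, decision, now_utc: str) -> str:
    options = hitl_context["presented_to_human"]["decision_options"]
    if decision is not None:
        if decision not in options:
            raise ValueError(f"decision {decision!r} is not one of {options}")
        return decision
    if parse_utc(now_utc) >= parse_utc(hitl_context["sla_deadline_utc"]):
        return hitl_context["timeout_action"]
    return "WAITING"
```

With the example message, a pause at 09:42:18 on 15 March and a 24-hour SLA reproduces the `sla_deadline_utc` value shown in the schema.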
Memory Architecture

Three-tier memory. Each tier with a purpose.

Agent memory is not a monolith. Short-term memory holds the context for the current task — it is ephemeral and task-scoped. Long-term memory holds the institutional knowledge that makes agents smarter over time — contract precedents, historical decisions, asset failure patterns. The shared context bus is the event stream that keeps all agents aware of what other agents are doing.

TIER 01
Short-Term Memory
Firestore · Task-scoped · TTL: 7 days
The working memory for a single agent task. Stores conversation turns (CCAI Sales), document analysis state (ContractGuard), classification context (RevRec), and sensor batch context (Asset IQ). Every write is atomic and timestamped. State is preserved across HITL pauses — the agent can resume from the exact state it was in when it paused.
Schema (agent_task_state collection):
task_id · agent_id · fsm_state
context_payload (JSON) · created_at
last_updated_at · hitl_ids[]
correlation_id · ttl_expires_at
TIER 02
Long-Term Memory
Vertex AI Vector Store · Persistent · Embedding: text-embedding-004
The institutional knowledge base. ContractGuard uses it for precedent search — finding the 3 most similar clauses from ClaraVis's historical contract corpus. RevRec AI uses it for comparable transaction lookup. Asset IQ uses it for cross-regional failure pattern matching. Every HITL decision is written back to long-term memory as a labelled example — the agents get smarter with every human review.
Collections:
contract_clauses · transaction_history
asset_failure_patterns · hitl_decisions
Embedding model: text-embedding-004
Similarity metric: cosine · top-k: 3
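Cosine similarity with top-k of 3 is the retrieval rule named above. The following pure-Python sketch shows the ranking step only; in practice the vector store performs this search, and the in-memory `corpus` dict here is a hypothetical stand-in for stored embeddings.

```python
import math


def cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 0.0 if norm_a == 0 or norm_b == 0 else dot / (norm_a * norm_b)


def top_k(query: list, corpus: dict, k: int = 3) -> list:
    """Return the k (doc_id, similarity) pairs most similar to the query."""
    scored = [(doc_id, cosine(query, vector)) for doc_id, vector in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```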
TIER 03
Shared Context Bus
Pub/Sub · Cross-agent · Retention: 7 days
The event stream that keeps all agents aware of what is happening across the swarm. When ContractGuard flags a liability clause, FinRisk Sentinel subscribes to the same event and can adjust its financial anomaly baseline accordingly. When Asset IQ detects a fleet-level failure pattern, RevRec AI can factor that into warranty reserve recognition. The shared bus enables cross-module intelligence without direct agent-to-agent coupling.
Topics:
ae-orchestration-events
ae-hitl-events · ae-asset-events
ae-contract-events · ae-financial-events
Retention: 7 days · at-least-once delivery
Guardrails & Safety

What happens when things go wrong — by design.

A production-grade agent swarm is defined as much by its failure modes as its happy path. Every guardrail below is a design artifact — not a monitoring dashboard added after the fact. The circuit breaker, confidence thresholds, hallucination detection, and fallback behaviours are specified before a line of agent code is written.

Guardrail 01
Circuit Breaker
Every specialist agent is wrapped in a circuit breaker that the Orchestrator monitors. When a specialist agent fails to respond within its timeout, returns an error state, or produces an output that fails schema validation, the Orchestrator records a failure. Once the failure threshold is reached, it opens the circuit for that agent and routes the task to a HITL fallback, where a human performs the function the agent was trying to perform. After the open period the breaker sends a single half-open probe, and the circuit closes only if that probe succeeds.
States: CLOSED → OPEN → HALF-OPEN → CLOSED
OPEN trigger: 3 failures in 60s window (default)
HALF-OPEN probe: single request after 30s
OPEN action: route to HITL · preserve task state
Audit: every circuit state transition → Firestore
Guardrail 02
Hallucination Detection
LLM outputs that inform business decisions are validated against a schema contract before they are acted on. Gemini responses from ContractGuard clause analysis must conform to the ClauseAnalysis JSON schema — responses that fail validation are retried with a temperature reduction (0.7 → 0.3 → 0.1) before escalating to HITL. For RevRec AI, the classification must be one of three valid ASC 606 types — any other output triggers an immediate HITL-09 manual classification request.
Validation: JSON schema contract per agent output type
Retry strategy: temperature reduction 0.7 → 0.3 → 0.1
Max retries: 3 · then HITL escalation
All retry attempts: logged to Firestore audit record
Invalid outputs: never acted on · always HITL
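The validate-then-retry loop described above might look like the sketch below. Several things are assumptions: the three temperatures are treated as three attempts in total, `REQUIRED_FIELDS` is a hypothetical stand-in for the ClauseAnalysis schema, and `generate` is any callable taking a prompt and a temperature, not a real Gemini client.

```python
import json

TEMPERATURES = (0.7, 0.3, 0.1)
# Hypothetical minimal stand-in for the ClauseAnalysis schema contract.
REQUIRED_FIELDS = {"clause_id": str, "clause_type": str, "risk_score": float}


def validate(raw: str):
    """Return the parsed output if it satisfies the contract, else None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected_type):
            return None
    return data


def analyse_with_retries(generate, prompt: str) -> dict:
    attempts = []  # every attempt is kept so it can go into the audit record
    for temperature in TEMPERATURES:
        parsed = validate(generate(prompt, temperature))
        attempts.append({"temperature": temperature, "valid": parsed is not None})
        if parsed is not None:
            return {"status": "OK", "output": parsed, "attempts": attempts}
    # Invalid output is never acted on: hand the task to a human.
    return {"status": "HITL_ESCALATION", "output": None, "attempts": attempts}
```

Schema validation catches malformed or off-contract output; it does not by itself detect a well-formed answer that is factually wrong, which is why the confidence thresholds apply separately.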
Guardrail 03
Confidence Thresholds
Every ML model inference and LLM analysis in the swarm produces a confidence score. Scores above the configured threshold allow autonomous action. Scores below the threshold pause the agent and route to HITL — the human gets the agent's best work and decides whether to accept it. Thresholds are configured per agent and per action type, not globally. A low-confidence revenue recognition classification is treated differently from a low-confidence qualification assessment.
CCAI Sales: qualification confidence ≥ 0.75
ContractGuard: Gemini analysis confidence ≥ 0.80
RevRec AI: classification confidence ≥ 0.70 (HITL always regardless)
Asset IQ: RUL confidence ≥ 0.82 for auto work order
FinRisk: anomaly score ≥ 0.85 for high-severity HITL
Guardrail 04
Fallback & Rollback
Every agent task has a defined rollback path — the set of compensating actions that restore the system to its pre-task state if the task fails or is rejected at HITL. Firestore's transactional writes mean that partial state is never committed. The Orchestrator tracks every state transition and can reconstruct the pre-task state from the Firestore audit record for any task that needs to be rolled back. SAP write operations are the only irreversible action — they require a committed HITL approval record as a mandatory input parameter.
Rollback trigger: HITL rejection · circuit open · agent timeout
State preservation: Firestore atomic writes · no partial state
SAP write guard: HITL approval record ID is a required parameter
Rollback audit: rollback action written to Firestore before execution
Salesforce rollback: Opportunity stage reverted · activity log appended
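Compensating actions can be sketched as a list of (name, action, compensate) steps that are undone in reverse order on failure. This is an illustration under assumptions: `run_with_rollback` and the in-memory `audit_log` list are hypothetical stand-ins for the Orchestrator logic and the Firestore audit record.

```python
def run_with_rollback(steps, audit_log: list) -> bool:
    """Run steps in order; on failure, undo completed steps in reverse.

    steps is a list of (name, action, compensate) tuples of zero-argument callables.
    """
    completed = []
    for name, action, compensate in steps:
        try:
            action()
            completed.append((name, compensate))
        except Exception:
            for done_name, undo in reversed(completed):
                # The rollback is written to the audit log before it executes.
                audit_log.append(f"rollback:{done_name}")
                undo()
            return False
    return True
```

An irreversible step such as the SAP write has no compensating action, which is why it is placed last and gated on the HITL approval record rather than covered by rollback.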
Architecture Decision Records

Three decisions. Every alternative documented.

ADR-007 through ADR-009 are produced in the agent swarm design phase. Each states the choice, the alternatives that were evaluated, and why this choice was made — the reasoning that a principal engineer or enterprise architect will probe in any serious design review.

ADR-007
Google ADK over LangGraph or CrewAI
ADK selected as the agent orchestration framework. LangGraph provides excellent graph-based state machine support but runs on arbitrary Python infrastructure — it has no native GCP observability, IAM integration, or Vertex AI deployment path. CrewAI is high-level and fast to prototype but does not expose the state machine primitives required for formal HITL checkpoint specification. ADK runs natively on Cloud Run with Vertex AI integration, has first-class Firestore state management, and its A2A protocol is an open standard — not a proprietary message format locked to one vendor's SDK.
Accepted · Phase Agent Design
ADR-008
A2A protocol over direct HTTP for inter-agent communication
Direct HTTP calls between agents were considered as the simplest integration path. Rejected because: direct HTTP creates tight coupling between agent endpoints, makes circuit breaking the Orchestrator's responsibility rather than a platform concern, and produces no auditable message record. A2A messages are published to Pub/Sub, giving the event bus replay capability, at-least-once delivery guarantees, and a complete message history that is queryable in BigQuery. Every A2A message is also written to the Firestore audit record — direct HTTP calls are not.
Accepted · Phase Agent Design
ADR-009
Firestore over Redis for agent state and HITL audit
Redis was considered for agent short-term memory given its low latency and widespread use for session state. Rejected for two reasons: (1) Redis is an in-memory store, so surviving a failure without data loss requires a persistence configuration that adds operational complexity. (2) The HITL audit requirement mandates that HITL event records are immutable and durable by design, and Redis TTL-based eviction is architecturally incompatible with an immutable audit store. Firestore's transactional writes, native JSON document model, and europe-west3 regional deployment satisfy both the state management and immutable audit requirements in a single managed service.
Accepted · Phase Agent Design
Next in the Portfolio
Agents specified.
The ML layer follows.

The agent specifications on this page reference ML models by name — asc606_model.classify, rul_model.predict, anomaly_model.score_event. Page 06 designs those models from the ground up: feature engineering, training pipelines, SHAP explanation contracts, Model Cards, MLOps, and drift detection.
