Agent Architecture — The Autonomous HR

01 — Agent topology

Five agents.
One orchestrator.

The Intent Router is the entry point for every interaction. It classifies the incoming message and hands off to the appropriate specialist agent. Each specialist owns exactly one domain — no agent crosses its boundary without an explicit A2A handoff.

Orchestrator

Intent Router

Gemini 1.5 Flash · Vertex AI · Stateless

Receives every inbound message. Classifies intent across 6 categories. Extracts entities (employee ID, leave type, dates, language). Routes to the appropriate specialist agent via a Pub/Sub event. It is the only agent that touches every interaction.

Autonomy boundary

Classify and route only. Never decide. Never write state. Never notify. If classification confidence < 0.70 → return clarification prompt to worker.

Tools: classify_intent · extract_entities · detect_language · publish_event
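The routing rule in the autonomy boundary above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the system's implementation: the `Classification` type, the `route` function, and the topic names are hypothetical, the source does not list the 6 intent categories, and the real classifier and Pub/Sub publish call are left out. Treating an unknown intent like a low-confidence one is also an assumption.

```python
from dataclasses import dataclass, field

CLARIFY_THRESHOLD = 0.70  # from the autonomy boundary above

# Hypothetical intent-to-topic map (only two of the six categories shown).
TOPIC_BY_INTENT = {
    "leave": "leave-agent",
    "policy_question": "policy-qa-agent",
}


@dataclass(frozen=True)
class Classification:
    intent: str
    confidence: float
    entities: dict = field(default_factory=dict)


def route(c: Classification) -> dict:
    """Return a routing instruction only; the router never decides or writes state."""
    if c.confidence < CLARIFY_THRESHOLD or c.intent not in TOPIC_BY_INTENT:
        # Below threshold (or unknown intent): ask the worker to clarify.
        return {"action": "clarify", "language": c.entities.get("language")}
    return {
        "action": "publish",
        "topic": TOPIC_BY_INTENT[c.intent],
        "payload": c.entities,
    }
```

The function returns a description of what to do rather than doing it, which keeps the router stateless and easy to test.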

Specialist · Phase 1

Leave Agent

LangGraph + Gemini 1.5 Flash · Stateful

Owns the full leave lifecycle: application, balance query, approval, denial, and cancellation. Queries Firestore for employee balance and team headcount. Queries Policy RAG for governing clause. Decides autonomously when confidence ≥ 0.80. Triggers HITL when below threshold.

Autonomy boundary

Approve or deny when: policy confidence ≥ 0.80, balance sufficient, coverage maintained. Escalate to HITL on any failure of these conditions.

Tools: get_leave_balance · check_team_coverage · query_policy_rag · write_leave_record · write_audit_log · trigger_hitl · send_notification

Specialist · Phase 1

Policy Q&A Agent

Gemini 1.5 Flash + pgvector RAG · Stateless

Answers any question a worker asks about HR policy — entitlements, notice periods, leave types, payslip timing. Retrieves the governing clause from the Policy RAG store and responds in the worker's detected language. Every answer cites its source clause.

Autonomy boundary

Answer read-only policy questions. Never interpret ambiguous clauses as decisions. If RAG confidence < 0.75 → acknowledge uncertainty, suggest worker contact owner directly.

Tools: query_policy_rag · translate_response · send_notification
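A minimal sketch of the answer rule, assuming the RAG result has the shape `{ clause_text, clause_id, page, confidence }` described for `query_policy_rag`. The `ClauseResult` type, the `compose_answer` function, and the wording of both replies are hypothetical. Translation into the worker's language is omitted.

```python
from dataclasses import dataclass

ANSWER_THRESHOLD = 0.75  # from the autonomy boundary above


@dataclass(frozen=True)
class ClauseResult:
    clause_text: str
    clause_id: str
    page: int
    confidence: float


def compose_answer(result: ClauseResult) -> str:
    """Answer with a cited clause, or acknowledge uncertainty below the threshold."""
    if result.confidence < ANSWER_THRESHOLD:
        return (
            "I am not certain which policy clause applies here. "
            "Please contact the owner directly."
        )
    # Every confident answer carries its source clause.
    return f"{result.clause_text} (Source: clause {result.clause_id}, page {result.page})"
```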

Specialist · Phase 2

Onboarding Agent

LangGraph + Gemini 1.5 Flash · Stateful

Guides each new joinee through document collection via a WhatsApp conversation. Collects Aadhaar, PAN, bank details, emergency contact, and photo. Validates document formats. Triggers the ESIC/EPF registration workflow. Delivers the digital offer letter. Maintains session state across a multi-turn conversation spanning hours or days.

Autonomy boundary

Collect and validate documents autonomously. Never finalise onboarding without Priya's explicit confirmation. Statutory registration trigger requires owner approval.

Tools: create_employee_record · validate_document · store_document · trigger_esic_registration · send_offer_letter

Specialist · Phase 4

Grievance Agent

LangGraph + Gemini 1.5 Flash · Stateful

Receives and logs grievances raised by workers via voice or text. Classifies grievance type (wage, conduct, safety, discrimination). Maintains statutory grievance register. Routes to Priya for resolution. Tracks status. Generates labour inspection export on demand.

Autonomy boundary

Log and classify autonomously. Never resolve a grievance without owner action. All wage-related and discrimination grievances require mandatory HITL — no autonomous resolution regardless of confidence.

Tools: log_grievance · classify_grievance · update_grievance_register · trigger_hitl · generate_statutory_report
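The mandatory-HITL rule can be sketched as a check that ignores confidence entirely. This is an illustrative sketch only: the function name and the lowercase type labels are assumptions, and the set of mandatory categories follows the autonomy boundary stated above (wage and discrimination).

```python
KNOWN_TYPES = {"wage", "conduct", "safety", "discrimination"}

# Categories that always escalate, per the autonomy boundary above.
MANDATORY_HITL_TYPES = {"wage", "discrimination"}


def requires_mandatory_hitl(grievance_type: str, confidence: float) -> bool:
    """Return True when the grievance must go to a human regardless of confidence."""
    if grievance_type not in KNOWN_TYPES:
        raise ValueError(f"unknown grievance type: {grievance_type!r}")
    # `confidence` is accepted but deliberately unused: it cannot change the outcome.
    return grievance_type in MANDATORY_HITL_TYPES
```

Keeping `confidence` in the signature makes it explicit at the call site that the score was available and was not consulted.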

Cross-cutting · All phases

HITL Manager

LangGraph + Cloud Functions · Stateful

Manages every human-in-the-loop escalation. Receives escalation events from any specialist agent. Composes the owner brief (context, AI recommendation, policy clause, confidence score). Sends WhatsApp notification. Waits for resolution via webhook. Times out after 4 hours and re-escalates. Writes resolution to audit log.

Autonomy boundary

Compose and deliver briefs autonomously. Never resolve an escalation without explicit owner action. 4-hour timeout triggers re-escalation, not auto-resolution.

Tools: compose_hitl_brief · send_owner_alert · register_resolution_webhook · write_hitl_resolution · re_escalate

02 — Leave Agent state machine

Every state. Every
transition. Every guard.

The Leave Agent is a LangGraph state machine. Every state has defined entry actions, every transition has an explicit guard condition, and every terminal state writes an immutable audit record before any outbound message is sent. Nothing is implicit.

Idle

Initial state

Entry: await Pub/Sub event from Intent Router
Data: structured_intent { employee_id, type, dates[], language }

Fetching

Processing

Entry: get_leave_balance(employee_id) · check_team_coverage(date)
Guard: if Firestore read fails → emit ERROR event → HITL path

RAG lookup

Processing

Entry: query_policy_rag(intent, context)
Returns: { clause_text, clause_id, page, confidence }
Guard: if confidence < 0.80 → hitl_trigger = CONFIDENCE

Evaluating

Decision gate

Checks (all must pass for autonomous approval):
1. balance_sufficient = true
2. coverage_maintained = true
3. rag_confidence ≥ 0.80 = true
4. no_legal_threshold_breach = true
Any false → no autonomous approval; route to DENIED or HITL_Pending according to the state transitions below

HITL pending

Awaiting human

Entry: trigger_hitl(brief, recommendation)
Timeout: 4 hours → re-escalate with URGENT flag
Timeout: 24 hours → auto-deny with audit note
Resolution: owner APPROVE / DENY → write_audit_log → notify_worker

Approved

Terminal · success

Entry: write_leave_record(APPROVED) · write_audit_log · send_confirmation(language)
Audit fields: decision_by, rag_confidence, clause_id, timestamp

Denied

Terminal · denial

Entry: write_leave_record(DENIED) · write_audit_log · send_denial(language, reason, alternatives)
Reason must cite policy clause. Alternatives offered where available.

State transitions

IDLE → FETCHING · on: leave_intent_received
FETCHING → RAG_LOOKUP · guard: firestore_read_success = true
FETCHING → HITL_PENDING · guard: firestore_read_success = false
RAG_LOOKUP → EVALUATING · guard: rag_confidence ≥ 0.80
RAG_LOOKUP → HITL_PENDING · guard: rag_confidence < 0.80
EVALUATING → APPROVED · guard: all 4 conditions true
EVALUATING → DENIED · guard: balance_sufficient = false OR coverage_maintained = false
EVALUATING → HITL_PENDING · guard: legal_threshold_breach = true
HITL_PENDING → APPROVED / DENIED · on: owner_resolution_received
HITL_PENDING → DENIED · on: timeout_24h (policy: no response = deny)
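The guard-driven transitions above can be written as a pure function, which is one way to keep the thresholds out of the LLM prompt. This is a plain-Python sketch, not LangGraph code and not the project's implementation: the `Context` type and `next_state` function are hypothetical. One assumption is labelled in the code: when a legal threshold breach coincides with a failed balance or coverage check, the sketch escalates rather than denies, since the source does not state a precedence. The event-driven exits from HITL_PENDING (owner resolution, 24-hour timeout) are not modelled.

```python
from dataclasses import dataclass
from enum import Enum

RAG_THRESHOLD = 0.80


class State(Enum):
    IDLE = "IDLE"
    FETCHING = "FETCHING"
    RAG_LOOKUP = "RAG_LOOKUP"
    EVALUATING = "EVALUATING"
    HITL_PENDING = "HITL_PENDING"
    APPROVED = "APPROVED"
    DENIED = "DENIED"


@dataclass
class Context:
    firestore_read_success: bool = True
    rag_confidence: float = 0.0
    balance_sufficient: bool = False
    coverage_maintained: bool = False
    legal_threshold_breach: bool = False


def next_state(state: State, ctx: Context) -> State:
    """Apply the guard for the current state and return the next state."""
    if state is State.IDLE:
        return State.FETCHING  # on: leave_intent_received
    if state is State.FETCHING:
        return State.RAG_LOOKUP if ctx.firestore_read_success else State.HITL_PENDING
    if state is State.RAG_LOOKUP:
        if ctx.rag_confidence >= RAG_THRESHOLD:
            return State.EVALUATING
        return State.HITL_PENDING
    if state is State.EVALUATING:
        # Assumed precedence: a legal breach escalates before any autonomous denial.
        if ctx.legal_threshold_breach:
            return State.HITL_PENDING
        if not (ctx.balance_sufficient and ctx.coverage_maintained):
            return State.DENIED
        return State.APPROVED
    raise ValueError(f"{state.value} has no guard-driven transition")
```

For the happy path in the sequence diagram (balance and coverage both pass, confidence 0.94), repeated calls starting from `State.IDLE` walk through FETCHING, RAG_LOOKUP, and EVALUATING to APPROVED.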

03 — Sequence diagram

Happy path.
Eight actors. 11 seconds.

The full message sequence for an autonomous leave approval — from WhatsApp audio arriving at the Meta API to the confirmation message delivered to the worker. Every arrow is a real system call with a documented interface.

Actors: Worker · WA API · Pub/Sub · Whisper · Intent Router · Leave Agent · Firestore · Policy RAG

Channel ingestion · 0–2s

voice_note_received(audio, sender_number)

publish(INBOUND_MSG, {audio_url, employee_id})

transcribe_audio(audio_url)

→ {transcript, language, confidence: 0.96} · ~3.8s

Orchestration · 2–4s

route_intent(transcript, language, employee_id)

invoke_leave_agent({type:CASUAL, dates:[tomorrow], employee_id})

Leave Agent execution · 4–9s

get_leave_balance(EMP042) · check_team_coverage(tomorrow)

→ {casual_balance:6, team_on_leave:1} · ~0.6s

query_policy_rag("casual leave approval conditions")

→ {clause:"§4.2", confidence:0.94} · ~1.3s

Leave Agent evaluates: balance_sufficient=✓ coverage_maintained=✓ rag_confidence=0.94≥0.80 → DECISION: APPROVE

Audit + notification · 9–11s

write_leave_record(APPROVED) · write_audit_log(clause_id, confidence)

send_whatsapp_message(employee_id, confirmation_text, language)

"✓ Leave approved. 5 days remaining." · Total: ~11s

04 — Leave Agent tool manifest

Every tool.
Every contract.

Each tool in the Leave Agent's manifest has a defined input schema, output schema, and failure behaviour. The agent cannot call an undocumented tool. Every tool call is logged to the audit trail.

get_leave_balance

Firestore read

IN: employee_id
OUT: balance_object

Returns casual, sick, and earned leave balances for the employee. On Firestore read failure: emit error event, transition to HITL_PENDING. Never return a cached or assumed balance.

check_team_coverage

Firestore read

IN: dept_id, date
OUT: coverage_status

Returns count of approved leave for the department on the requested date. Compares against policy minimum coverage. Returns: { on_leave: N, min_required: M, coverage_maintained: bool }.
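A sketch of how the `coverage_status` object might be computed. The source gives the output shape but not the comparison itself, so two things here are assumptions: that `min_required` means the minimum number of staff who must be present, and that the department headcount is available as an input. The function name is hypothetical and the Firestore reads are omitted.

```python
def coverage_status(on_leave: int, headcount: int, min_required: int) -> dict:
    """Build the coverage_status object for one department and date.

    `on_leave` counts already-approved leave. The pending request would add
    one more absence, so it is included in the comparison.
    """
    present_if_approved = headcount - (on_leave + 1)
    return {
        "on_leave": on_leave,
        "min_required": min_required,
        "coverage_maintained": present_if_approved >= min_required,
    }
```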

query_policy_rag

pgvector semantic search

IN: query_text
OUT: clause_result

Performs semantic similarity search against the HR Policy PDF vector store. Returns: { clause_text, clause_id, page, confidence }. If confidence < 0.80: agent must not auto-decide — HITL trigger is mandatory.

write_leave_record

Firestore write

IN: leave_request
OUT: record_id

Writes approved or denied leave to Firestore /leave_requests. Always called before send_notification — never send a confirmation without a committed record. On write failure: halt and emit error to HITL queue.

write_audit_log

Firestore append-only

IN: audit_entry
OUT: log_id

Appends immutable record to /audit_log. Fields: timestamp, decision, decision_by (AI|HUMAN), policy_clause_ids[], rag_confidence, employee_id. Firestore security rules deny update/delete on this collection — no service account can overwrite.
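The append-only contract can be illustrated with an in-memory stand-in. This sketch does not touch Firestore and is not the project's code: `AuditEntry` mirrors the fields listed above, while `AppendOnlyLog` and its methods are hypothetical. It shows the shape of the interface (append and read, with no update or delete), not the infrastructure-level enforcement.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Literal


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class AuditEntry:
    decision: str
    decision_by: Literal["AI", "HUMAN"]
    policy_clause_ids: tuple[str, ...]
    rag_confidence: float
    employee_id: str
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class AppendOnlyLog:
    """In-memory stand-in for /audit_log: exposes append and read only."""

    def __init__(self) -> None:
        self._entries: list[AuditEntry] = []

    def append(self, entry: AuditEntry) -> int:
        self._entries.append(entry)
        return len(self._entries) - 1  # log_id

    def read(self, log_id: int) -> AuditEntry:
        return self._entries[log_id]
```

Using a tuple for `policy_clause_ids` keeps the entry immutable all the way down; a list field could still be mutated in place even on a frozen dataclass.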

trigger_hitl

Pub/Sub publish

IN: hitl_brief
OUT: queue_id

Publishes escalation event to HITL Manager. Brief includes: request context, employee record, AI recommendation, policy clause, confidence score, trigger reason. HITL Manager owns the subsequent flow — Leave Agent suspends.

send_notification

WhatsApp API

IN: message_payload
OUT: delivery_status

Sends WhatsApp message to employee in their detected language. Message text is generated by Gemini Flash with the decision, reason, remaining balance, and policy clause reference. Always the last tool called — never before write_audit_log.

05 — HITL specification

Human oversight is
a designed state — not a fallback.

Every HITL checkpoint is a formal state machine node with defined entry conditions, a presentation contract, a decision interface, a timeout behaviour, and an immutable audit record. It is not a bug report. It is the system working as designed.

Entry conditions — when HITL fires

Four triggers. Any one fires the escalation.

The HITL path is not a catch-all for errors. It has four precisely defined trigger conditions. Each trigger produces a different brief to the owner, with different contextual information and a different AI recommendation.

Policy RAG confidence < 0.80 — clause is ambiguous

All leave balances = 0 and unpaid leave policy is ambiguous

Statutory minimum coverage would be breached (Shops Act / Factories Act)

Firestore read or write failure on any critical path call

Presentation contract — what the owner sees

One WhatsApp message. All context. One decision.

The owner receives a single structured WhatsApp message containing: employee name and department, the exact request, the balance state, the policy clause retrieved, the confidence score, the AI recommendation, and the trigger reason. Then two buttons: Approve / Deny. No portal login. No context switching.
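A minimal sketch of composing that single message from the fields listed above. The `HitlBrief` type, the line labels, and the returned dictionary shape are hypothetical; this is a generic structure, not the WhatsApp API payload format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HitlBrief:
    employee_name: str
    department: str
    request: str
    balance_state: str
    clause_id: str
    confidence: float
    recommendation: str
    trigger_reason: str


def compose_hitl_brief(b: HitlBrief) -> dict:
    """Render all context into one message body with exactly two actions."""
    body = "\n".join([
        f"Employee: {b.employee_name} ({b.department})",
        f"Request: {b.request}",
        f"Balance: {b.balance_state}",
        f"Policy clause: {b.clause_id} (confidence {b.confidence:.2f})",
        f"AI recommendation: {b.recommendation}",
        f"Escalated because: {b.trigger_reason}",
    ])
    return {"body": body, "buttons": ["Approve", "Deny"]}
```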

Timeout behaviour — what happens if the owner doesn't respond

4 hours: re-escalate. 24 hours: auto-deny.

No HITL escalation is left open indefinitely. At 4 hours without a response: re-escalation with an URGENT flag and a reminder of the pending request. At 24 hours: the request is automatically denied with an audit note stating the reason. The worker is notified. The policy for timeout behaviour is itself configurable in the HR Policy PDF.
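The two-stage timeout can be sketched as a function of elapsed time. The function name, the action labels, and the `already_reescalated` flag are assumptions made for illustration. The two durations are hard-coded here for brevity, although the text above describes them as configurable.

```python
from datetime import datetime, timedelta

REESCALATE_AFTER = timedelta(hours=4)
AUTO_DENY_AFTER = timedelta(hours=24)


def timeout_action(escalated_at: datetime, now: datetime, already_reescalated: bool) -> str:
    """Decide what the HITL Manager should do for an unanswered escalation."""
    elapsed = now - escalated_at
    if elapsed >= AUTO_DENY_AFTER:
        return "AUTO_DENY"  # deny, write audit note, notify worker
    if elapsed >= REESCALATE_AFTER and not already_reescalated:
        return "RE_ESCALATE_URGENT"  # one reminder with the URGENT flag
    return "WAIT"
```

Checking the 24-hour limit first means a long-delayed timer tick still produces the auto-deny rather than a late reminder.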

Audit contract — what gets written regardless of outcome

Every HITL event is recorded immutably.

The audit log records: escalation trigger reason, the full AI brief as presented to the owner, the owner's decision and timestamp, and the final outcome. In a labour dispute, this record demonstrates that the owner was given full context and made an informed decision. It is the system's compliance artefact — not a side effect.

06 — Agent guardrails

What the agents
will never do.

An agent's autonomy boundary is defined not just by what it can do, but by what it is architecturally prevented from doing. These guardrails are enforced at the infrastructure layer — not by agent instructions alone.

Guardrail 01

No agent can overwrite the audit log

Firestore security rules grant append-only access to /audit_log for all service accounts. No update or delete permission is granted to any identity in the system. An agent that attempts to modify a past record will receive a permission denied error. The log is tamper-proof by infrastructure design, not by trust.

Guardrail 02

No notification before an audit record is written

The Leave Agent's state machine enforces a strict ordering: write_audit_log must succeed before send_notification is called. A confirmation message cannot be sent without a committed, timestamped record of the decision. If the audit write fails, the notification is not sent and the error is escalated to HITL.
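One way to make that ordering hard to violate is to route every terminal state through a single function that owns the sequence. This is a sketch under assumptions: `finalize_decision` and its callable parameters are hypothetical stand-ins for the real tools, and catching every exception as a write failure is a simplification.

```python
from typing import Callable


def finalize_decision(
    write_leave_record: Callable[[], str],
    write_audit_log: Callable[[], str],
    send_notification: Callable[[], None],
    escalate: Callable[[str], None],
) -> bool:
    """Run the terminal-state actions in the required order.

    Returns True only if the worker was notified.
    """
    try:
        write_leave_record()
        write_audit_log()
    except Exception as exc:
        # Any write failure halts the flow before a message can go out.
        escalate(f"write failed: {exc}")
        return False
    send_notification()  # reached only after both writes succeeded
    return True
```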

Guardrail 03

No autonomous decision on confidence < 0.80

The policy RAG confidence threshold is a hard architectural constraint — not a soft preference in the agent prompt. If the RAG returns confidence < 0.80, the HITL transition fires unconditionally. There is no prompt instruction that can override this. The threshold is evaluated in the state machine guard, not in the LLM.

Guardrail 04

No agent crosses its domain boundary

The Leave Agent cannot access grievance records. The Policy Q&A Agent cannot write leave decisions. IAM service account permissions are scoped per agent — each service account has read/write access only to the Firestore collections relevant to its domain. Cross-domain access is a permission denied error, not a policy violation.
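The per-agent scoping can be pictured as a grant table consulted before every access. This is an illustration of the idea only, since the text above places the actual enforcement in IAM rather than in application code. The agent identifiers, the `grievances` collection name, and the specific grants are hypothetical; only `leave_requests` and `audit_log` are collection names taken from this document.

```python
# Hypothetical grant table: agent -> collection -> allowed operations.
GRANTS: dict[str, dict[str, set[str]]] = {
    "leave-agent": {"leave_requests": {"read", "write"}, "audit_log": {"append"}},
    "policy-qa-agent": {},
    "grievance-agent": {"grievances": {"read", "write"}, "audit_log": {"append"}},
}


def check_access(agent: str, collection: str, operation: str) -> None:
    """Raise PermissionError unless the grant table allows the operation."""
    allowed = GRANTS.get(agent, {}).get(collection, set())
    if operation not in allowed:
        raise PermissionError(f"{agent} may not {operation} {collection}")
```

Because the default for any missing agent or collection is an empty set, access is denied unless it has been granted explicitly.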

Guardrail 05

No credentials in agent code or environment variables

All secrets (WhatsApp API token, Supabase connection string, Exotel credentials) are stored in GCP Secret Manager. No credential appears in source code, Dockerfiles, or environment variable configuration. Each Cloud Run service retrieves secrets at runtime via its IAM service account. Secret rotation requires no code change.

Guardrail 06

Grievance agent: mandatory HITL on wage and discrimination claims

Certain grievance categories carry legal implications that no autonomous system should resolve. Wage disputes, discrimination claims, and safety violations trigger mandatory HITL regardless of policy RAG confidence. The state machine transition to HITL_PENDING fires unconditionally on these categories — confidence score is irrelevant.

The agents that
run the system.