Every service.
Every configuration decision.
Twelve services, each deployed with a specific configuration chosen for cost, latency, or compliance reasons. Nothing is left to chance: every parameter is a deliberate decision, including the few that coincide with a platform default (for example, concurrency: 80 on Cloud Run).
Ingress · Channel normalisation
Receives inbound WhatsApp webhooks and IVR audio streams. Validates Meta signature header. Publishes normalised event to Pub/Sub.
min-instances: 1 (always warm — inbound messages cannot wait)
max-instances: 10
cpu: 1 vCPU · memory: 512Mi
ingress: all (public HTTPS endpoint)
auth: --allow-unauthenticated (Meta cannot present a Google identity token) + Meta signature verification
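The signature check can be sketched as follows. Meta signs each webhook body with HMAC-SHA256 keyed by the app secret and sends the result in the X-Hub-Signature-256 header as sha256=<hex digest>. The function name and the way the secret is passed in are illustrative assumptions, not the deployed service's code.

```python
import hashlib
import hmac


def verify_meta_signature(app_secret: str, raw_body: bytes, header_value: str) -> bool:
    """Return True if header_value matches the HMAC-SHA256 of raw_body."""
    prefix = "sha256="
    if not header_value or not header_value.startswith(prefix):
        return False
    expected = hmac.new(app_secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    received = header_value[len(prefix):]
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))
```

The check has to run on the raw request bytes, before any JSON parsing, because re-serialising the body can change the byte sequence that was signed.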
Speech-to-text · Whisper large-v3
Transcribes audio, detects the language, and returns a confidence score. Runs on a T4 spot GPU, 83% cheaper than on-demand. Falls back to Google STT when spot capacity is unavailable.
min-instances: 1 warm (eliminates cold-start on calls)
gpu: nvidia-tesla-t4 · spot
cpu: 4 vCPU · memory: 16Gi
timeout: 30s per request
ingress: internal-only (Pub/Sub push subscription)
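The fallback to Google STT can be pictured as a thin wrapper around two transcription callables. This is a minimal sketch: the Transcript fields, the function names, and the choice of which exceptions signal "spot unavailable" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Transcript:
    text: str
    language: str
    confidence: float
    engine: str  # which backend produced the result


Transcriber = Callable[[bytes], Transcript]


def transcribe_with_fallback(audio: bytes, primary: Transcriber, fallback: Transcriber) -> Transcript:
    """Try the primary engine; use the fallback if it is unreachable or times out."""
    try:
        return primary(audio)
    except (ConnectionError, TimeoutError):
        # Assumed failure signals for a reclaimed spot GPU or a slow request.
        return fallback(audio)
```

Recording the engine on each result keeps it visible downstream which transcripts came from the fallback path.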
Orchestrator · Intent classification
Stateless. Classifies intent, extracts entities, routes to specialist agent via Pub/Sub. Fast path — must complete in under 1 second.
min-instances: 0 (scale-to-zero · warm in 800ms)
max-instances: 20
cpu: 1 vCPU · memory: 512Mi
concurrency: 80 requests per instance
ingress: internal-only
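Two numbers follow from this configuration and are worth making explicit. The sketch below assumes the 800 ms warm-up counts against the same 1-second budget when a request lands on a scaled-to-zero service; the source does not say whether it does.

```python
MAX_INSTANCES = 20
CONCURRENCY = 80
LATENCY_BUDGET_MS = 1000
COLD_START_MS = 800


def peak_in_flight(max_instances: int, concurrency: int) -> int:
    """Upper bound on requests served at once across all instances."""
    return max_instances * concurrency


def cold_request_budget_ms(budget_ms: int, cold_start_ms: int) -> int:
    """Time left for classification when a request also pays the cold start."""
    return max(budget_ms - cold_start_ms, 0)


print(peak_in_flight(MAX_INSTANCES, CONCURRENCY))                 # 1600
print(cold_request_budget_ms(LATENCY_BUDGET_MS, COLD_START_MS))   # 200
```

Under that assumption, the first request after an idle period has roughly 200 ms for classification itself, which is the trade-off accepted by choosing min-instances: 0.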
Specialist agent · Leave lifecycle
Stateful LangGraph agent. Session state persisted in Firestore. Handles leave application, balance query, approval, denial, HITL escalation. Longest-running agent — average 4s per interaction.
min-instances: 0 (scale-to-zero)
max-instances: 10
cpu: 2 vCPU · memory: 1Gi
timeout: 60s (covers full state machine execution)
ingress: internal-only
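The leave lifecycle can be read as a small state machine. The sketch below uses a plain transition table rather than LangGraph, and the state names and allowed transitions are assumptions inferred from the description (application, approval, denial, HITL escalation), not the agent's actual graph.

```python
# Hypothetical states; terminal states have no outgoing transitions.
ALLOWED_TRANSITIONS = {
    "applied": {"approved", "denied", "escalated"},
    "escalated": {"approved", "denied"},
    "approved": set(),
    "denied": set(),
}


def advance(current: str, target: str) -> str:
    """Return the new state, or raise if the transition is not allowed."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current!r} -> {target!r}")
    return target
```

Persisting the current state after every call to advance is what lets a scaled-to-zero instance resume a session from the stored record.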
Primary database · All entities
Firestore stores employee records, leave ledger, HITL queue, sessions, and audit log. Append-only security rules on /audit_log: no service account holds update/delete permissions on this collection.
mode: Native (not Datastore mode)
location: asia-south1 (single-region · DPDP compliant)
backup: daily managed backup · 7-day retention
security rules: audit_log collection append-only
Event bus · Service decoupling
Five topics: INBOUND_MSG, STT_RESULT, AGENT_EVENT, HITL_ESCALATION, NOTIFICATION_OUT. All subscriptions use push delivery to Cloud Run endpoints. Dead-letter topics on all critical subscriptions.
delivery: push (not pull · lower latency)
ack deadline: 60s (covers worst-case agent execution)
dead-letter: enabled · max delivery attempts: 5
retention: 7 days (for replay on failures)
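With push delivery, each Cloud Run endpoint receives a JSON envelope whose message.data field is base64-encoded. A minimal decoding sketch follows; it assumes the publisher put a JSON document in the payload, which the source does not state.

```python
import base64
import json


def decode_push_envelope(body: bytes) -> dict:
    """Unpack a Pub/Sub push request body into id, attributes, and payload."""
    envelope = json.loads(body)
    message = envelope["message"]
    data = base64.b64decode(message.get("data", "")).decode("utf-8")
    return {
        "message_id": message.get("messageId"),
        "attributes": message.get("attributes", {}),
        # Assumption: payloads are JSON documents.
        "payload": json.loads(data) if data else None,
    }
```

The endpoint acknowledges a message by returning a success status before the ack deadline; an error status or a timeout leads to redelivery and, after the configured number of attempts, to the dead-letter topic.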
Credentials · Zero secrets in code
Secret Manager stores all credentials: WhatsApp API token, Exotel API key, Supabase connection string, Gemini API key. Each Cloud Run service accesses only the secrets it needs via its service account IAM binding.
rotation: automated 90-day rotation on WhatsApp token
access: per-service-account binding · principle of least privilege
audit: all accesses logged to Cloud Audit Logs
versions: previous version retained 30 days on rotation
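Reading a secret at runtime might look like the sketch below, which uses the google-cloud-secret-manager client library. The project ID and secret ID in the usage line are hypothetical; the call succeeds only if the service account has been granted access to that secret.

```python
def secret_version_name(project_id: str, secret_id: str, version: str = "latest") -> str:
    """Build the resource name Secret Manager expects for a secret version."""
    return f"projects/{project_id}/secrets/{secret_id}/versions/{version}"


def read_secret(project_id: str, secret_id: str, version: str = "latest") -> str:
    """Fetch and decode a secret using the caller's service account identity."""
    # Imported here so the helper above is usable without the library installed.
    from google.cloud import secretmanager

    client = secretmanager.SecretManagerServiceClient()
    response = client.access_secret_version(
        request={"name": secret_version_name(project_id, secret_id, version)}
    )
    return response.payload.data.decode("utf-8")


# Hypothetical identifiers, for illustration only:
# token = read_secret("autohr-example-project", "whatsapp-api-token")
```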
LLM inference · Intent + reasoning
Accessed via the Vertex AI SDK from the Leave Agent and Intent Router. The model version is pinned rather than tracking the 'latest' alias. Region: asia-south1 for data residency. The 60 QPM quota on Gemini Flash is adequate for SMB scale.
model: gemini-1.5-flash-002 (pinned)
region: asia-south1 (data residency)
quota: 60 QPM · 1M TPM (adequate for ≤ 500 emp)
safety: harm_block_threshold: BLOCK_NONE (HR context)
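One way a caller could stay under the 60 QPM quota is a sliding-window counter on the client side. Nothing in the source says the services do this; the class below is a hypothetical sketch with an injectable clock so it can be tested without waiting.

```python
import time
from collections import deque
from typing import Callable, Deque


class MinuteQuota:
    """Allow at most `limit` calls in any rolling 60-second window."""

    def __init__(self, limit: int = 60, clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.clock = clock
        self._stamps: Deque[float] = deque()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= 60.0:
            self._stamps.popleft()
        if len(self._stamps) >= self.limit:
            return False
        self._stamps.append(now)
        return True
```

A counter like this is per process; with several instances running, each would need a share of the quota or a shared store, and the server-side quota remains the real limit.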
Object storage · PDFs · artifacts
Three buckets: policy-documents (HR PDFs, versioned), audio-archive (voice recordings, 90-day TTL), model-artifacts (Whisper container layers). All buckets: asia-south1, uniform bucket-level access.
policy-documents: versioning enabled · lifecycle: archive after 1yr
audio-archive: TTL 90 days (DPDP minimum retention)
model-artifacts: versioning enabled · no TTL
access: uniform bucket-level (no per-object ACLs)
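The two lifecycle policies above map onto Cloud Storage lifecycle rules, each an action paired with a condition. The sketch below builds them as Python dictionaries in the shape of the bucket resource's lifecycle field. It assumes "archive after 1yr" means a move to the ARCHIVE storage class and approximates one year as 365 days.

```python
import json

# audio-archive: delete objects once they are 90 days old.
AUDIO_ARCHIVE_LIFECYCLE = {
    "lifecycle": {
        "rule": [
            {"action": {"type": "Delete"}, "condition": {"age": 90}},
        ]
    }
}

# policy-documents: assumed to move to the ARCHIVE class after 365 days.
POLICY_DOCUMENTS_LIFECYCLE = {
    "lifecycle": {
        "rule": [
            {
                "action": {"type": "SetStorageClass", "storageClass": "ARCHIVE"},
                "condition": {"age": 365},
            },
        ]
    }
}

print(json.dumps(AUDIO_ARCHIVE_LIFECYCLE, indent=2))
```

On a bucket with versioning enabled, a rule conditioned only on age applies to noncurrent versions as well, so the policy-documents rule would archive old versions too.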
Container registry · Image versioning
Artifact Registry stores all Cloud Run container images. Two repositories: autohr-services (application containers) and autohr-models (Whisper fine-tuned images with model weights baked in). Vulnerability scanning enabled on push.
format: Docker
location: asia-south1
scanning: automated vulnerability scan on push
retention: keep last 10 versions per service · auto-clean older
Vector store · Policy RAG
Supabase Postgres with the pgvector extension. Hosts the HR policy PDF embeddings. Runs in AWS ap-south-1 (Mumbai), the same Indian jurisdiction as GCP asia-south1. Connection string stored in Secret Manager. HNSW index for sub-10ms retrieval.
tier: Free (500MB · adequate for SMB policy docs)
region: aws ap-south-1 (IN jurisdiction · DPDP compliant)
index: HNSW · ef_construction: 128 · m: 16
migration trigger: > 400MB used → Supabase Pro ($25/mo)
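The index parameters and the migration trigger can be expressed in a few lines. The sketch below generates the pgvector DDL for an HNSW index with m = 16 and ef_construction = 128. The table name, column name, and the cosine operator class are assumptions; the source does not say which distance metric is used.

```python
def hnsw_index_sql(table: str, column: str, m: int = 16, ef_construction: int = 128) -> str:
    """Return pgvector DDL for an HNSW index on table.column."""
    for name in (table, column):
        # Guard against anything other than a plain identifier in the DDL.
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    return (
        f"CREATE INDEX IF NOT EXISTS {table}_{column}_hnsw "
        f"ON {table} USING hnsw ({column} vector_cosine_ops) "
        f"WITH (m = {m}, ef_construction = {ef_construction});"
    )


def needs_plan_upgrade(used_mb: float, threshold_mb: float = 400.0) -> bool:
    """True once storage use crosses the migration trigger."""
    return used_mb > threshold_mb


# Hypothetical table and column names:
print(hnsw_index_sql("policy_chunks", "embedding"))
```

Triggering the move at 400MB rather than at the 500MB cap leaves headroom for the index itself, which also counts toward storage.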
CI/CD · Terraform pipeline
Cloud Build runs on every push to main: lint, test, container build, push to Artifact Registry. Cloud Deploy manages progressive delivery to staging then production. Terraform plan/apply runs in Cloud Build with remote state in GCS.
triggers: push to main · PR to main (plan only)
tf state: GCS bucket · versioned · state locking
approval: manual gate between staging → production
rollback: Cloud Run revision rollback · < 30s