Infra

GCP reference architecture.
Infrastructure as code.

Every resource in this system has a corresponding Terraform definition. Nothing is created via the console. Nothing is manual. The infrastructure is reproducible, version-controlled, and deployable from a cold GCP project in under 15 minutes — because a portfolio architecture that cannot be built is not a production-grade architecture.

01 — GCP reference architecture
All services.
One region. asia-south1.
All compute, storage, and managed services are deployed in GCP asia-south1 (Mumbai) for data residency compliance with India's DPDP Act 2023 and for latency to the Indian workforce. External services (Meta WhatsApp API, Supabase, Exotel) communicate over HTTPS — no VPN peering required.
External channels (outside GCP)
Meta WhatsApp: Cloud API · HTTPS webhooks
Exotel / Plivo: India SIP · IVR gateway
Supabase: pgvector · AWS ap-south-1
GCP Project: autohr-prod · asia-south1 (Mumbai)
Cloud Run · webhook-gateway: Inbound WA + IVR webhooks · HTTPS · signature-verified
Cloud Pub/Sub: INBOUND_MSG topic · event fabric
Cloud Run · stt-service: Whisper large-v3 · T4 GPU spot · min-instances: 1
Cloud Run · nllb-service: NLLB-200 3.3B · CPU · bundled with STT
Pub/Sub · STT_RESULT: transcript + language + confidence
Cloud Run · intent-router: Gemini Flash · stateless · scale-to-zero
Cloud Run · leave-agent: LangGraph · stateful · Firestore sessions
Cloud Run · hitl-manager: HITL orchestration · webhook listener
Firestore: employees · leave_requests · audit_log · sessions
Cloud Storage: policy PDFs · audio backups · model artifacts
Artifact Registry: container images · Whisper versions
Secret Manager: WA token · Exotel key · Supabase URL
Cloud Monitoring: alerting · dashboards · uptime checks
Cloud Build: CI/CD · container builds · tf plan/apply
Vertex AI: Gemini Flash · text-embedding-004
02 — Service catalogue
Every service.
Every configuration decision.
Twelve services. Each deployed with a specific configuration chosen for cost, latency, or compliance reasons. None of these are defaults — every parameter is a deliberate decision.
Cloud Run
webhook-gateway
GCP managed
Ingress · Channel normalisation
Receives inbound WhatsApp webhooks and IVR audio streams. Validates Meta signature header. Publishes normalised event to Pub/Sub.
min-instances: 1 (always warm — inbound messages cannot wait)
max-instances: 10
cpu: 1 vCPU · memory: 512Mi
ingress: all (public HTTPS endpoint)
auth: --allow-unauthenticated (Meta and Exotel cannot present Google identity tokens) + Meta signature verification on every request
Cloud Run
stt-service
OSS on GCP
Speech-to-text · Whisper large-v3
Transcribes audio. Detects language. Returns confidence score. GPU T4 spot — 83% cheaper than on-demand. Fallback to Google STT on spot unavailability.
min-instances: 1 warm (eliminates cold-start on calls)
gpu: nvidia-tesla-t4 · spot
cpu: 4 vCPU · memory: 16Gi
timeout: 30s per request
ingress: internal-only (Pub/Sub push subscription)
Cloud Run
intent-router
Gemini Flash
Orchestrator · Intent classification
Stateless. Classifies intent, extracts entities, routes to specialist agent via Pub/Sub. Fast path — must complete in under 1 second.
min-instances: 0 (scale-to-zero · warm in 800ms)
max-instances: 20
cpu: 1 vCPU · memory: 512Mi
concurrency: 80 requests per instance
ingress: internal-only
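A minimal sketch of how the intent-router settings listed here might map onto Terraform, written in the same style as the leave-agent definition in the Terraform section. The service account name `intent_router_sa` and the image path are assumptions for illustration, not taken from the repo.

```hcl
resource "google_cloud_run_v2_service" "intent_router" {
  name     = "intent-router"
  location = var.region # asia-south1
  ingress  = "INGRESS_TRAFFIC_INTERNAL_ONLY"

  template {
    # Hypothetical service account name
    service_account = google_service_account.intent_router_sa.email

    # Up to 80 concurrent requests share one instance
    max_instance_request_concurrency = 80

    scaling {
      min_instance_count = 0 # scale-to-zero
      max_instance_count = 20
    }

    containers {
      # Assumed image path, mirroring the autohr-services repository layout
      image = "${var.region}-docker.pkg.dev/${var.project_id}/autohr-services/intent-router:${var.image_tag}"

      resources {
        limits = {
          cpu    = "1"
          memory = "512Mi"
        }
      }
    }
  }
}
```

With concurrency at 80 and a ceiling of 20 instances, the service can in principle hold 1,600 requests in flight, which is the headroom these two parameters buy together.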
Cloud Run
leave-agent
LangGraph
Specialist agent · Leave lifecycle
Stateful LangGraph agent. Session state persisted in Firestore. Handles leave application, balance query, approval, denial, HITL escalation. Longest-running agent — average 4s per interaction.
min-instances: 0 (scale-to-zero)
max-instances: 10
cpu: 2 vCPU · memory: 1Gi
timeout: 60s (covers full state machine execution)
ingress: internal-only
Firestore
(Native mode)
GCP managed
Primary database · All entities
Stores employee records, leave ledger, HITL queue, sessions, and audit log. Append-only security rules on /audit_log — no service account holds update/delete permissions on this collection.
mode: Native (not Datastore mode)
location: asia-south1 (single-region · DPDP compliant)
backup: daily managed backup · 7-day retention
security rules: audit_log collection append-only
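The database mode, location, and backup settings listed here could be expressed roughly as follows. This is a sketch under the assumption that the default database is used and that backups are configured through a backup schedule resource; the resource names are illustrative.

```hcl
resource "google_firestore_database" "default" {
  project     = var.project_id
  name        = "(default)"
  location_id = "asia-south1" # single-region
  type        = "FIRESTORE_NATIVE"
}

resource "google_firestore_backup_schedule" "daily" {
  project   = var.project_id
  database  = google_firestore_database.default.name
  retention = "604800s" # 7 days

  daily_recurrence {}
}
```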
Cloud Pub/Sub
GCP managed
Event bus · Service decoupling
Five topics: INBOUND_MSG, STT_RESULT, AGENT_EVENT, HITL_ESCALATION, NOTIFICATION_OUT. All subscriptions use push delivery to Cloud Run endpoints. Dead-letter topics on all critical subscriptions.
delivery: push (not pull · lower latency)
ack deadline: 60s (covers worst-case agent execution)
dead-letter: enabled · max delivery attempts: 5
retention: 7 days (for replay on failures)
Secret Manager
GCP managed
Credentials · Zero secrets in code
Stores all credentials: WhatsApp API token, Exotel API key, Supabase connection string, Gemini API key. Each Cloud Run service accesses only the secrets it needs via its service account IAM binding.
rotation: automated 90-day rotation on WA token
access: per-service-account binding · principle of least privilege
audit: all accesses logged to Cloud Audit Logs
versions: previous version retained 30 days on rotation
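One way the WhatsApp token secret and its 90-day schedule might look in Terraform is sketched below. The secret ID, the topic name, the `wa_token_next_rotation` variable, and the `webhook_gateway_sa` service account are all assumptions. Note that a Secret Manager rotation schedule only publishes a notification to the attached topic; a separate handler (not shown) would have to fetch the new token and add it as a new secret version.

```hcl
# Hypothetical topic that receives rotation-due notifications
resource "google_pubsub_topic" "secret_rotation" {
  name = "autohr-secret-rotation"
}

resource "google_secret_manager_secret" "wa_token" {
  secret_id = "wa-token" # assumed name

  replication {
    user_managed {
      replicas {
        location = "asia-south1"
      }
    }
  }

  # A rotation schedule requires at least one notification topic
  topics {
    name = google_pubsub_topic.secret_rotation.id
  }

  rotation {
    rotation_period    = "7776000s"                 # 90 days
    next_rotation_time = var.wa_token_next_rotation # RFC 3339 timestamp, assumed variable
  }
}

# Per-secret binding: only the gateway's service account can read this secret
resource "google_secret_manager_secret_iam_member" "gateway_wa_token" {
  secret_id = google_secret_manager_secret.wa_token.secret_id
  role      = "roles/secretmanager.secretAccessor"
  member    = "serviceAccount:${google_service_account.webhook_gateway_sa.email}"
}
```

Binding at the secret level rather than the project level is what keeps each service limited to the secrets it needs.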
Vertex AI
Gemini Flash
GCP managed
LLM inference · Intent + reasoning
Accessed via Vertex AI SDK from Leave Agent and Intent Router. Model version pinned — not using 'latest' alias. Region: asia-south1 for data residency. Quota: 60 QPM on Gemini Flash adequate for SMB scale.
model: gemini-1.5-flash-002 (pinned)
region: asia-south1 (data residency)
quota: 60 QPM · 1M TPM (adequate for ≤ 500 emp)
safety: harm_block_threshold: BLOCK_NONE (HR context)
Cloud Storage
GCP managed
Object storage · PDFs · artifacts
Three buckets: policy-documents (HR PDFs, versioned), audio-archive (voice recordings, 90-day TTL), model-artifacts (Whisper container layers). All buckets: asia-south1, uniform bucket-level access.
policy-documents: versioning enabled · lifecycle: archive after 1yr
audio-archive: TTL 90 days (DPDP minimum retention)
model-artifacts: versioning enabled · no TTL
access: uniform bucket-level (no per-object ACLs)
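The lifecycle settings for two of the buckets could be written along these lines. This is a sketch; the bucket naming scheme is an assumption, since bucket names must be globally unique and the actual names are not given.

```hcl
resource "google_storage_bucket" "policy_documents" {
  name                        = "${var.project_id}-policy-documents" # assumed naming scheme
  location                    = "asia-south1"
  uniform_bucket_level_access = true

  versioning {
    enabled = true
  }

  # Move objects to Archive storage after one year
  lifecycle_rule {
    condition {
      age = 365
    }
    action {
      type          = "SetStorageClass"
      storage_class = "ARCHIVE"
    }
  }
}

resource "google_storage_bucket" "audio_archive" {
  name                        = "${var.project_id}-audio-archive" # assumed naming scheme
  location                    = "asia-south1"
  uniform_bucket_level_access = true

  # Delete voice recordings 90 days after creation
  lifecycle_rule {
    condition {
      age = 90
    }
    action {
      type = "Delete"
    }
  }
}
```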
Artifact Registry
GCP managed
Container registry · Image versioning
All Cloud Run container images stored here. Two repositories: autohr-services (application containers) and autohr-models (Whisper fine-tuned images with model weights baked in). Vulnerability scanning enabled on push.
format: Docker
location: asia-south1
scanning: automated vulnerability scan on push
retention: keep last 10 versions per service · auto-clean older
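A possible Terraform shape for the application repository and its retention rule is sketched below. The policy IDs are illustrative, and enabling scanning through the Container Scanning API is an assumption about how "scan on push" is switched on in this project.

```hcl
# Vulnerability scanning on push is provided by the Container Scanning API
resource "google_project_service" "container_scanning" {
  project = var.project_id
  service = "containerscanning.googleapis.com"
}

resource "google_artifact_registry_repository" "services" {
  repository_id = "autohr-services"
  location      = "asia-south1"
  format        = "DOCKER"

  # Keep the 10 most recent versions of each image
  cleanup_policies {
    id     = "keep-last-10"
    action = "KEEP"
    most_recent_versions {
      keep_count = 10
    }
  }

  # Delete everything else; KEEP policies take precedence over DELETE
  cleanup_policies {
    id     = "delete-older"
    action = "DELETE"
    condition {
      tag_state = "ANY"
    }
  }
}
```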
Supabase
(pgvector)
External managed
Vector store · Policy RAG
Postgres + pgvector extension. Hosts HR Policy PDF embeddings. AWS ap-south-1 (Mumbai) — same Indian jurisdiction as GCP asia-south1. Connection string stored in Secret Manager. HNSW index for sub-10ms retrieval.
tier: Free (500MB · adequate for SMB policy docs)
region: aws ap-south-1 (IN jurisdiction · DPDP compliant)
index: HNSW · ef_construction: 128 · m: 16
migration trigger: > 400MB used → Supabase Pro ($25/mo)
Cloud Build
+ Cloud Deploy
GCP managed
CI/CD · Terraform pipeline
Cloud Build runs on every push to main: lint, test, container build, push to Artifact Registry. Cloud Deploy manages progressive delivery to staging then production. Terraform plan/apply runs in Cloud Build with remote state in GCS.
triggers: push to main · PR to main (plan only)
tf state: GCS bucket · versioned · state locking
approval: manual gate between staging → production
rollback: Cloud Run revision rollback · < 30s
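The remote state and the two triggers described here might be declared as in the sketch below. The state bucket name, the `github_owner` and `github_repo` variables, and the build config file paths are hypothetical; only the trigger conditions come from the description.

```hcl
terraform {
  # Remote state in GCS; this backend locks state during plan and apply
  backend "gcs" {
    bucket = "autohr-prod-tf-state" # assumed bucket name
    prefix = "terraform/state"
  }
}

# PR to main: plan only
resource "google_cloudbuild_trigger" "pr_plan" {
  name     = "pr-terraform-plan"
  filename = "cloudbuild/plan.yaml" # hypothetical build config

  github {
    owner = var.github_owner
    name  = var.github_repo
    pull_request {
      branch = "^main$"
    }
  }
}

# Push to main: lint, test, build, push image
resource "google_cloudbuild_trigger" "main_push" {
  name     = "main-build"
  filename = "cloudbuild/build.yaml" # hypothetical build config

  github {
    owner = var.github_owner
    name  = var.github_repo
    push {
      branch = "^main$"
    }
  }
}
```

Backend blocks cannot reference variables, which is why the bucket name appears as a literal.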
03 — Terraform IaC
Infrastructure as code.
No console. No exceptions.
Every GCP resource in this system is defined in Terraform. The repo contains a complete /terraform directory that can provision the entire infrastructure from a cold GCP project. Select modules shown below — full source in the GitHub repo.
Cloud Run — Leave Agent terraform/modules/cloud_run/leave_agent.tf
resource "google_cloud_run_v2_service" "leave_agent" {
  name     = "leave-agent"
  location = var.region # asia-south1
  ingress  = "INGRESS_TRAFFIC_INTERNAL_ONLY"

  template {
    service_account = google_service_account.leave_agent_sa.email

    scaling {
      min_instance_count = 0
      max_instance_count = 10
    }

    containers {
      image = "${var.region}-docker.pkg.dev/${var.project_id}/autohr-services/leave-agent:${var.image_tag}"

      resources {
        limits = {
          cpu    = "2"
          memory = "1Gi"
        }
      }

      # Secrets from Secret Manager; no env var credentials
      env {
        name = "GEMINI_API_KEY"
        value_source {
          secret_key_ref {
            secret  = google_secret_manager_secret.gemini_key.secret_id
            version = "latest"
          }
        }
      }

      env {
        name  = "GEMINI_MODEL"
        value = "gemini-1.5-flash-002" # pinned, not latest
      }

      env {
        name  = "RAG_CONFIDENCE_THRESHOLD"
        value = "0.80"
      }

      startup_probe {
        http_get {
          path = "/health"
        }
        initial_delay_seconds = 10
        failure_threshold     = 3
      }
    }
  }
}
Firestore Security Rules terraform/modules/firestore/security_rules.tf
# Firestore security rules: audit_log is append-only.
# The rules file is published as a ruleset and released to Firestore,
# so the rules are enforced at the Firestore layer.
resource "google_firebaserules_ruleset" "firestore" {
  project = var.project_id

  source {
    files {
      name    = "firestore.rules"
      content = file("${path.module}/firestore.rules")
    }
  }
}

resource "google_firebaserules_release" "firestore" {
  project      = var.project_id
  name         = "cloud.firestore"
  ruleset_name = "projects/${var.project_id}/rulesets/${google_firebaserules_ruleset.firestore.name}"
}

# Contents of firestore.rules:
/*
rules_version = '2';
service cloud.firestore {
  match /databases/{database}/documents {

    // Audit log: service accounts may CREATE only
    // UPDATE and DELETE are denied for ALL identities
    match /audit_log/{entry} {
      allow create: if request.auth != null;
      allow read: if request.auth != null;
      allow update: if false; // immutable
      allow delete: if false; // immutable
    }

    // Leave requests: agents may create and read
    // Workers may read their own records only
    match /leave_requests/{id} {
      allow create: if request.auth != null;
      allow read: if request.auth.uid == resource.data.employee_id
                  || request.auth.token.role == "agent";
      allow update: if request.auth.token.role == "agent";
      allow delete: if false;
    }

    // Sessions: 24h TTL enforced via scheduled Cloud Function
    match /sessions/{id} {
      allow read, write: if request.auth.token.role == "agent";
    }
  }
}
*/
IAM — Service Account bindings terraform/modules/iam/service_accounts.tf
# Each service gets its own SA with minimum permissions
resource "google_service_account" "leave_agent_sa" {
  account_id   = "leave-agent-sa"
  display_name = "Leave Agent Service Account"
}

# Firestore: read + write (not admin)
resource "google_project_iam_member" "leave_agent_firestore" {
  project = var.project_id
  role    = "roles/datastore.user"
  member  = "serviceAccount:${google_service_account.leave_agent_sa.email}"
}

# Secret Manager: access to leave-agent secrets only
resource "google_secret_manager_secret_iam_member" "leave_agent_gemini" {
  secret_id = google_secret_manager_secret.gemini_key.id
  role      = "roles/secretmanager.secretAccessor"
  member    = "serviceAccount:${google_service_account.leave_agent_sa.email}"
}

# Pub/Sub: publish to AGENT_EVENT topic only
# (topics are created with for_each in topics.tf, so they are addressed by key)
resource "google_pubsub_topic_iam_member" "leave_agent_pubsub" {
  topic  = google_pubsub_topic.topics["agent-event"].name
  role   = "roles/pubsub.publisher"
  member = "serviceAccount:${google_service_account.leave_agent_sa.email}"
}

# Vertex AI: invoke Gemini models only
resource "google_project_iam_member" "leave_agent_vertex" {
  project = var.project_id
  role    = "roles/aiplatform.user"
  member  = "serviceAccount:${google_service_account.leave_agent_sa.email}"
}

# NOTE: STT service account does NOT get Vertex AI role
# Pub/Sub service account does NOT get Firestore role
# No cross-domain access: enforced at IAM layer
Cloud Run — STT Service (GPU) terraform/modules/cloud_run/stt_service.tf
resource "google_cloud_run_v2_service" "stt_service" {
  name     = "stt-service"
  location = var.region
  ingress  = "INGRESS_TRAFFIC_INTERNAL_ONLY"

  template {
    service_account = google_service_account.stt_sa.email

    scaling {
      min_instance_count = 1 # warm: eliminates GPU cold start
      max_instance_count = 5
    }

    node_selector {
      accelerator = "nvidia-tesla-t4"
    }

    containers {
      image = "${var.region}-docker.pkg.dev/${var.project_id}/autohr-models/whisper-large-v3:${var.whisper_tag}"

      resources {
        limits = {
          cpu              = "4"
          memory           = "16Gi"
          "nvidia.com/gpu" = "1"
        }
        startup_cpu_boost = true
      }

      env {
        name  = "WHISPER_MODEL"
        value = "large-v3"
      }

      env {
        name  = "FALLBACK_TO_GOOGLE_STT"
        value = "true" # automatic fallback on error
      }

      env {
        name  = "MIN_CONFIDENCE_THRESHOLD"
        value = "0.85"
      }
    }
  }
}
Pub/Sub topics + subscriptions terraform/modules/pubsub/topics.tf
# Five topics, one per pipeline stage
locals {
  topics = [
    "inbound-msg",
    "stt-result",
    "agent-event",
    "hitl-escalation",
    "notification-out",
  ]
}

resource "google_pubsub_topic" "topics" {
  for_each = toset(local.topics)
  name     = "autohr-${each.key}"

  # Retain messages 7 days for replay on failure
  message_retention_duration = "604800s"
}

# Dead-letter topic for all critical subscriptions
resource "google_pubsub_topic" "dead_letter" {
  name = "autohr-dead-letter"
}

# Push subscription: inbound-msg → stt-service
resource "google_pubsub_subscription" "inbound_to_stt" {
  name  = "inbound-to-stt"
  topic = google_pubsub_topic.topics["inbound-msg"].name

  push_config {
    push_endpoint = "${google_cloud_run_v2_service.stt_service.uri}/transcribe"
    oidc_token {
      service_account_email = google_service_account.pubsub_invoker_sa.email
    }
  }

  ack_deadline_seconds = 60

  dead_letter_policy {
    dead_letter_topic     = google_pubsub_topic.dead_letter.id
    max_delivery_attempts = 5
  }
}
Cloud Monitoring — Alerting policies terraform/modules/monitoring/alerts.tf
# Alert: STT P95 latency > 6s
resource "google_monitoring_alert_policy" "stt_latency" {
  display_name = "STT P95 latency exceeded"
  combiner     = "OR"

  conditions {
    display_name = "STT P95 > 6000ms"

    condition_threshold {
      # Scoped to stt-service so other Cloud Run services do not trigger this alert
      filter          = "resource.type=\"cloud_run_revision\" AND resource.labels.service_name=\"stt-service\" AND metric.type=\"run.googleapis.com/request_latencies\""
      comparison      = "COMPARISON_GT"
      threshold_value = 6000
      duration        = "300s"

      aggregations {
        alignment_period   = "300s"
        per_series_aligner = "ALIGN_PERCENTILE_95"
      }
    }
  }

  notification_channels = [google_monitoring_notification_channel.oncall.name]
}

# Alert: monthly spend > ₹1500
resource "google_billing_budget" "monthly_budget" {
  billing_account = var.billing_account_id
  display_name    = "AutoHR monthly budget"

  budget_filter {
    projects = ["projects/${var.project_id}"]
  }

  amount {
    specified_amount {
      currency_code = "INR"
      units         = "2000" # budget ceiling (alerts only; a budget does not cap spend)
    }
  }

  threshold_rules {
    threshold_percent = 0.75 # alert at ₹1500 of actual spend
  }

  threshold_rules {
    threshold_percent = 1.0 # alert at ₹2000 of actual spend
  }
}
04 — Security model
Threat model for
an SMB HR system.
The threat model is that of a 52-person textile business, not a regulated enterprise. The security priorities reflect this: protect employee PII, prevent unauthorised leave manipulation, and ensure the audit log cannot be tampered with. Zero-trust networking and hardware security modules are out of scope at this stage.
Control 01 · Identity
Mobile number as employee identity
WhatsApp sender number verified by Meta API on every inbound message. IVR caller ID verified by Exotel. No password, no token, no app login required. The phone number IS the credential — already verified by the telco and by Meta's onboarding process.
Threat mitigated: employee impersonation via web portal
Not mitigated: SIM swap — accepted risk at SMB threat level
Control 02 · Authorisation
Per-service IAM — minimum permissions
Each Cloud Run service has a dedicated Service Account with permissions scoped to exactly the resources it needs. The STT service cannot read Firestore. The Leave Agent cannot access grievance records. Lateral movement is prevented at the IAM layer — a compromised container can only access its own domain.
Enforced in Terraform service_accounts.tf
Verified: gcloud projects get-iam-policy autohr-prod --format=json (project-level role bindings per service account)
Control 03 · Secrets
Zero credentials in code
All secrets (WhatsApp API token, Exotel key, Supabase connection string, Gemini API key) stored in Secret Manager. No credential appears in source code, Dockerfiles, environment variable defaults, or CI/CD configuration. Secrets are injected at Cloud Run startup via IAM-authenticated Secret Manager API calls.
Enforced: pre-commit hook scans for credential patterns
Rotation: automated 90-day rotation on WA token via Secret Manager
Control 04 · Data integrity
Append-only audit log
Firestore security rules deny update and delete on /audit_log for all identities — including the owner service account. No past record can be modified. In a labour dispute, the complete, unmodified history of every HR decision is available. This is the system's legal defence — and it is enforced at the database layer, not at the application layer.
rules_version = '2'
allow update: if false; // immutable
allow delete: if false; // immutable
Control 05 · Transport
HTTPS everywhere — Google-managed TLS
All Cloud Run endpoints are HTTPS with Google-managed TLS certificates. Pub/Sub uses TLS transport. Supabase connection uses TLS with certificate pinning in the connection string. No plaintext communication on any path.
Cloud Run: TLS termination at Google Front End
Pub/Sub: TLS 1.3 enforced on all subscriptions
Control 06 · Compliance
DPDP Act 2023 — data residency
All GCP resources in asia-south1 (Mumbai). Supabase on AWS ap-south-1 (Mumbai). Vertex AI inference pinned to asia-south1. No personal data leaves Indian jurisdiction. Audio recordings retained maximum 90 days per DPDP minimum period. Employee records retained per labour law requirements (employment duration + 3 years).
Terraform: provider region = "asia-south1"
Org policy: constraints/gcp.resourceLocations enforced
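The resource location constraint could be declared along the following lines. This sketch assumes the policy is applied at project level and uses the predefined value group for the Mumbai region; whether the real policy sits at project, folder, or organisation level is not stated.

```hcl
resource "google_org_policy_policy" "resource_locations" {
  name   = "projects/${var.project_id}/policies/gcp.resourceLocations"
  parent = "projects/${var.project_id}"

  spec {
    rules {
      values {
        # Only locations within asia-south1 (Mumbai) are permitted
        allowed_values = ["in:asia-south1-locations"]
      }
    }
  }
}
```

With this in place, an attempt to create a supported resource in another region is rejected by the API rather than caught in review.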
05 — CI/CD pipeline
From commit to production.
Automated. Gated. Reversible.
Every change to the system — application code, Terraform configuration, or Whisper model weights — goes through the same pipeline. No console deployments. No manual Cloud Run deploys. The pipeline is the only path to production.
01
Code push to feature branch
Developer pushes to feature/ branch · pre-commit hooks run: credential scan, linting, unit tests · Cloud Build trigger: PR validation pipeline
Cloud Build · pre-commit
02
Pull request — Terraform plan
terraform plan runs against staging state · diff posted as PR comment · no apply until merge · manual code review required
Terraform · Cloud Build trigger
03
Merge to main — container build
Cloud Build builds container image · docker build with layer caching · vulnerability scan via Artifact Registry · image pushed with git_sha tag
Cloud Build · Artifact Registry
04
Staging deployment — automated
terraform apply to staging workspace · Cloud Run revision deployed · integration tests run against staging endpoints · Pub/Sub end-to-end smoke test
Cloud Deploy · staging
05
Manual promotion gate
Cloud Deploy promotion requires manual approval · staging test results reviewed · Terraform plan against production state reviewed · one-click promote or reject
Cloud Deploy · approval gate
06
Production deployment — traffic split
New Cloud Run revision deployed with 10% traffic · error rate monitored for 15 minutes · automated full promotion or rollback based on error threshold
Cloud Run · traffic splitting
07
Rollback — 30 seconds
Previous Cloud Run revision retained for 30 days · instant rollback via gcloud run services update-traffic · Terraform state rolled back in separate operation
Cloud Run revisions
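The 10% traffic split in step 06 corresponds to a Cloud Run traffic configuration like the one sketched below. The pipeline describes Cloud Deploy driving the promotion, so this Terraform form is only an illustration of the resulting state; `stable_revision` is a hypothetical variable naming the revision that currently serves production, and the image path is assumed.

```hcl
resource "google_cloud_run_v2_service" "webhook_gateway" {
  name     = "webhook-gateway"
  location = var.region
  ingress  = "INGRESS_TRAFFIC_ALL"

  template {
    containers {
      # Assumed image path; the tag selects the new revision's image
      image = "${var.region}-docker.pkg.dev/${var.project_id}/autohr-services/webhook-gateway:${var.image_tag}"
    }
  }

  # 90% stays on the known-good revision
  traffic {
    type     = "TRAFFIC_TARGET_ALLOCATION_TYPE_REVISION"
    revision = var.stable_revision # hypothetical variable
    percent  = 90
  }

  # 10% goes to the newly deployed revision
  traffic {
    type    = "TRAFFIC_TARGET_ALLOCATION_TYPE_LATEST"
    percent = 10
  }
}
```

Rolling back then amounts to returning the stable revision to 100%, which is why the previous revision has to be retained.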
06 — IAM service account matrix
Who can do what.
Nothing more. Nothing less.
Every service account permission in this system is intentional, documented, and enforced in Terraform. The absence of a permission is as deliberate as its presence.
| Service account | Firestore | Pub/Sub | Secret Manager | Vertex AI | Cloud Storage | Artifact Registry |
|---|---|---|---|---|---|---|
| webhook-gateway-sa | - | publish: inbound-msg | WA token (read) | - | - | - |
| stt-service-sa | - | publish: stt-result | - | - | audio-archive (write) | - |
| intent-router-sa | read: employees | publish: agent-event | Gemini key (read) | aiplatform.user | - | - |
| leave-agent-sa | read/write: leave_requests, employees, sessions · create: audit_log · read: audit_log | publish: hitl-escalation, notification-out | Gemini key · Supabase URL (read) | aiplatform.user | - | - |
| hitl-manager-sa | read/write: hitl_queue · create: audit_log | publish: notification-out | WA token (read) | - | - | - |
| notification-sa | - | subscribe: notification-out | WA token (read) | - | - | - |
| rag-indexer-sa | read/write: policy_versions | - | Supabase URL · Gemini key (read) | aiplatform.user | policy-documents (read) | - |
| cicd-cloudbuild-sa | - | - | - | - | tf-state bucket (read/write) | reader + writer |